huggingface / swift-transformers

Swift Package to implement a transformers-like API in Swift

Is GenerationConfig.repetitionPenalty used during generation? #84

Closed · joneavila closed this 2 months ago

joneavila commented 3 months ago

I am testing the code using the Core ML version of Llama 2.

Setting GenerationConfig.maxLength to something larger than the default, e.g. 64, produces the correct number of output tokens, but the model tends to repeat tokens towards the end of generation. Adjusting repetitionPenalty doesn't seem to have any effect.
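
For concreteness, this is roughly how I'm setting things up. The `Generation` module name and the exact initializer arguments are assumptions about the package layout; `maxLength` and `repetitionPenalty` are the fields in question:

```swift
import Generation  // assumed module name from the swift-transformers package

// Illustrative values; I'm assuming GenerationConfig's memberwise-style
// initializer here (maxNewTokens may be required in some versions).
var config = GenerationConfig(maxLength: 64, maxNewTokens: 64)
config.repetitionPenalty = 1.3  // varying this has no observable effect on the output
```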

Looking through Generation.swift, I see references to maxLength, eosTokenId, temperature, and others, but none to repetitionPenalty. Would that explain the repetitive output?
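
For comparison, this is the standard repetition penalty transform I'd expect to run over the logits before sampling (the scheme from Keskar et al., 2019, which the Python transformers library implements as RepetitionPenaltyLogitsProcessor). A minimal sketch in plain Swift, not the library's code:

```swift
/// Standard repetition penalty: tokens that were already generated get
/// their logits scaled down so they are less likely to be sampled again.
/// Sketch only; names and shapes are assumptions, not Generation.swift.
func applyRepetitionPenalty(_ logits: [Float], previousTokens: [Int], penalty: Float) -> [Float] {
    guard penalty != 1.0 else { return logits }  // 1.0 means "disabled"
    var result = logits
    for token in Set(previousTokens) where token >= 0 && token < result.count {
        let score = result[token]
        // Dividing a positive score shrinks it; multiplying a negative one
        // pushes it further down. Either way the token becomes less likely.
        result[token] = score < 0 ? score * penalty : score / penalty
    }
    return result
}
```

If nothing equivalent runs in the generation loop, setting repetitionPenalty would be a silent no-op, which would match what I'm seeing.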