Setting `GenerationConfig.maxLength` to something larger than the default, e.g., 64, produces the correct number of output tokens, but generation tends to repeat tokens towards the end. Adjusting `repetitionPenalty` doesn't seem to have any effect.

Looking into `Generation.swift`, I see the code references `maxLength`, `eosTokenId`, `temperature`, and others, but not `repetitionPenalty`. Does this explain the repetitive output?
I am testing the code using the Core ML version of Llama 2.
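For reference, here is a minimal sketch of how a repetition penalty is conventionally applied to logits before sampling, following the formulation used in the Transformers `RepetitionPenaltyLogitsProcessor` (positive logits of already-generated tokens are divided by the penalty, negative ones multiplied). The function and parameter names are hypothetical, not taken from `Generation.swift`:

```swift
import Foundation

// Hedged sketch: a Transformers-style repetition penalty applied to raw
// logits before sampling. `applyRepetitionPenalty` is a hypothetical name,
// not an API from swift-transformers.
func applyRepetitionPenalty(
    _ logits: [Float],
    generatedTokens: [Int],
    penalty: Float
) -> [Float] {
    var adjusted = logits
    for token in Set(generatedTokens) {
        let score = adjusted[token]
        // Penalize previously generated tokens: positive logits shrink,
        // negative logits become more negative.
        adjusted[token] = score > 0 ? score / penalty : score * penalty
    }
    return adjusted
}
```

For example, with `penalty: 2.0` and `generatedTokens: [0, 1]`, the logits `[2.0, -1.0, 0.5]` become `[1.0, -2.0, 0.5]`: token 0's positive logit is halved, token 1's negative logit is doubled in magnitude, and token 2 (not yet generated) is untouched. A hook of this kind inside the generation loop is what adjusting `repetitionPenalty` would need in order to have any effect.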