Setting `GenerationConfig.maxLength` to something larger than the default, e.g., 64, produces the correct number of output tokens, but generation tends to repeat tokens towards the end. Adjusting `repetitionPenalty` doesn't seem to have any effect.

Looking into `Generation.swift`, I see the code references `maxLength`, `eosTokenId`, `temperature`, and others, but not `repetitionPenalty`. Does this explain the repetitive output?
I am testing the code using the Core ML version of Llama 2.
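For reference, here is a minimal sketch of how a repetition penalty is conventionally applied to logits before sampling, following the formulation used in the Transformers `RepetitionPenaltyLogitsProcessor` (positive logits of already-generated tokens are divided by the penalty, negative ones multiplied). The function and parameter names are hypothetical, not taken from `Generation.swift`:

```swift
import Foundation

// Hedged sketch: a Transformers-style repetition penalty applied to raw
// logits before sampling. `applyRepetitionPenalty` is a hypothetical name,
// not an API from swift-transformers.
func applyRepetitionPenalty(
    _ logits: [Float],
    generatedTokens: [Int],
    penalty: Float
) -> [Float] {
    var adjusted = logits
    for token in Set(generatedTokens) {
        let score = adjusted[token]
        // Penalize previously generated tokens: positive logits shrink,
        // negative logits become more negative.
        adjusted[token] = score > 0 ? score / penalty : score * penalty
    }
    return adjusted
}
```

For example, with `penalty: 2.0` and `generatedTokens: [0, 1]`, the logits `[2.0, -1.0, 0.5]` become `[1.0, -2.0, 0.5]`: token 0's positive logit is halved, token 1's negative logit is doubled in magnitude, and token 2 (not yet generated) is untouched. A hook of this kind inside the generation loop is what adjusting `repetitionPenalty` would need in order to have any effect.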