huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Ensure chat models terminate generation with EOS token #115

Closed · lewtun closed this 6 months ago

lewtun commented 6 months ago

Closes #109

I'm not sure if there's any reason not to specify the EOS token ID, but I have verified that adding this ensures chat models terminate on the EOS token.
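
For context, the change amounts to something like the following sketch (the model name is an illustrative placeholder, and this is not the exact lighteval diff):

```python
# Minimal sketch: pass the tokenizer's EOS token ID to generate() so decoding
# stops at end-of-sequence instead of running on to max_new_tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceH4/zephyr-7b-beta"  # placeholder chat model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(
    input_ids,
    max_new_tokens=128,
    eos_token_id=tokenizer.eos_token_id,  # stop as soon as EOS is generated
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```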

clefourrier commented 6 months ago

Could there be edge cases in which the EOS token is not defined in the tokenizer?

lewtun commented 6 months ago

> Could there be edge cases in which the EOS token is not defined in the tokenizer?

I'm not aware of any LLM tokenizers that don't have an EOS token, but I think in the worst case we'll have `tokenizer.eos_token_id = None`, which is the previous behaviour in `generate()`.
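
In that worst case the call degrades gracefully, as in this sketch (reusing the names from the example above; the fallback relies on `generate()` treating an unset `eos_token_id` as "use the model's generation config", which is its default):

```python
# Worst-case sketch: a tokenizer with no EOS token exposes eos_token_id=None,
# and passing None to generate() is equivalent to not specifying it at all,
# i.e. the previous behaviour.
eos_token_id = tokenizer.eos_token_id  # may be None for an EOS-less tokenizer
outputs = model.generate(
    input_ids,
    max_new_tokens=128,
    eos_token_id=eos_token_id,
)
```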