turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Load generation_config.json from compatible models #449

Closed · npott9 closed this 1 month ago

npott9 commented 1 month ago

This update provides a convenient way to read generation_config.json properties through ExLlamaV2Config, specifically the eos_token_id attribute.

Llama-3-instruct models ship a generation_config.json that specifies eos_token_id as a list of token IDs, covering both <|end_of_text|> and <|eot_id|> as stop tokens. The tokenizer's eos_token_id typically holds only the end-of-sequence token, not the end-of-turn token. Llama-3's generation_config.json includes both token IDs in its config to make adding stop tokens easier.

https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/generation_config.json
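For illustration, a minimal sketch of the lookup this enables, reading the JSON directly rather than through the exllamav2 API (the helper name and parsing here are assumptions for the example, not the actual implementation in this PR):

```python
import json
import os

def read_eos_token_ids(model_dir: str) -> list[int]:
    """Hypothetical helper: collect EOS token IDs from a model's
    generation_config.json, if the file exists."""
    path = os.path.join(model_dir, "generation_config.json")
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    eos = cfg.get("eos_token_id", [])
    # eos_token_id may be a single int or a list of ints; normalize to a list
    return [eos] if isinstance(eos, int) else list(eos)
```

Per the linked config, for Meta-Llama-3-8B-Instruct this would return [128001, 128009], i.e. <|end_of_text|> and <|eot_id|>, which can then be added as stop conditions alongside tokenizer.eos_token_id.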

turboderp commented 1 month ago

Seems good, thx