jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

Why is tokenizer.model_max_length set to 1000000000000000019884624838656? #23

Closed kevinhu closed 12 months ago

kevinhu commented 1 year ago

See https://huggingface.co/PY007/TinyLlama-1.1B-step-50K-105b/blob/main/tokenizer_config.json#L22
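For context, the value can be reproduced by loading the tokenizer directly (a minimal sketch, assuming the `transformers` library is installed):

```python
from transformers import AutoTokenizer

# Load the tokenizer from the checkpoint referenced above and
# inspect the suspicious default.
tokenizer = AutoTokenizer.from_pretrained("PY007/TinyLlama-1.1B-step-50K-105b")
print(tokenizer.model_max_length)  # 1000000000000000019884624838656
```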

ChuXNobody commented 1 year ago

It's not a model training parameter; it's a setting tied to the training data, used to get the max token length.

jzhang38 commented 12 months ago

It is also present in the Llama 2 config: https://huggingface.co/meta-llama/Llama-2-7b-hf/blob/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/tokenizer_config.json#L22. I guess it is there for compatibility with other HuggingFace tokenizers. You can safely ignore it.
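To add a bit of detail: that odd-looking number is just `int(1e30)` after double-precision rounding, which `transformers` uses as its `VERY_LARGE_INTEGER` sentinel for `model_max_length` whenever a tokenizer config doesn't declare a real limit. If you want a concrete cap, you can override it at load time (a minimal sketch; 2048 is assumed here to match TinyLlama's training context, adjust as needed):

```python
from transformers import AutoTokenizer

# The sentinel is int(1e30): 1e30 is a float, and the nearest
# representable double is 1000000000000000019884624838656.
print(int(1e30))  # 1000000000000000019884624838656

# Override the sentinel with an explicit context length
# (2048 assumed here; pick whatever fits your use case).
tokenizer = AutoTokenizer.from_pretrained(
    "PY007/TinyLlama-1.1B-step-50K-105b",
    model_max_length=2048,
)
print(tokenizer.model_max_length)  # 2048
```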