huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Warning about max sequence length #65

Open ChenDRAG opened 7 months ago

ChenDRAG commented 7 months ago

Hi, when I ran the DPO fine-tuning code, I noticed the following warning in the logging output:

[WARNING|tokenization_utils_base.py:3831] 2023-12-06 16:44:52,195 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2455 > 2048). Running this sequence through the model will result in indexing errors

I note that this is related to #32, where the same issue seems to come up.

Do I need to worry about this warning? Could anyone explain why it appears?
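
For context on where the message originates: it is emitted by the tokenizer itself (tokenization_utils_base.py) whenever text is encoded without truncation and the resulting token count exceeds tokenizer.model_max_length; it is not raised by the model. Below is a minimal sketch that counts how many preference examples exceed the 2048-token limit shown in the warning. The model and dataset names are assumptions based on the handbook's Zephyr DPO recipe, not taken from the log above, so adjust them to whatever config you are running.

```python
# Sketch only: model/dataset names and the prompt + chosen concatenation are
# assumptions; the handbook's actual preprocessing applies the chat template.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

max_len = 2048  # the limit mentioned in the warning (2455 > 2048)
sample = dataset.select(range(1000))  # small sample for speed

too_long = 0
for example in sample:
    # Roughly reconstruct prompt + chosen response as one string.
    text = example["prompt"] + example["chosen"][-1]["content"]
    # Encoding without truncation is exactly what triggers the tokenizer warning.
    n_tokens = len(tokenizer(text)["input_ids"])
    if n_tokens > max_len:
        too_long += 1

print(f"{too_long} / {len(sample)} sampled pairs exceed {max_len} tokens")
```

If some examples exceed the limit, the trainer's max_length / max_prompt_length settings determine whether they are truncated or passed through at full length.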