unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Throw error when inferencing longer than max_position_embeddings #1236

Closed Datta0 closed 2 weeks ago

Datta0 commented 2 weeks ago

We have RoPE scaling, but it only comes into effect when max_seq_length > config.max_position_embeddings. So if someone creates a model with FastLanguageModel.from_pretrained(max_seq_length=2048) and max_position_embeddings is, say, 32768, inference works with plain RoPE up to 32768, but Long RoPE is never used. If someone instead initialises with FastLanguageModel.from_pretrained(max_seq_length=131072), Long RoPE kicks in at init, and inference works up to max(max_seq_length, max_position_embeddings) and then throws the error:

Error for input shape torch.Size([1, 25600]) Unsloth: input length 25600 + max_new_tokens 2 exceeds `max_position_embeddings` 16384! consider passing max_position_embeddings flag while initializing the model.
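For context, a minimal sketch of the two initialization paths described above (the model name, prompt length and exact sequence lengths are illustrative, not the exact repro):

```python
from unsloth import FastLanguageModel

# Case 1: max_seq_length below the config's max_position_embeddings.
# Long RoPE is not applied at init; plain RoPE inference still works
# up to config.max_position_embeddings.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # illustrative model name
    max_seq_length=2048,
)

# Case 2: max_seq_length above max_position_embeddings.
# Long RoPE kicks in at init, but generating past
# max(max_seq_length, max_position_embeddings) throws the error quoted above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=131072,
)

FastLanguageModel.for_inference(model)
long_prompt = " ".join(["hello"] * 30000)  # long enough to exceed the limit
inputs = tokenizer(long_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=2)  # length error is raised here
```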