We have RoPE scaling, but it only comes into effect when `max_seq_length > config.max_position_embeddings`.
So if someone creates a model with `FastLanguageModel.from_pretrained(max_seq_len=2048)` and `max_position_embeddings` is, say, 32768, inference works with RoPE up to 32768, but Long RoPE is not used.
If someone initialises with `FastLanguageModel.from_pretrained(max_seq_len=131072)`, Long RoPE kicks in at init, inference works up to `max(max_seq_len, max_position_embeddings)`, and then it throws the error:
```
Error for input shape torch.Size([1, 25600]) Unsloth: input length 25600 + max_new_tokens 2 exceeds `max_position_embeddings` 16384! consider passing max_position_embeddings flag while initializing the model.
```
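For concreteness, a minimal sketch of the two initialization paths described above (the model name is a placeholder, and this assumes the standard `FastLanguageModel.from_pretrained(model_name, max_seq_length=...)` signature):

```python
from unsloth import FastLanguageModel

MODEL = "unsloth/some-long-context-model"  # placeholder model name for illustration

# Case 1: max_seq_length < config.max_position_embeddings (e.g. 32768).
# RoPE scaling is not applied at init; inference still works up to
# max_position_embeddings positions, but Long RoPE never engages.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL,
    max_seq_length = 2048,
)

# Case 2: max_seq_length > config.max_position_embeddings.
# Long RoPE kicks in at init; inference works up to
# max(max_seq_length, max_position_embeddings), after which the
# "exceeds `max_position_embeddings`" error quoted above is raised.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL,
    max_seq_length = 131072,
)
```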