We have RoPE scaling, but it only comes into effect when `max_seq_length > config.max_position_embeddings`.
So if someone creates a model with `FastLanguageModel.from_pretrained(max_seq_len=2048)` and `max_position_embeddings` is, say, 32768, inference works with RoPE up to 32768, but Long RoPE is not used.
If someone initialises with `FastLanguageModel.from_pretrained(max_seq_len=131072)`, Long RoPE kicks in at init, inference works up to `max(max_seq_len, max_position_embeddings)`, and then it throws the error:
```
Error for input shape torch.Size([1, 25600]) Unsloth: input length 25600 + max_new_tokens 2 exceeds `max_position_embeddings` 16384! consider passing max_position_embeddings flag while initializing the model.
```
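For concreteness, a minimal sketch of the two initialization paths described above (the model name is a placeholder, and this assumes the standard `FastLanguageModel.from_pretrained(model_name, max_seq_length=...)` signature):

```python
from unsloth import FastLanguageModel

MODEL = "unsloth/some-long-context-model"  # placeholder model name for illustration

# Case 1: max_seq_length < config.max_position_embeddings (e.g. 32768).
# RoPE scaling is not applied at init; inference still works up to
# max_position_embeddings positions, but Long RoPE never engages.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL,
    max_seq_length = 2048,
)

# Case 2: max_seq_length > config.max_position_embeddings.
# Long RoPE kicks in at init; inference works up to
# max(max_seq_length, max_position_embeddings), after which the
# "exceeds `max_position_embeddings`" error quoted above is raised.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL,
    max_seq_length = 131072,
)
```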