unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

NotImplementedError: Make sure that a `_reorder_cache` function is correctly implemented in transformers.models.llama.modeling_llama to enable beam search for <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'> #1099

Open kiranpedvak opened 1 month ago

kiranpedvak commented 1 month ago

When I add `num_beams=3` during inference, I get this error.
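For context, the missing `_reorder_cache` function is the hook beam search uses to keep the KV cache aligned with the surviving beams: after each step, the cached key/value states of every layer must be re-indexed along the batch dimension to follow the beams that were kept. The sketch below is illustrative only (it is not unsloth's or transformers' actual implementation, and it models the cache tensors as plain Python lists):

```python
# Illustrative sketch of what a `_reorder_cache` hook does during beam search.
# Assumption: cache "tensors" are modeled as plain lists indexed by batch
# position; real implementations index torch tensors with `index_select`.

def reorder_cache(past_key_values, beam_idx):
    # past_key_values: one (key, value) pair per layer.
    # beam_idx: for each beam slot, which previous beam it continues from.
    return tuple(
        tuple([state[i] for i in beam_idx] for state in layer)
        for layer in past_key_values
    )

# Example: 3 beams, one layer; this step keeps beams 2, 2, and 0.
layer = (["k0", "k1", "k2"], ["v0", "v1", "v2"])
reordered = reorder_cache((layer,), beam_idx=[2, 2, 0])
# reordered[0] is (["k2", "k2", "k0"], ["v2", "v2", "v0"])
```

The `NotImplementedError` is raised because the patched model class does not provide this hook, so `generate(..., num_beams=3)` has no way to reorder the cache.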

danielhanchen commented 1 month ago

Will investigate!

yelboudouri commented 3 weeks ago

Any updates on this issue?