unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Model inference - performance drop when using unsloth #771

Open TomekPro opened 2 months ago

TomekPro commented 2 months ago

Hi, I fine-tuned a model (yam-peleg/Experiment26-7B) using unsloth. Then, during inference, model correctness drops when using unsloth's FastLanguageModel. I see some modules are replaced. It looks a little weird that LlamaRotaryEmbedding is used for a Mistral-type model. Any idea if this could cause a performance drop?

OLD inference: [image]

Unsloth way: [image]

When comparing model files I see the following differences: [image] and this: [image]
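A minimal sketch of how the two loading paths can be compared side by side (the model name comes from above; `max_seq_length`, the dtype, and the `rotary_emb` attribute path are illustrative assumptions, and the attribute path differs across transformers versions):

```python
# Sketch: inspect which module classes unsloth swaps in, vs. plain transformers.
from unsloth import FastLanguageModel  # import unsloth before transformers
import torch
from transformers import AutoModelForCausalLM

model_name = "yam-peleg/Experiment26-7B"

# Plain transformers load (the "OLD inference" path)
hf_model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
print(type(hf_model.model.layers[0].self_attn.rotary_emb).__name__)
# -> MistralRotaryEmbedding for a Mistral-architecture model

del hf_model  # free memory before the second load
torch.cuda.empty_cache()

# Unsloth load, which patches the attention / RoPE modules
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name, max_seq_length=4096, load_in_4bit=False
)
FastLanguageModel.for_inference(model)  # switch the patched model to inference mode
print(type(model.model.layers[0].self_attn.rotary_emb).__name__)
# -> LlamaRotaryEmbedding, matching what the issue reports
```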

danielhanchen commented 1 month ago

What do you mean by "performance drop"? Can you elaborate? Is this finetuned vs base model?

Did you check the finetuned model's inference directly after finetuning, or are you loading it in a new instance?

TomekPro commented 1 month ago

@danielhanchen I have a full pipeline for fine-tuning and evaluating LLM models. I test performance against an internal dataset. Then I install unsloth and modify the evaluation pipeline so that it uses FastLanguageModel as recommended, and then I see a performance drop on my dataset. "Did you check the finetuned model's inference directly after finetuning" - yes.
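A sketch of the kind of A/B check that isolates the loading path (not the actual internal pipeline; the model path and prompt are placeholders). Sampling is disabled so differences between the two paths are not decoding noise:

```python
# Hedged sketch: greedy-decode one eval prompt under the unsloth path,
# then diff the output against the plain transformers path.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/finetuned-model",  # placeholder
    max_seq_length=4096,
)
FastLanguageModel.for_inference(model)

prompt = "An example from the internal eval set"  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```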

danielhanchen commented 1 month ago

Hmmm do you know why trust_remote_code is set - is there any custom code?

TomekPro commented 1 month ago

Hi, this was left over from some earlier tests - for some models it was needed, as far as I remember. For now there is no custom code.
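If no repo ships custom modeling code, the flag can simply be dropped; a sketch with a placeholder model path (assuming unsloth's loader forwards `trust_remote_code` the way the transformers loader does):

```python
# Sketch: load without trust_remote_code so only the library's own
# (and unsloth's patched) module implementations can run.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/finetuned-model",  # placeholder
    max_seq_length=4096,
    # trust_remote_code defaults to False; only enable it when the model
    # repo actually ships custom code you intend to execute.
)
```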