unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Merging to 16bit for vLLM produces lower performance #871

Open vjagannath786 opened 1 month ago

vjagannath786 commented 1 month ago

I have finetuned the model and am now trying to run inference on it with vLLM, but the results are much worse than expected. Any idea why that is?
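For context, the workflow being described is roughly the following sketch. It is not from the report itself: the paths and the `"lora_model"` directory name are placeholders, and it assumes a LoRA-finetuned Unsloth checkpoint plus a GPU with the `unsloth` and `vllm` packages installed.

```python
# Hedged sketch of the merge-to-16bit-then-vLLM flow (names are illustrative).
from unsloth import FastLanguageModel
from vllm import LLM, SamplingParams

# Load the finetuned LoRA adapter on top of its base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",   # placeholder: directory of the finetuned adapter
    max_seq_length=2048,
    load_in_4bit=True,
)

# Merge the LoRA weights into the base weights and save as a 16-bit
# checkpoint, which is the format vLLM can load directly.
model.save_pretrained_merged(
    "merged_16bit_model", tokenizer, save_method="merged_16bit"
)

# Serve the merged checkpoint with vLLM and generate from it.
llm = LLM(model="merged_16bit_model")
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

One common source of quality loss at this step (aside from the merge itself) is prompting the merged model without the same chat template used during finetuning, so it is worth checking that the vLLM prompts match the training format.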

danielhanchen commented 1 month ago

I'm working on a new method which will make this better!