unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Merging to 16bit for vLLM produces lower performance #871

Open vjagannath786 opened 1 month ago

vjagannath786 commented 1 month ago

I have finetuned the model and am now trying to run inference on it with vLLM, but the results are much worse than expected. Any idea why that is?
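For context, the workflow being described is roughly the following sketch. It is not from the report itself: the paths and the `"lora_model"` directory name are placeholders, and it assumes a LoRA-finetuned Unsloth checkpoint plus a GPU with the `unsloth` and `vllm` packages installed.

```python
# Hedged sketch of the merge-to-16bit-then-vLLM flow (names are illustrative).
from unsloth import FastLanguageModel
from vllm import LLM, SamplingParams

# Load the finetuned LoRA adapter on top of its base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",   # placeholder: directory of the finetuned adapter
    max_seq_length=2048,
    load_in_4bit=True,
)

# Merge the LoRA weights into the base weights and save as a 16-bit
# checkpoint, which is the format vLLM can load directly.
model.save_pretrained_merged(
    "merged_16bit_model", tokenizer, save_method="merged_16bit"
)

# Serve the merged checkpoint with vLLM and generate from it.
llm = LLM(model="merged_16bit_model")
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

One common source of quality loss at this step (aside from the merge itself) is prompting the merged model without the same chat template used during finetuning, so it is worth checking that the vLLM prompts match the training format.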

danielhanchen commented 1 month ago

I'm working on a new method which will make this better!