unslothai / unsloth

Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

unsloth 4bit models do not load in vLLM - says missing adapter path or name #688

Open jonberliner opened 2 weeks ago

jonberliner commented 2 weeks ago

When I try to load an unsloth 4-bit model with llm = LLM("unsloth/mistral-7b-instruct-v0.3-bnb-4bit", dtype="half"), I get the error: Cannot find any of ['adapter_name_or_path'] in the model's quantization config.

This is also true for all the Llama 3 and Gemma 4-bit models. As far as I know, there are no LoRA adapters attached to these models. Please let me know how to proceed with loading them.
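
For reference, a minimal reproduction (assuming vLLM is installed; the model name and dtype are the ones from the report above):

```python
from vllm import LLM

# Loading the pre-quantized bnb-4bit checkpoint directly raises:
#   Cannot find any of ['adapter_name_or_path'] in the model's quantization config.
llm = LLM("unsloth/mistral-7b-instruct-v0.3-bnb-4bit", dtype="half")
```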

hruday-markonda commented 1 week ago

I am also getting this error; I hope a fix comes soon.

danielhanchen commented 1 week ago

Apologies for the late reply! My brother and I relocated to SF, so I just got back to GitHub issues!

On vLLM, you can't use the bnb-4bit variants - you must call model.save_pretrained_merged and save the model to 16-bit for inference, i.e. only full 16-bit models work with vLLM.
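
A minimal sketch of that workflow, assuming an Unsloth finetuning setup; the output directory name is illustrative, and the save_method="merged_16bit" argument follows Unsloth's saving API:

```python
from unsloth import FastLanguageModel
from vllm import LLM

# Load the 4-bit model for finetuning as usual.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    load_in_4bit=True,
)
# ... finetune ...

# Merge the LoRA weights and save the full model in 16-bit precision.
model.save_pretrained_merged("merged_16bit", tokenizer, save_method="merged_16bit")

# vLLM can then load the merged 16-bit checkpoint.
llm = LLM("merged_16bit", dtype="half")
```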

nole69 commented 1 week ago

> On vLLM, you can't use the bnb-4bit variants - you must call model.save_pretrained_merged and save the model to 16-bit for inference, i.e. only full 16-bit models work with vLLM.

Since vLLM v0.5.0 has been released, vLLM now supports bnb quantization. Would it be possible for models finetuned and quantized with Unsloth to be served with vLLM, given the new release?

danielhanchen commented 1 week ago

@nole69 Are you referring to https://github.com/vllm-project/vllm/pull/4776? I think that only covers QLoRA adapters, not full bnb models. You could try exporting the LoRA adapters and then using vLLM.
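
A hedged sketch of that alternative, assuming the LoRA adapters were saved separately (e.g. with model.save_pretrained("lora_adapters")) and that the corresponding 16-bit base model is used; the adapter path and name are illustrative:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Serve the full-precision base model and attach the exported LoRA adapters per request.
llm = LLM("mistralai/Mistral-7B-Instruct-v0.3", enable_lora=True, dtype="half")

outputs = llm.generate(
    ["Hello, how are you?"],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("unsloth_lora", 1, "lora_adapters"),
)
print(outputs[0].outputs[0].text)
```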

odulcy-mindee commented 4 days ago

@danielhanchen Indeed, I think full bnb models will be supported once https://github.com/vllm-project/vllm/pull/5753 is merged.