jonberliner opened 2 weeks ago
I am also getting this error, hope a fix comes soon.
Apologies for the late reply! My bro and I relocated to SF, so I just got back to GitHub issues! On vLLM, you can't use the bnb-4bit variants - you must use `model.save_pretrained_merged` and save to 16-bit for inference, i.e. only full 16-bit models work with vLLM.
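Roughly, the export step looks like this (a minimal sketch - `"lora_model"` and the output folder name are placeholders for your own paths):

```python
from unsloth import FastLanguageModel

# Reload the finetuned checkpoint ("lora_model" is a placeholder path).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Merge the LoRA weights into the base model and write full 16-bit
# weights - the format vLLM can serve directly.
model.save_pretrained_merged(
    "model_merged_16bit",
    tokenizer,
    save_method="merged_16bit",
)
```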
Since vLLM v0.5.0 has been released, bnb quantization is now supported. Would it be possible for models finetuned and quantized with Unsloth to be served with vLLM given the new release?
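As I understand it, the new path is invoked like this (a sketch - the model repo here is just an example, and both flags must be set to `"bitsandbytes"`):

```python
from vllm import LLM, SamplingParams

# Sketch of the bitsandbytes loading path added in vLLM v0.5.0.
# Note: at this point it targets QLoRA-style checkpoints, so a
# standalone bnb-4bit model may still hit the adapter error below.
llm = LLM(
    model="unsloth/llama-3-8b-bnb-4bit",  # example pre-quantized repo
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```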
@nole69 Are you referring to https://github.com/vllm-project/vllm/pull/4776? I think that's only for QLoRA adapters, not full bnb models. You could try exporting the LoRA adapters, then use vLLM, I guess.
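Something like this (a sketch - the base model ID, adapter directory, and adapter name are placeholders):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Serve the 16-bit base model and attach the exported LoRA adapters
# at request time ("lora_model" is a placeholder adapter directory).
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    dtype="half",
    enable_lora=True,
)

outputs = llm.generate(
    ["Hello, how are you?"],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("finetune", 1, "lora_model"),
)
print(outputs[0].outputs[0].text)
```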
@danielhanchen Indeed, I think full bnb models will be supported after https://github.com/vllm-project/vllm/pull/5753 is merged
When I try to load an Unsloth 4-bit model with `llm = LLM("unsloth/mistral-7b-instruct-v0.3-bnb-4bit", dtype="half")`, I get the error `Cannot find any of ['adapter_name_or_path'] in the model's quantization config.`
This is true for all the Llama 3 and Gemma models as well. As far as I know, there are no LoRA adapters attached to these models. Please let me know how to proceed with loading them.