Iven2132 opened this issue 2 months ago
@Iven2132 Apologies for the delay - you're saving a LoRA adapter, not the full model, so you need to call vLLM with the LoRA loaded, which is a bit different
@danielhanchen Can you give me an example?
@Iven2132 Sorry on the delay!! https://docs.vllm.ai/en/latest/models/lora.html should be useful
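For reference, a minimal offline-inference sketch along the lines of that docs page (the base model name, adapter path, and prompt here are placeholders, not taken from this thread; this needs a GPU with the adapter's base model available):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the BASE model with LoRA support enabled.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", enable_lora=True)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# Point LoRARequest at the saved adapter directory (the one containing
# adapter_config.json and the adapter weights), NOT at the base model.
outputs = llm.generate(
    ["What is the capital of France?"],
    sampling_params,
    lora_request=LoRARequest("my_adapter", 1, "path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```

The point is that the Hugging Face upload in this case is only the adapter, so it has to be passed via `lora_request` on top of the base model rather than as the `model` argument itself.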
Try adding the LLM options:
quantization="bitsandbytes",
load_format="bitsandbytes"
Source: https://docs.vllm.ai/en/v0.6.0/quantization/bnb.html
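Putting those two options together, something like the following sketch (the model name is just an example of a bitsandbytes checkpoint, not one from this thread):

```python
from vllm import LLM

# Both options are needed to load a bitsandbytes-quantized checkpoint,
# per the vLLM bnb docs linked above.
llm = LLM(
    model="unsloth/llama-3-8b-bnb-4bit",  # example bnb checkpoint
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
```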
Hi, I'm trying to fine-tune the Llama 3.1 8B model. After fine-tuning I upload it to HF, but when I try to run it with vLLM I get this error: "KeyError: 'base_model.model.model.layers.0.mlp.down_proj.lora_A.weight'". Can anyone help me please?
Here is my fine-tuning script:
And here is my vLLM script:
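For anyone hitting the same KeyError: as noted above, the checkpoint being uploaded is a LoRA adapter rather than merged full weights. One hedged sketch of the alternative route - merging the adapter into the base model with Unsloth before uploading, so vLLM can load it as a plain checkpoint - assuming `model` and `tokenizer` come from an Unsloth training run and the directory/repo names are placeholders:

```python
# After training with Unsloth, merge the LoRA weights into the base
# model and save/upload the merged 16-bit checkpoint.
model.save_pretrained_merged(
    "llama-3.1-8b-finetuned",      # output directory (placeholder)
    tokenizer,
    save_method="merged_16bit",
)
model.push_to_hub_merged(
    "your-username/llama-3.1-8b-finetuned",  # HF repo id (placeholder)
    tokenizer,
    save_method="merged_16bit",
)
```

A merged checkpoint can then be passed directly as vLLM's `model` argument, with no `lora_request` needed.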