vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: LoRA support for Pixtral #8802

Open spring-anth opened 2 months ago

spring-anth commented 2 months ago

🚀 The feature, motivation and pitch

I have finetuned the linear layers of Pixtral on my own dataset and would like to host the LoRA adapters, as is already possible for Mistral. It would be great if this were supported in the future.

Related issue: #8685, as the base model I used for finetuning is the HF version.

Alternatives

No response

Additional context

No response

DarkLight1337 commented 2 months ago

LoRA support for VLMs in general is still WIP. cc @jeejeelee

jeejeelee commented 1 month ago

LoRA support for VLMs in general is still WIP. cc @jeejeelee

Thanks for the ping. I should be able to complete the temporary solution for LoRA support of VL models this week.

jeejeelee commented 1 month ago

@spring-anth I have completed the integration of LoRA support for Pixtral, see: https://github.com/jeejeelee/vllm/tree/pixtral-support-lora. Could you please verify it locally? I don't have enough resources to train a LoRA model myself.

spring-anth commented 1 month ago

@jeejeelee Thank you! I checked out your branch and set it as my current vLLM installation via

git clone https://github.com/jeejeelee/vllm.git
cd vllm
python python_only_dev.py

Unfortunately I get this ValueError:

```
rank0: self.model = get_model(model_config=self.model_config,
rank0: File "/home/docker/.local/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
rank0: return loader.load_model(model_config=model_config,
rank0: File "/home/docker/.local/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 399, in load_model
rank0: model = _initialize_model(model_config, self.load_config,
rank0: File "/home/docker/.local/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 176, in _initialize_model
rank0: return build_model(
rank0: File "/home/docker/.local/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 157, in build_model
rank0: extra_kwargs = _get_model_initialization_kwargs(model_class, lora_config,
rank0: File "/home/docker/.local/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 134, in _get_model_initialization_kwargs
rank0: raise ValueError(
rank0: ValueError: Model PixtralForConditionalGeneration does not support LoRA, but LoRA is enabled. Support for this model may be added in the future. If this is important to you, please open an issue on github.
```

jeejeelee commented 1 month ago

@spring-anth Hi, which branch are you using? Is it pixtral-support-lora?
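
For reference, a quick way to check and switch to that branch, using the same clone and python_only_dev.py flow shown in the earlier comment:

git clone https://github.com/jeejeelee/vllm.git
cd vllm
git checkout pixtral-support-lora
git branch --show-current   # should print pixtral-support-lora
python python_only_dev.py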

spring-anth commented 1 month ago

@jeejeelee You were right, I was on the wrong branch, silly mistake. Unfortunately I currently can't test whether your change works, as I trained Pixtral with the transformers-compatible version. Therefore I can only use the LoRA weights for Pixtral once the transformers version of Pixtral is supported in vLLM (which is work in progress). My current workaround is merging the weights and converting the model back to the vLLM-compatible version.
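
For context, a rough sketch of that merging step, assuming the adapter was trained with PEFT on the HF-format checkpoint; the paths are placeholders, and converting the merged model back to the Mistral-format checkpoint is a separate step not shown here:

```python
import torch
from peft import PeftModel
from transformers import LlavaForConditionalGeneration

# Load the HF-format Pixtral base model the adapter was trained against.
base = LlavaForConditionalGeneration.from_pretrained(
    "mistral-community/pixtral-12b", torch_dtype=torch.bfloat16
)

# Apply the LoRA adapter and fold its weights into the base model.
merged = PeftModel.from_pretrained(base, "/path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("/path/to/merged-pixtral")
```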

jeejeelee commented 1 month ago

@spring-anth Currently, vLLM only supports LoRA adapters trained with PEFT.

spring-anth commented 1 month ago

@jeejeelee Yes, but that's not what I meant. I did train with PEFT, but the training is based on the HF Transformers version of Pixtral (https://huggingface.co/mistral-community/pixtral-12b), which uses a different structure than the vLLM-supported version (https://huggingface.co/mistralai/Pixtral-12B-2409).

tensimixt commented 1 month ago

@jeejeelee does this work with https://github.com/vllm-project/vllm/pull/5036? If so, I can test this week whether inference with mistralai/Pixtral-12B-2409 plus a LoRA adapter works. Or should I use another PR to test vLLM inference for this?

Or I can just use this branch: https://github.com/jeejeelee/vllm/tree/pixtral-support-lora

Will this work with python -m vllm.entrypoints.openai.api_server where the model is set to Pixtral and the LoRA modules to the Pixtral LoRA adapter?

Thank you!

jeejeelee commented 1 month ago

@jeejeelee does this work with #5036? If so, I can test this week whether inference with mistralai/Pixtral-12B-2409 plus a LoRA adapter works. Or should I use another PR to test vLLM inference for this?

Or I can just use this branch: https://github.com/jeejeelee/vllm/tree/pixtral-support-lora

Will this work with python -m vllm.entrypoints.openai.api_server where the model is set to Pixtral and the LoRA modules to the Pixtral LoRA adapter?

Thank you!

I think it should work, see: https://docs.vllm.ai/en/latest/models/lora.html#serving-lora-adapters
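
Concretely, following that doc, the adapter would be registered at startup (the adapter name and path below are placeholders), e.g.:

vllm serve mistralai/Pixtral-12B-2409 --tokenizer-mode mistral --enable-lora --lora-modules pixtral-lora=/path/to/pixtral-lora-adapter

Requests then select the adapter by passing its registered name (here pixtral-lora) as the model in the OpenAI-compatible API.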

tensimixt commented 1 month ago

@jeejeelee when doing git checkout pixtral-support-lora and then pip install -e ., does it build correctly for you, or does it crash or take a very long time to build? When doing git checkout pr-5036, building vLLM takes only 10-15 minutes, but the new vLLM build takes very long; it has been 45 minutes and it is still building. Is there a way to make the build faster? Thank you!

jeejeelee commented 1 month ago

It also takes me a long time, unless I compile on a high-performance CPU server.
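
One possible way to avoid the long CUDA build, assuming the branch only changes Python files (as the python_only_dev.py flow above suggests): install a released vLLM wheel first, then overlay the branch's Python sources on top of it:

pip install vllm
git clone https://github.com/jeejeelee/vllm.git
cd vllm
git checkout pixtral-support-lora
python python_only_dev.py

For full source builds, having ccache installed should make repeated rebuilds noticeably faster.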

tensimixt commented 2 weeks ago

@jeejeelee do you think pixtral-support-lora will work with this model, https://huggingface.co/mistral-community/pixtral-12b, where the LoRA adapter is loaded once the model is loaded with vllm serve? Or must the LoRA be trained on the consolidated.safetensors from mistralai?

Thank you!

tensimixt commented 2 weeks ago

@jeejeelee You were right, I was on the wrong branch, silly mistake. Unfortunately I currently can't test whether your change works, as I trained Pixtral with the transformers-compatible version. Therefore I can only use the LoRA weights for Pixtral once the transformers version of Pixtral is supported in vLLM (which is work in progress). My current workaround is merging the weights and converting the model back to the vLLM-compatible version.

Did you try loading the mistral-community Pixtral with the pixtral-support-lora branch and then loading a LoRA adapter with vllm serve?

Thank you!

jeejeelee commented 2 weeks ago

@tensimixt Not yet. I will try it asap.

tensimixt commented 1 week ago

@tensimixt Not yet. I will try it asap.

Thank you, this will help greatly!!

jeejeelee commented 1 week ago

loading the mistral-community Pixtral

@tensimixt could you please provide the related link?

tensimixt commented 1 week ago

loading the mistral-community Pixtral

@tensimixt could you please provide the related link?

It is this one https://huggingface.co/mistral-community/pixtral-12b

Thank you!! You are amazing!

jeejeelee commented 1 week ago

@tensimixt I have updated the pixtral-support-lora branch, and I can now run the following command successfully:

vllm serve mistralai/Pixtral-12B-2409 --tokenizer-mode mistral --limit-mm-per-prompt 'image=4' --max-model-len 16384  --enable-lora

I am now downloading the model from mistral-community.
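
For reference, a sketch of how a request against that server might look once an adapter is also registered via --lora-modules; the adapter name pixtral-lora and the image URL are placeholders:

```python
from openai import OpenAI

# Points at the local vLLM OpenAI-compatible server started by `vllm serve`.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="pixtral-lora",  # the name given to --lora-modules selects the adapter
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```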

jeejeelee commented 1 week ago

I can't start vllm serve when using https://huggingface.co/mistral-community/pixtral-12b; it raises the following error:

 `ERROR 11-12 06:47:47 engine.py:366] OSError: Found 0 files matching the pattern: re.compile('^tokenizer\\.model\\.v.*$|^tekken\\.json$'). Make sure that a Mistral tokenizer is present in ['model-00002-of-00006.safetensors', 'model-00005-of-00006.safetensors', 'preprocessor_config.json', 'model-00003-of-00006.safetensors', 'chat_template.json', 'tokenizer.json', 'generation_config.json', '.cache', 'model-00004-of-00006.safetensors', 'tokenizer_config.json', 'model.safetensors.index.json', '.gitattributes', 'config.json', 'model-00006-of-00006.safetensors', 'model-00001-of-00006.safetensors', 'README.md', 'special_tokens_map.json', 'processor_config.json'].`

tensimixt commented 1 week ago

I can't start vllm serve when using https://huggingface.co/mistral-community/pixtral-12b; it raises the following error:

 `ERROR 11-12 06:47:47 engine.py:366] OSError: Found 0 files matching the pattern: re.compile('^tokenizer\\.model\\.v.*$|^tekken\\.json$'). Make sure that a Mistral tokenizer is present in ['model-00002-of-00006.safetensors', 'model-00005-of-00006.safetensors', 'preprocessor_config.json', 'model-00003-of-00006.safetensors', 'chat_template.json', 'tokenizer.json', 'generation_config.json', '.cache', 'model-00004-of-00006.safetensors', 'tokenizer_config.json', 'model.safetensors.index.json', '.gitattributes', 'config.json', 'model-00006-of-00006.safetensors', 'model-00001-of-00006.safetensors', 'README.md', 'special_tokens_map.json', 'processor_config.json'].`

Thank you for updating the branch and trying to serve https://huggingface.co/mistral-community/pixtral-12b with it. I think it is trying to find tekken.json (but instead there is tokenizer.json in the repo).
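
If so, one possible workaround (assuming the HF-format checkpoint is otherwise loadable on this branch) might be to drop --tokenizer-mode mistral so vLLM uses the repo's tokenizer.json instead of looking for tekken.json:

vllm serve mistral-community/pixtral-12b --limit-mm-per-prompt 'image=4' --max-model-len 16384 --enable-lora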

The current issue is that there is no way to create LoRA adapters for https://huggingface.co/mistralai/Pixtral-12B-2409 using the transformers + PEFT approach (unless you know a way to make it work?).

So the alternative is to create LoRA adapters using https://huggingface.co/mistral-community/pixtral-12b, which is compatible with transformers, as sketched below.
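
A minimal sketch of attaching LoRA with PEFT to the linear layers of the HF-format Pixtral; the hyperparameters and target modules are illustrative only:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import LlavaForConditionalGeneration

# HF-format Pixtral (transformers-compatible, Llava-style checkpoint).
model = LlavaForConditionalGeneration.from_pretrained(
    "mistral-community/pixtral-12b", torch_dtype=torch.bfloat16
)

# Attach LoRA to the attention projection layers; module names are illustrative
# and may also match modules in the vision tower.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```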

Do you know if there is a way to create a LoRA adapter for https://huggingface.co/mistralai/Pixtral-12B-2409?

Thank you!

jeejeelee commented 1 week ago

@tensimixt I have no idea; maybe you can create an issue at https://github.com/mistralai/mistral-finetune.