spring-anth opened this issue 2 months ago (Open)
LoRA support for VLMs in general is still WIP. cc @jeejeelee
Thanks for the ping. I should be able to complete the temporary solution for LoRA support in VL models this week.
@spring-anth I have completed the integration of Pixtral support for LoRA, see: https://github.com/jeejeelee/vllm/tree/pixtral-support-lora. Could you please verify this locally? I don't have enough resources to train a LoRA model myself.
@jeejeelee Thank you! I checked out your branch and set it as my current vLLM installation via `git clone https://github.com/jeejeelee/vllm.git`, `cd vllm`, and `python python_only_dev.py`.
Unfortunately I get this ValueError:

```
rank0:     self.model = get_model(model_config=self.model_config,
rank0:   File "/home/docker/.local/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
rank0:     return loader.load_model(model_config=model_config,
rank0:   File "/home/docker/.local/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 399, in load_model
rank0:     model = _initialize_model(model_config, self.load_config,
rank0:   File "/home/docker/.local/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 176, in _initialize_model
rank0:     return build_model(
rank0:   File "/home/docker/.local/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 157, in build_model
rank0:     extra_kwargs = _get_model_initialization_kwargs(model_class, lora_config,
rank0:   File "/home/docker/.local/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 134, in _get_model_initialization_kwargs
rank0:     raise ValueError(
rank0: ValueError: Model PixtralForConditionalGeneration does not support LoRA, but LoRA is enabled. Support for this model may be added in the future. If this is important to you, please open an issue on github.
```
@spring-anth Hi, which branch are you using? Is it pixtral-support-lora?
@jeejeelee You were right, I was on the wrong branch; silly mistake. Unfortunately, I currently can't test whether your change works, as I trained Pixtral with the transformers-compatible version. I can therefore only use my LoRA weights for Pixtral once the transformers version of Pixtral is supported in vLLM (which is work in progress). My current workaround is merging the weights and transforming the model back to the vLLM-compatible version.
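A rough sketch of the merge step in that workaround, assuming a PEFT adapter trained against the transformers-compatible checkpoint; the adapter path and output directory are placeholders, and converting the merged checkpoint back to the Mistral/vLLM format is a separate step not shown here:

```python
# Sketch: fold a PEFT LoRA adapter into the HF-format Pixtral base weights.
# Placeholder paths; assumes the adapter targets mistral-community/pixtral-12b.
from transformers import LlavaForConditionalGeneration
from peft import PeftModel

base = LlavaForConditionalGeneration.from_pretrained(
    "mistral-community/pixtral-12b", torch_dtype="bfloat16"
)
merged = PeftModel.from_pretrained(base, "path/to/pixtral-lora-adapter").merge_and_unload()

# Save a plain (adapter-free) checkpoint that can then be converted
merged.save_pretrained("pixtral-12b-lora-merged")
```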
@spring-anth Currently, vLLM only supports LoRA trained with PEFT.
@jeejeelee Yes, but that's not what I meant. I did train with PEFT, but the training is based on the HF Transformers version of Pixtral (https://huggingface.co/mistral-community/pixtral-12b), which uses a different structure than the vLLM-supported version (https://huggingface.co/mistralai/Pixtral-12B-2409).
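For context, a PEFT setup against the HF-structured checkpoint looks roughly like the sketch below (hyperparameters and target modules are illustrative, not the actual training configuration). Adapters trained this way carry HF-style module names, which is why they do not line up with the consolidated Mistral-format weights without a conversion step:

```python
# Illustrative PEFT LoRA setup on the transformers-compatible Pixtral checkpoint.
# Hyperparameters and target modules are examples, not the poster's actual config.
from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = LlavaForConditionalGeneration.from_pretrained(
    "mistral-community/pixtral-12b", torch_dtype="bfloat16"
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # linear layers, as in the issue
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapter weights follow the HF module naming
```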
@jeejeelee Does this work with https://github.com/vllm-project/vllm/pull/5036? If so, I can test this week whether inference with mistralai/Pixtral-12B-2409 and a LoRA adapter works. Or should I use another PR to test vLLM inference for this?
Or I can just use this branch https://github.com/jeejeelee/vllm/tree/pixtral-support-lora
Will this work for `python -m vllm.entrypoints.openai.api_server` where the model is set to Pixtral and the LoRA modules to a Pixtral LoRA adapter?
Thank you!
I think it should work, see: https://docs.vllm.ai/en/latest/models/lora.html#serving-lora-adapters
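Following the pattern in those docs, serving with an adapter registered at startup would look roughly like this; the adapter name `pixtral-lora` and its path are placeholders:

```bash
# Sketch: serve Pixtral with LoRA enabled and register one adapter at startup.
# "pixtral-lora" and the adapter path are placeholders for your own adapter.
vllm serve mistralai/Pixtral-12B-2409 \
    --tokenizer-mode mistral \
    --limit-mm-per-prompt 'image=4' \
    --max-model-len 16384 \
    --enable-lora \
    --lora-modules pixtral-lora=/path/to/pixtral-lora-adapter
```

The same flags should also work with `python -m vllm.entrypoints.openai.api_server --model mistralai/Pixtral-12B-2409 ...`, which `vllm serve` wraps.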
@jeejeelee When doing `git checkout pixtral-support-lora` and then `pip install -e .`, does it build correctly for you, or does it crash or take a very long time to build? When doing `git checkout pr-5036`, building vLLM takes only 10-15 minutes, but the new vLLM build takes much longer; it has been 45 minutes and it is still building. Is there a way to make the build faster? Thank you!
It also takes me a long time, unless I compile on a high-performance CPU server.
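Assuming the branch only changes Python files (a LoRA-registration change normally does), one way to avoid the long kernel compilation is the Python-only overlay already used earlier in this thread, installed on top of a released wheel:

```bash
# Sketch: skip compiling vLLM's CUDA kernels by installing a released wheel
# first, then overlaying the fork's Python sources (same approach as above).
pip install vllm                  # prebuilt wheel; ideally pin the version the fork is based on
git clone https://github.com/jeejeelee/vllm.git
cd vllm
git checkout pixtral-support-lora
python python_only_dev.py         # development helper shipped with vLLM at the time
```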
@jeejeelee Do you think pixtral-support-lora will work with this model https://huggingface.co/mistral-community/pixtral-12b, where the LoRA is loaded after the model is loaded with vllm serve? Or must the LoRA be trained on the consolidated.safetensors from mistralai?
Thank you!
> @jeejeelee You were right, I was on the wrong branch; silly mistake. Unfortunately, I currently can't test whether your change works, as I trained Pixtral with the transformers-compatible version. I can therefore only use my LoRA weights for Pixtral once the transformers version of Pixtral is supported in vLLM (which is work in progress). My current workaround is merging the weights and transforming the model back to the vLLM-compatible version.
Did you try loading the mistral-community Pixtral with the pixtral-support-lora branch and then loading the LoRA with vllm serve?
Thank you!
@tensimixt Not yet. I will try it asap.
Thank you, this will help greatly!!
> loading the mistral-community Pixtral
@tensimixt could you please provide the related link?
It is this one https://huggingface.co/mistral-community/pixtral-12b
Thank you!! You are amazing!
@tensimixt I have updated the pixtral-support-lora branch, and I can now run the following script successfully:
vllm serve mistralai/Pixtral-12B-2409 --tokenizer-mode mistral --limit-mm-per-prompt 'image=4' --max-model-len 16384 --enable-lora
I am downloading this model from mistral-community.
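Once a server like the one above is running with an adapter registered via `--lora-modules`, a request can select the adapter by its registered name through the OpenAI-compatible API; a rough sketch with placeholder names and URL:

```python
# Sketch: send a multimodal request to the OpenAI-compatible server, selecting
# the LoRA adapter by its registered name ("pixtral-lora" is a placeholder).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="pixtral-lora",  # use the base model name instead to skip the adapter
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```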
I can't start vllm serve when using https://huggingface.co/mistral-community/pixtral-12b; it raises the following error:
`ERROR 11-12 06:47:47 engine.py:366] OSError: Found 0 files matching the pattern: re.compile('^tokenizer\\.model\\.v.*$|^tekken\\.json$'). Make sure that a Mistral tokenizer is present in ['model-00002-of-00006.safetensors', 'model-00005-of-00006.safetensors', 'preprocessor_config.json', 'model-00003-of-00006.safetensors', 'chat_template.json', 'tokenizer.json', 'generation_config.json', '.cache', 'model-00004-of-00006.safetensors', 'tokenizer_config.json', 'model.safetensors.index.json', '.gitattributes', 'config.json', 'model-00006-of-00006.safetensors', 'model-00001-of-00006.safetensors', 'README.md', 'special_tokens_map.json', 'processor_config.json'].`
Thank you for updating the branch and trying to serve https://huggingface.co/mistral-community/pixtral-12b with it. I think it is trying to find tekken.json (but the repo has tokenizer.json instead).
The current issue is that there is no way to create LoRA adapters for https://huggingface.co/mistralai/Pixtral-12B-2409 using the transformers + PEFT approach (unless you know a way to make it work?).
So the alternative is to create LoRA adapters using https://huggingface.co/mistral-community/pixtral-12b, which is compatible with transformers.
Do you know if there is a way to create a LoRA adapter for https://huggingface.co/mistralai/Pixtral-12B-2409?
Thank you!
@tensimixt I have no idea, maybe you can create an issue at https://github.com/mistralai/mistral-finetune
🚀 The feature, motivation and pitch
I have finetuned the linear layers of Pixtral on my own dataset and would like to host the LoRA adapters, as is already possible for Mistral. It would be great if this were supported in the future.
Related issue: #8685, as the base model I used for finetuning is the HF version.