vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

LLaVA model quantization does not seem to be supported #73

Open caojinpei opened 3 months ago

caojinpei commented 3 months ago

Describe the bug
When I use llm-compressor to quantize a LLaVA model, it fails right at model loading with: Unrecognized configuration class 'transformers.models.llava.configuration_llava.LlavaConfig'

Expected behavior
Hope llm-compressor can support the LLaVA model.

To Reproduce

```python
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

MODEL_ID = "/home/models/llava-v1.6-vicuna-7b"
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    trust_remote_code=True,
)
```

Errors

```
ValueError: Unrecognized configuration class <class 'transformers.models.llava.configuration_llava.LlavaConfig'> for this kind of AutoModel
```

Hope to get your reply, thanks.

robertgshaw2-neuralmagic commented 3 months ago

Hey @caojinpei - right now we only support XXXForCausalLM-style models, which LLaVA is not.

I have added support for vision-language models and general XXXForConditionalGeneration to our roadmap. If you have any capacity to contribute a feature, we are happy to give you some pointers to get started! Let me know!
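As a quick pointer: you can see what a checkpoint declares without loading any weights, via the stock transformers config API (a small sketch, reusing the local path from your repro):

```python
# Check what a checkpoint declares before trying to load it.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("/home/models/llava-v1.6-vicuna-7b")
print(config.model_type)     # 'llava' -> maps to LlavaConfig in transformers
print(config.architectures)  # the class name the checkpoint was saved with
# If model_type does not map to a *ForCausalLM class that transformers'
# auto-mapping knows, SparseAutoModelForCausalLM raises the error above.
```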

caojinpei commented 3 months ago


Hi, @robertgshaw2-neuralmagic

I am glad to get your reply, and thanks for sharing the roadmap. I now want to quantize the LLaVA-v1.6 model, whose architecture is LlavaLlamaForCausalLM (does that count as XXXForCausalLM?), to W8A16 using GPTQ within llm-compressor. Could you give me some detailed pointers on how to do it? Is it very hard to implement? I am also wondering: since the LLaVA model includes both a vision model and a language model, if we quantize all of them, will the accuracy of LLaVA drop a lot? And if I just want to quantize the language model in LLaVA-v1.6 using llm-compressor, do you have any suggestions?
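For reference, here is roughly the flow I have in mind for the language-model-only case. This is just a sketch based on the README example: the backbone path is a placeholder for a language-model-only checkpoint I would extract myself, and the dataset and sample counts are the README defaults.

```python
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Placeholder path: a checkpoint containing only LLaVA's language backbone.
MODEL_ID = "/home/models/llava-v1.6-language-backbone"
model = SparseAutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# W8A16 GPTQ on all Linear layers, keeping the output head in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W8A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",  # small calibration set from the README example
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
model.save_pretrained("llava-language-w8a16-gptq", save_compressed=True)
```

Would something along these lines work once the wrapper can load the model?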

Looking forward to your reply, thanks.

*Model link: https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b

caojinpei commented 3 months ago

Hi @robertgshaw2-neuralmagic, by the way, I am wondering: does llm-compressor support llava-hf/llava-v1.6-vicuna-7b-hf, whose architecture is LlavaNextForConditionalGeneration? Can you help me check this? I put a quick loading sketch below the link.

*Model link: https://huggingface.co/llava-hf/llava-v1.6-vicuna-7b-hf
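For what it's worth, this checkpoint seems to load fine with the plain transformers class, so the gap appears to be only on the llm-compressor wrapper side. A small sketch, assuming a recent transformers release (4.39+, where LLaVA-NeXT landed):

```python
# Loading the HF-format LLaVA-NeXT checkpoint directly with transformers.
from transformers import LlavaNextForConditionalGeneration

model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-7b-hf",
    device_map="auto",
)
print(type(model.language_model))  # the text backbone sits under .language_model
```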

robertgshaw2-neuralmagic commented 2 months ago

@caojinpei apologies for the delay. Vision-language model support is on our roadmap but not yet implemented. We would definitely welcome a PR or an example, though!

markurtz commented 1 month ago

Adding a quick update here, we are actively working on this support now and hope to have some example pathways landing over the next few weeks!