What does this PR do?
Add support for (data-aware) compression of CausalVisualLMs:
- LLaVa (e.g., llava-hf/llava-v1.6-mistral-7b-hf)
- NanoLLaVa (e.g., qnguyen3/nanoLLaVA)
- MiniCPMV (e.g., openbmb/MiniCPM-V-2_6)

When `quantization_config` is given, the language model is compressed according to it. Other model parts, including the vision and text embeddings models, are compressed to int8_asym.

Example:
optimum-cli
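A minimal sketch of what the CLI flow could look like. The `export openvino`, `--weight-format`, `--trust-remote-code`, and `--dataset` options exist in optimum-cli; the specific dataset value `contextual` used here for the data-aware path is an assumption, not confirmed by this PR:

```bash
# Hypothetical example: export MiniCPM-V with data-aware int4 weight compression.
# The "contextual" dataset value is an assumption; check
# `optimum-cli export openvino --help` for the supported choices.
optimum-cli export openvino \
    --model openbmb/MiniCPM-V-2_6 \
    --trust-remote-code \
    --weight-format int4 \
    --dataset contextual \
    minicpmv_int4
```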
Python API
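A sketch of the equivalent Python flow, following the existing optimum-intel weight-compression pattern with `OVWeightQuantizationConfig`. The `OVModelForVisualCausalLM` class name and the `dataset="contextual"` value are assumptions based on this PR's description:

```python
# Hypothetical sketch: class name and dataset value are assumptions.
from optimum.intel import OVModelForVisualCausalLM, OVWeightQuantizationConfig

quantization_config = OVWeightQuantizationConfig(
    bits=4,
    dataset="contextual",  # enables data-aware compression; value is an assumption
)

# Per this PR: the language model is compressed according to quantization_config;
# other parts (vision model, text embeddings model) are compressed to int8_asym.
model = OVModelForVisualCausalLM.from_pretrained(
    "openbmb/MiniCPM-V-2_6",
    export=True,
    trust_remote_code=True,
    quantization_config=quantization_config,
)
model.save_pretrained("minicpmv_int4")
```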
Before submitting