intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Apache License 2.0

Llava Models #852

Open choochtech opened 7 months ago

choochtech commented 7 months ago

Does the LLaVA part work?

https://github.com/intel/intel-extension-for-transformers/tree/main/intel_extension_for_transformers/transformers/modeling/llava_models

If so, are they optimized for Intel devices, and are there any examples?

Thanks for building this library. I have found the token generation performance to be very good compared to OpenVINO.

Great work !

Thanks

a32543254 commented 7 months ago

Thanks for using the library! Unfortunately, we do not support LLaVA for now, but we may consider supporting it in the future. Please stay tuned.

Regards, Bo

kevinintel commented 7 months ago

It's for multi-modal training, but optimization is WIP.

choochtech commented 7 months ago

@kevinintel how do you optimize the LLaVA model and use it?

kevinintel commented 7 months ago

Someone tried low-bit quantization for LLaVA: https://arxiv.org/pdf/2306.00978.pdf and we will try to quantize it.
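For readers unfamiliar with low-bit weight-only quantization, here is a minimal NumPy sketch of the group-wise int4 scheme that papers like the one linked above apply to LLM weights. All function names are illustrative, not part of intel-extension-for-transformers or the paper's code; this only shows the core round-to-nearest idea, without AWQ's activation-aware scaling.

```python
import numpy as np

def quantize_group_4bit(w, group_size=128):
    """Quantize a flat weight vector in groups, one fp scale per group.

    int4 symmetric range is [-8, 7]; each group's scale maps its
    largest-magnitude weight onto that range.
    """
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero for all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate fp32 weights from int4 codes and scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_group_4bit(w)
w_hat = dequantize(q, s)
print(f"max abs reconstruction error: {np.abs(w - w_hat).max():.4f}")
```

The per-group scale keeps the rounding error bounded by half a quantization step within each group, which is why group-wise schemes degrade LLM quality far less than a single per-tensor scale.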

choochtech commented 7 months ago

Thanks @kevinintel

WeiweiZhang1 commented 4 months ago

Hi, support for quantization of multimodal models is currently planned, and any updates will be communicated here.

kevinintel commented 1 month ago

We can optimize LLaVA in https://github.com/intel/neural-compressor/pull/1797 and will add examples.