vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Add Support for QLORA/QA-QLORA weights which are not merged #3225

Open orellavie1212 opened 8 months ago

orellavie1212 commented 8 months ago

Currently only original LoRA adapters are supported in unfused (unmerged) form. I would like support to be added for QLoRA/QA-LoRA adapters as well, without having to merge them into the base model first.
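
For context, here is a minimal sketch of how vLLM already serves an unmerged (non-fused) LoRA adapter through its multi-LoRA API; the request above is for the same workflow to also accept QLoRA/QA-LoRA adapters whose base weights are quantized. The model name and adapter path below are placeholders, not taken from this issue.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base model served with LoRA support enabled (placeholder model ID).
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# The adapter stays separate from the base weights; it is applied per request.
outputs = llm.generate(
    ["Write a SQL query that counts users by country."],
    sampling_params,
    lora_request=LoRARequest("sql_adapter", 1, "/path/to/lora_adapter"),
)
```

The ask is essentially that the base model in this setup could also be a 4-bit quantized (QLoRA-style) checkpoint, with the adapter still kept unmerged.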

chenqianfzh commented 7 months ago

Hi, I am working on adding QLoRA support to vLLM.

The first model to support would probably be timdettmers/qlora-alpaca-13b (along with some of the other QLoRA models published by timdettmers on Hugging Face).
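
For reference, a hedged sketch of how a QLoRA adapter such as timdettmers/qlora-alpaca-13b is typically loaded unmerged today with Transformers, PEFT, and bitsandbytes (the base-model ID is an assumption here, not stated in this thread); the work described above would be about bringing an equivalent unmerged-adapter path into vLLM.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "huggyllama/llama-13b"            # assumed base model for this adapter
adapter_id = "timdettmers/qlora-alpaca-13b"  # QLoRA adapter from the comment above

# 4-bit NF4 quantization of the base weights, as in the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# Attach the LoRA adapter without merging it into the quantized base weights.
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
```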

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!