siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0

Quanto Support for Fast LoRA loading and switching Functions #1109

Open Metal079 opened 2 months ago

Metal079 commented 2 months ago

🚀 The feature, motivation and pitch

I keep several models in VRAM at once to speed up switching between models on my website, and I'd like to use Quanto (https://github.com/huggingface/optimum-quanto) to reduce VRAM usage. However, I found that the OneDiffX fast LoRA loading and switching functions are not compatible with models quantized with Quanto. Is supporting this something that could be looked into?

Alternatives

PEFT LoRA loading works, but it is much slower: a load that takes PEFT ~8 seconds takes OneDiffX 2 seconds or less.

Additional context

No response