huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

How to load LoRA adapter along with the GGUF model? #2226

Open · niranjanakella opened this issue 5 months ago

niranjanakella commented 5 months ago

Hello all,

I recently managed to convert the flan-t5 base model to GGUF (#2215). But I also have multiple LoRA adapters trained for different tasks.

@EricLBuehler @LaurentMazare So I would like to know whether there is a way to also load single or multiple LoRA adapters along with the GGUF model. I am currently running inference with the following command:

```bash
cargo run --example quantized-t5 --release -- --weight-file "flant5large_f16.gguf" \
  --config-file "flan-t5-large/config.json" \
  --prompt "Make this text coherent: Their flight is weak. They run quickly through the tree canopy."
```

But I have the adapter as adapter_model.bin and adapter_config.json, which I would like to load along with this model without weight merging.
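For context, loading an adapter without merging means keeping the base weight W frozen and adding the low-rank update on every forward pass: y = x Wᵀ + (alpha/r) · x Aᵀ Bᵀ. Below is a minimal sketch of that unmerged forward pass with candle tensors; `lora_forward` and the shapes are illustrative (following the usual PEFT conventions), not an existing candle API:

```rust
use candle_core::{DType, Device, Result, Tensor};

/// Unmerged LoRA forward pass (illustrative sketch, not a candle API):
/// y = x W^T + (alpha / r) * x A^T B^T
/// Shapes: x (batch, in), w (out, in), a (r, in), b (out, r).
fn lora_forward(x: &Tensor, w: &Tensor, a: &Tensor, b: &Tensor, alpha: f64, rank: usize) -> Result<Tensor> {
    let base = x.matmul(&w.t()?)?; // frozen base projection: (batch, out)
    let delta = x.matmul(&a.t()?)?.matmul(&b.t()?)?; // low-rank path: (batch, r) -> (batch, out)
    base.add(&(delta * (alpha / rank as f64))?) // scale by alpha / r, as PEFT does
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let (batch, d_in, d_out, r) = (2, 8, 8, 4);
    let x = Tensor::randn(0f32, 1., (batch, d_in), &dev)?;
    let w = Tensor::randn(0f32, 1., (d_out, d_in), &dev)?;
    let a = Tensor::randn(0f32, 1., (r, d_in), &dev)?;
    let b = Tensor::zeros((d_out, r), DType::F32, &dev)?; // PEFT initializes B to zero
    let y = lora_forward(&x, &w, &a, &b, 16.0, r)?;
    println!("output shape: {:?}", y.shape());
    Ok(())
}
```

The cost of keeping the adapter unmerged is the extra x Aᵀ Bᵀ matmuls on every call, which is also what makes swapping between multiple adapters at runtime possible.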

EricLBuehler commented 5 months ago

Hi @niranjanakella!

Candle does not support LoRA adapters, and neither will candle-lora in this case, since you have adapter_model.bin and adapter_config.json, which probably means you trained with PEFT? Mistral.rs does support LoRA adapters from PEFT, though, and you can run them on a GGUF model (we don't have a T5 model yet, but I can add one soon if you are interested). It merges the weights into the base model at runtime to optimize performance and because training is not expected. Is there a reason why you do not want to use weight merging?
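For reference, merging folds the low-rank update into the base weight once at load time, W' = W + (alpha/r) · B A, so every subsequent forward pass costs exactly the same as the plain model. A minimal sketch of that fold with candle tensors (`merge_lora` is illustrative, not the actual mistral.rs implementation; a quantized GGUF weight would need to be dequantized before the addition):

```rust
use candle_core::{Result, Tensor};

/// Fold a LoRA adapter into the base weight once (illustrative sketch):
/// W' = W + (alpha / r) * B A
/// Shapes: w (out, in), a (r, in), b (out, r).
fn merge_lora(w: &Tensor, a: &Tensor, b: &Tensor, alpha: f64, rank: usize) -> Result<Tensor> {
    let delta = (b.matmul(a)? * (alpha / rank as f64))?; // (out, in), same shape as w
    w.add(&delta) // W' replaces W; the adapter tensors can then be dropped
}
```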

niranjanakella commented 5 months ago

@EricLBuehler Yes, exactly! I trained with PEFT and I want the adapter to be loaded alongside the GGUF model at runtime. Support for the T5 model would be greatly appreciated, as I am planning to build major applications around it. Please let me know how soon we can expect T5 support in mistral.rs.

EricLBuehler commented 5 months ago

@niranjanakella I can add it over the weekend. Could you please open an issue on mistral.rs, since this is a non-Candle discussion? Thanks!

EricLBuehler commented 4 months ago

@niranjanakella, would a T5 GGUF model be the best option?

niranjanakella commented 4 months ago

@EricLBuehler Yes, the T5 architecture would be best for most downstream encoder-decoder tasks, given that the FLAN version of the model is widely used across the industry. Sure, I will open an issue for T5 architecture support in mistral.rs. Awesome.

niranjanakella commented 4 months ago

@EricLBuehler I have opened #384 in mistral.rs, which relates to integrating the T5 architecture type. And BTW, T5 is a Seq2Seq language model; it doesn't fall under embedding models.