niranjanakella opened this issue 5 months ago
Hi @niranjanakella!
Candle does not support LoRA adapters. Neither will candle-lora: since you have adapter_model.bin and adapter_config.json, you probably trained with PEFT? Mistral.rs does support LoRA adapters from PEFT, though, and you can run them on a GGUF model (we don't have a T5 model yet, but I can add that soon if you are interested). It merges the adapter weights into the base model at runtime, both for performance and because training is not expected. Is there a reason why you do not want to use weight merging?
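For reference, the merge itself is just W' = W + (alpha / r) · B · A, computed once at load time so that inference stays a single dense matmul. A minimal candle-style sketch of that idea (illustrative only, not the actual mistral.rs code; tensor names and shapes are assumed):

```rust
use candle_core::{Result, Tensor};

/// Illustrative sketch: fold a PEFT LoRA adapter into a base weight matrix.
/// `base` is [out, in], `lora_a` is [r, in], `lora_b` is [out, r],
/// and `scale` is alpha / r from adapter_config.json.
fn merge_lora(base: &Tensor, lora_a: &Tensor, lora_b: &Tensor, scale: f64) -> Result<Tensor> {
    // delta = (alpha / r) · B · A, same shape as the base weight [out, in]
    let delta = (lora_b.matmul(lora_a)? * scale)?;
    // W' = W + delta: afterwards inference is a plain matmul with no adapter overhead
    base.add(&delta)
}
```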
@EricLBuehler Yes, exactly!! I trained using PEFT and I want the adapter to be loaded alongside the GGUF model at runtime. Support for the T5 model is much needed and would be greatly appreciated, as I am planning to build major applications with T5. So please do let me know how soon we can expect T5 support in mistral.rs.
@niranjanakella I can add it over the weekend. Can you please open an issue on mistral.rs, as this is a non-Candle discussion? Thanks!
@niranjanakella, would a T5 GGUF model be the best option?
@EricLBuehler Yes, the T5 architecture would be best for most downstream encoder-decoder tasks, given that the Flan version of the model is widely used across the industry. Sure, I shall open an issue for T5 architecture support in mistral.rs. Awesome.
@EricLBuehler I have opened #384 in mistral.rs for the integration of the T5 architecture. And BTW, T5 is a seq2seq language model; it doesn't fall under embedding models.
Hello all,
I have recently managed to convert the flan-t5 base model to GGUF (#2215). But I also have multiple LoRA adapters trained for different tasks.
@EricLBuehler @LaurentMazare So I wish to know if there is a way to also load single or multiple LoRA adapters along with the GGUF model. I am currently running inference using the following command:
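The exact command isn't reproduced here; a typical invocation of candle's quantized-t5 example would look roughly like the following (the model path and flags are placeholders):

```bash
# Illustrative only; the actual model path and flags may differ.
cargo run --example quantized-t5 --release -- \
  --model-id path/to/flan-t5-gguf \
  --weight-file model.gguf \
  --prompt "summarize: ..."
```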
But I have the adapter files (adapter_model.bin and adapter_config.json), which I would like to load along with this model without weight merging.
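For contrast, running an adapter without merging means keeping the base weight untouched and adding the low-rank path on every forward pass. A rough candle-style sketch of that idea (illustrative only; names and shapes are assumed, and this is not how any particular library implements it):

```rust
use candle_core::{Result, Tensor};

/// Illustrative sketch: apply a LoRA adapter at inference time without merging.
/// `x` is [batch, in], `w` is [out, in], `lora_a` is [r, in], `lora_b` is [out, r],
/// and `scale` is alpha / r from adapter_config.json.
fn lora_forward(x: &Tensor, w: &Tensor, lora_a: &Tensor, lora_b: &Tensor, scale: f64) -> Result<Tensor> {
    let base_out = x.matmul(&w.t()?)?;       // x · Wᵀ        -> [batch, out]
    let low_rank = x
        .matmul(&lora_a.t()?)?               // x · Aᵀ        -> [batch, r]
        .matmul(&lora_b.t()?)?;              // (x · Aᵀ) · Bᵀ -> [batch, out]
    // y = x · Wᵀ + (alpha / r) · x · Aᵀ · Bᵀ: the adapter cost is paid on every call
    base_out.add(&(low_rank * scale)?)
}
```

Keeping the adapters separate like this makes it easy to switch between tasks at runtime, at the cost of the extra low-rank matmuls on every forward pass.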