> the model underperformed heavily
Could you be more precise? Do you mean that the results were random, the same as the base model without fine-tuning, or somewhere in between?
One thing you could test would be to merge the LoRA weights into the base model before saving it:

```python
model_merged = model.merge_and_unload()
model_merged.save_pretrained(...)
```

This should give you the full model, including the config.json. When you load this model into TGI, do you get results on par with your expectations?
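In full, that flow would look something like this (the paths here are placeholders):

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

# Load the base model and wrap it with the trained LoRA adapter
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
model = PeftModel.from_pretrained(base_model, "path/to/lora-checkpoint")  # placeholder adapter path

# Fold the LoRA weights into the base weights and drop the adapter wrappers
model_merged = model.merge_and_unload()

# Save a full standalone model (weights plus config.json) that TGI can serve
model_merged.save_pretrained("path/to/merged-model")  # placeholder output path
```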
Yep, that was it. Using model.merge_and_unload() I managed to merge the LoRA adapter into the base model and was able to ping the model using pipeline correctly. The model's performance was the same as when I first trained it with the LoRA adapter. Apparently, earlier it was only loading the adapter and nothing else, which explains the performance drop.
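For reference, querying the merged model is then just a plain pipeline call (the path is a placeholder):

```python
from transformers import pipeline

# The merged checkpoint is a regular seq2seq model, so a standard pipeline works
pipe = pipeline("text2text-generation", model="path/to/merged-model")
print(pipe("summarize: <your input text>"))
```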
Great, thanks for testing this. I'll close the issue then; feel free to re-open if something else comes up.
System Info
Python 3.10.12
peft @ git+https://github.com/huggingface/peft.git@25dec602f306d52b6cc078ec8353ba6eac249097
transformers @ git+https://github.com/huggingface/transformers.git@8a0ed0a9a2ee8712b2e2c3b20da2887ef7c55fe6
accelerate==0.27.2
Reproduction
Expected behavior
I had just trained my first LoRA model, but I believe I might have missed something. After training a Flan-T5-Large model, I tested it, and it worked perfectly when I decoded the output using the following bit of code:
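In essence (the checkpoint path and prompt are placeholders):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
model = PeftModel.from_pretrained(base_model, "checkpoint-best")  # placeholder checkpoint dir

# Generate with the LoRA-adapted model and decode the output ids to text
inputs = tokenizer("summarize: <your input text>", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```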
I decided that I wanted to test its deployment using TGI. I managed to deploy the base Flan-T5-Large model from Google using TGI, as it was pretty straightforward. But when I came to test the LoRA model using pipeline, the model underperformed heavily. I simply load the checkpoint using pipeline with the task set to "text2text-generation" (see the sketch after this paragraph). I noticed that when I trained my LoRA model, I did not get a config.json file, only an adapter_config.json file, so I understood that what I basically had was only the adapter. I don't know if that is one of the reasons; after training I did more research on LoRA and noticed that the documentation mentions "merging" and "loading" the base model and the LoRA adapter, which I did not do at the start. I basically trained and got a checkpoint for each epoch, tested the checkpoint with the best metrics, and pushed it to my private hub. What I pushed there are only those adapter files, not a full model.
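Roughly like this (the hub repo id is a placeholder):

```python
from transformers import pipeline

# Loading the adapter-only checkpoint straight into a pipeline;
# this is the call that underperformed
pipe = pipeline("text2text-generation", model="my-user/flan-t5-large-lora")
```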
While trying to avoid re-training, how can I load the LoRA model properly so that I can test it with pipeline? Ultimately, I want to be able to ping the model I got from the LoRA adapter using pipeline, and then deploy it with TGI.
N.B.: The code I used for the "load_adapted_hf_generation_pipeline" function was inspired by the following GitHub gist: https://gist.github.com/ahoho/ba41c42984faf64bf4302b2b1cd7e0ce