rapsar opened this issue 10 months ago
This is exactly what I'm looking for! I can't find any straightforward document or tutorial!
Once we have our fine-tuned weights, we can build our fine-tuned model and save it to a new directory, with its associated tokenizer. By performing these steps, we can have a memory-efficient fine-tuned model and tokenizer ready for inference!
https://blog.ovhcloud.com/fine-tuning-llama-2-models-using-a-single-gpu-qlora-and-ai-notebooks/
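For what it's worth, the "build and save" step that post describes typically boils down to merging the LoRA adapter into the base model with PEFT's merge_and_unload and saving the result. A minimal sketch (the model name and paths below are placeholders, not taken from the tutorial):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = 'meta-llama/Llama-2-7b-hf'  # placeholder base model
ADAPTER_DIR = 'path/to/adapter_dir'      # dir with adapter_config.json + adapter_model.safetensors
OUTPUT_DIR = 'path/to/merged_model'

# Attach the fine-tuned adapter to the base model, then fold the
# adapter weights into the base weights so PEFT is no longer needed
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base, ADAPTER_DIR)
merged = model.merge_and_unload()

# Save the standalone fine-tuned model together with its tokenizer
merged.save_pretrained(OUTPUT_DIR)
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained(OUTPUT_DIR)
```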
@rapsar @hossainiir Is this tutorial helpful?
The adapter_model.safetensors file is a key part of the PEFT (Parameter-Efficient Fine-Tuning) framework, specifically of adapter-based fine-tuning methods such as LoRA (Low-Rank Adaptation). This file contains the weights of the fine-tuned adapter layers, stored separately from the original model weights.
Adapters: Adapters are small neural network layers added to the original model. During fine-tuning, instead of updating all the weights of the large model, only the weights of these adapter layers are updated. This significantly reduces the number of parameters that need to be trained, making fine-tuning more efficient.
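To make this concrete, here is a minimal sketch of wrapping a base model with LoRA adapters via PEFT (the model name and hyperparameters are illustrative, not prescriptive):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B')
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the update
    target_modules=['q_proj', 'v_proj'],  # which layers receive adapters
    lora_dropout=0.05,
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, lora_config)
# Only the small adapter matrices are trainable; the base weights stay frozen
model.print_trainable_parameters()
```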
Integration with the Base Model:
When you fine-tune a model using adapters, the adapter_model.safetensors file stores the updated weights of these adapter layers. The base model remains unchanged; the adapter weights are loaded and integrated with the base model at inference time.
Using adapter_model.safetensors:
To use the adapter_model.safetensors file, you need to load it along with the base model. Note that PEFT expects the path to the directory containing the adapter files (adapter_config.json plus adapter_model.safetensors), not the path to the .safetensors file itself. Here is an example of how to do this:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

MODEL_NAME = 'meta-llama/Meta-Llama-3-8B'
# Directory that contains adapter_config.json and adapter_model.safetensors
ADAPTER_DIR = 'path/to/adapter_dir'

# Load base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Load the fine-tuned adapter weights on top of the base model
model = PeftModel.from_pretrained(model, ADAPTER_DIR)

# Now you can use the model with the fine-tuned adapter
inputs = tokenizer("Your input text here", return_tensors="pt").to(model.device)
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
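This keeps the adapter as a separate component applied on top of the base weights at runtime. If you would rather ship a single standalone checkpoint, you can fold the adapter in with model.merge_and_unload() and then save_pretrained(), as in the merge sketch earlier in this thread.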
Question
I finally managed to fine-tune LLaVA on a custom dataset (LLaVA-1.5-7b on Google Colab using a single A100 GPU). The output I got was mostly an adapter_model.safetensors file (610 MB), plus a bunch of other log (?) files.
What should I do with the safetensors file? How do I incorporate it into the base model?
Thanks!