rapsar opened this issue 10 months ago
This is exactly what I'm looking for! I can't find any straightforward document or tutorial!
Once we have our fine-tuned weights, we can build our fine-tuned model and save it to a new directory, with its associated tokenizer. By performing these steps, we can have a memory-efficient fine-tuned model and tokenizer ready for inference!
https://blog.ovhcloud.com/fine-tuning-llama-2-models-using-a-single-gpu-qlora-and-ai-notebooks/
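For what it's worth, the "build and save" step that post describes typically boils down to merging the LoRA adapter into the base model with PEFT's merge_and_unload and saving the result. A minimal sketch (the model name and paths below are placeholders, not taken from the tutorial):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = 'meta-llama/Llama-2-7b-hf'  # placeholder base model
ADAPTER_DIR = 'path/to/adapter_dir'      # dir with adapter_config.json + adapter_model.safetensors
OUTPUT_DIR = 'path/to/merged_model'

# Attach the fine-tuned adapter to the base model, then fold the
# adapter weights into the base weights so PEFT is no longer needed
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base, ADAPTER_DIR)
merged = model.merge_and_unload()

# Save the standalone fine-tuned model together with its tokenizer
merged.save_pretrained(OUTPUT_DIR)
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained(OUTPUT_DIR)
```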
@rapsar @hossainiir Is this tutorial helpful?
The adapter_model.safetensors file is a key part of the PEFT (Parameter-Efficient Fine-Tuning) framework, specifically of adapter-based fine-tuning methods such as LoRA (Low-Rank Adaptation). This file contains the weights of the fine-tuned adapter layers, stored separately from the original model weights.
Adapters: Adapters are small neural network layers added to the original model. During fine-tuning, instead of updating all the weights of the large model, only the weights of these adapter layers are updated. This significantly reduces the number of parameters that need to be trained, making fine-tuning more efficient.
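To make this concrete, here is a minimal sketch of wrapping a base model with LoRA adapters via PEFT (the model name and hyperparameters are illustrative, not prescriptive):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B')
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the update
    target_modules=['q_proj', 'v_proj'],  # which layers receive adapters
    lora_dropout=0.05,
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, lora_config)
# Only the small adapter matrices are trainable; the base weights stay frozen
model.print_trainable_parameters()
```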
Integration with the Base Model:
When you fine-tune a model using adapters, the adapter_model.safetensors file stores the updated weights of these adapter layers. The base model remains unchanged; the adapter weights are loaded and integrated with the base model at inference time.
Using adapter_model.safetensors:
To use the adapter_model.safetensors file, you need to load it along with the base model. Note that PEFT expects the path to the directory containing the adapter files (adapter_config.json plus adapter_model.safetensors), not the path to the .safetensors file itself. Here is an example of how to do this:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

MODEL_NAME = 'meta-llama/Meta-Llama-3-8B'
# Directory that contains adapter_config.json and adapter_model.safetensors
ADAPTER_DIR = 'path/to/adapter_dir'

# Load base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Load the fine-tuned adapter weights on top of the base model
model = PeftModel.from_pretrained(model, ADAPTER_DIR)

# Now you can use the model with the fine-tuned adapter
inputs = tokenizer("Your input text here", return_tensors="pt").to(model.device)
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
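This keeps the adapter as a separate component applied on top of the base weights at runtime. If you would rather ship a single standalone checkpoint, you can fold the adapter in with model.merge_and_unload() and then save_pretrained(), as in the merge sketch earlier in this thread.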
Question
I finally managed to fine-tune LLaVA on a custom dataset (LLaVA-1.5-7b on Google Colab using a single A100 GPU). The output I got was mostly an adapter_model.safetensors file (610 MB), plus a bunch of other log (?) files.
What should I do with the safetensors file? How do I incorporate it into the base model?
Thanks!