Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama for WhatsApp & Messenger.
I had to finetune llama3.2 11B Vision Instruct and I downloaded the model from huggingface(https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)> I'm trying to finetune the model on a custom dataset of mine by following the finetuning notebook. When I start finetuning, I run into list conversion to tensor issue which I'm guessing is because the dataset is not in the right format. Could anybody suggest the dataset format?
I have ~4k images, metadata.csv which contains 20 columns encompassing all the information about the images, a prompt for finetuning.
The code I used for generating the dataset :
import os
import pandas as pd
from datasets import Dataset, DatasetDict
from transformers import AutoTokenizer
from PIL import Image
from torchvision import transforms
import torch
image_folder = 'path to images folder'
csv_file = 'path to metadata.csv'
prompt = "The prompt used for FT"
metadata = pd.read_csv(csv_file)
metadata['image_path'] = metadata['file_name'].apply(lambda x: os.path.join(image_folder, x))
def load_image(image_path):
image = Image.open(image_path).convert("RGB")
return image
def preprocess_image(image):
transform = transforms.Compose([
transforms.Resize((224, 224)),
transforms.ToTensor(),
])
return transform(image)
def tokenize_prompt(prompt, tokenizer):
return tokenizer(prompt, return_tensors="pt", padding="max_length", truncation=True, max_length=512)
tokenizer = AutoTokenizer.from_pretrained("path to llama model")
data = []
for idx, row in metadata.iterrows():
image_path = os.path.join(image_folder, row["image_path"])
image = load_image(image_path)
image = preprocess_image(image)
tokenized_prompt = tokenize_prompt(prompt, tokenizer)
data_entry = {
"image": image,
"text": prompt,
"input_ids": tokenized_prompt["input_ids"].squeeze().tolist(),
"attention_mask": tokenized_prompt["attention_mask"].squeeze().tolist(),
"metadata": row.to_dict()
}
data.append(data_entry)
dataset = Dataset.from_pandas(pd.DataFrame(data))
dataset_dict = DatasetDict({
"train": dataset
})
dataset_dict.save_to_disk("train_dataset")
AttributeError: 'list' object has no attribute 'to'"
}
I have tried keeping input_ids and attention_mask as pytorch tensors but there was a problem during conversion of tensors to arrow objects during dataset creation.
Expected behavior
Any guide on how to create a dataset compatible with llama3.2 11B Vision Instruct with images, metadata and a prompt
@amoghskanda You need to convert list into tensor, something like batch["labels"] = torch.tensor(label_list). Please check this example about how to convert the dialogs into tokens
System Info
python 3.10.15 torch 2.5.1 transformers 4.46.2 tokenizers 0.20.3
Information
🐛 Describe the bug
I had to finetune llama3.2 11B Vision Instruct and I downloaded the model from huggingface(https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)> I'm trying to finetune the model on a custom dataset of mine by following the finetuning notebook. When I start finetuning, I run into list conversion to tensor issue which I'm guessing is because the dataset is not in the right format. Could anybody suggest the dataset format? I have ~4k images, metadata.csv which contains 20 columns encompassing all the information about the images, a prompt for finetuning. The code I used for generating the dataset :
Error logs
{ "name": "AttributeError", "message": "'list' object has no attribute 'to'", "stack": "--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[9], line 15 12 scheduler = StepLR(optimizer, step_size=1, gamma=train_config.gamma) 14 # Start the training process ---> 15 results = train( 16 model, 17 train_dataloader['train'], 18 eval_dataloader['test'], 19 tokenizer, 20 optimizer, 21 scheduler, 22 train_config.gradient_accumulation_steps, 23 train_config, 24 None, 25 None, 26 None, 27 wandb_run=None, 28 )
File ~/anaconda3/envs/llama/lib/python3.10/site-packages/llama_recipes/utils/train_utils.py:151, in train(model, train_dataloader, eval_dataloader, tokenizer, optimizer, lr_scheduler, gradient_accumulation_steps, train_config, fsdp_config, local_rank, rank, wandb_run) 149 batch[key] = batch[key].to('xpu:0') 150 elif torch.cuda.is_available(): --> 151 batch[key] = batch[key].to('cuda:0') 152 with autocast(): 153 loss = model(**batch).loss
AttributeError: 'list' object has no attribute 'to'" }
I have tried keeping
input_ids
andattention_mask
as pytorch tensors but there was a problem during conversion of tensors to arrow objects during dataset creation.Expected behavior
Any guide on how to create a dataset compatible with llama3.2 11B Vision Instruct with images, metadata and a prompt