Open amankumarhal opened 1 week ago
I hit the same error, and my guess is that FSDP caused it. FSDP flattens embed_tokens.weight and does not restore its original shape when saving the model. I saw another issue on accelerate, https://github.com/huggingface/accelerate/issues/2374, but unluckily it didn't give a solution...
How about downgrading Accelerate to 0.30.0?
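One way to check this guess is to look at the tensor shapes stored in the saved checkpoint: if FSDP's flattened parameters were written out as-is, a weight that should be 2-D would show up as 1-D. Below is a minimal sketch, assuming the shapes have already been read into a plain dict. The helper name `find_flattened` and the key `model.embed_tokens.weight` are hypothetical; the shapes are the ones from the error message reported in this issue.

```python
def find_flattened(shapes, expected_ndim=2, suffix="embed_tokens.weight"):
    """Return names ending in `suffix` whose rank differs from `expected_ndim`."""
    return [
        name
        for name, shape in shapes.items()
        if name.endswith(suffix) and len(shape) != expected_ndim
    ]


# Shapes taken from the error message in this issue; the key name is an assumption.
saved = {"model.embed_tokens.weight": (197002752,)}
healthy = {"model.embed_tokens.weight": (128256, 3072)}

print(find_flattened(saved))    # lists the 1-D embedding
print(find_flattened(healthy))  # empty list: shape is as expected
```

If the first call reports the embedding, the checkpoint was most likely saved while the parameter was still flattened.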
@Haruka1307 Thank you for your comments. I can confirm that downgrading llamafactory to 0.8.3 and accelerate to 0.30.1 worked!!
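To confirm which Accelerate version an environment actually picks up after the downgrade, a small check like the following may help. It is only a sketch: the helper names `parse` and `matches` are hypothetical, and 0.30.1 is simply the version reported as working in this thread, not a verified boundary.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile


def parse(v):
    """Turn '0.30.1' into (0, 30, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def matches(installed, wanted="0.30.1"):
    """True if the installed version equals the one reported as working here."""
    return parse(installed) == parse(wanted)


try:
    installed = version("accelerate")
    print(installed, "matches 0.30.1:", matches(installed))
except PackageNotFoundError:
    print("accelerate is not installed in this environment")
```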
Reminder
System Info
Accelerate version: 0.34.2
accelerate bash location: /home/aman/LLaMA-Factory/env_train/bin/accelerate
Accelerate default config: Not found
Reproduction
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("./saves/llama3.2_domain/pretrain")
model = AutoModelForCausalLM.from_pretrained(
    "./saves/llama3.2_domain/pretrain",
    device_map="auto",
)
Expected behavior
The model should load and produce output similar to the base Llama 3.2 models.
More details
I pretrained Llama 3.2 1B and Llama 3.2 3B models on domain-specific data. After training finished, loading the model for inference failed with this error: "ValueError: Trying to set a tensor of shape torch.Size([197002752]) in "weight" (which has shape torch.Size([128256, 3072])), this look incorrect."
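The numbers in the error message can be compared directly. The sketch below only does the arithmetic. Reading the result as "a flattened shard from one of the two GPUs, plus some padding" is an assumption, not something the error message itself states.

```python
vocab_size, hidden_size = 128256, 3072  # expected 2-D shape from the error message
saved_numel = 197002752                 # length of the 1-D tensor in the checkpoint

expected_numel = vocab_size * hidden_size
print(expected_numel)                    # 394002432
print(saved_numel / expected_numel)      # roughly 0.5
print(2 * saved_numel - expected_numel)  # 3072 elements beyond an exact half split
```

The saved tensor holds about half of the elements the embedding matrix needs, so it cannot be reshaped to [128256, 3072] on load.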
Below is my YAML file, which I ran using:
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --config_file examples/accelerate/fsdp_config.yaml src/train.py examples/train_full/llama3.2_small_full.yaml
### model
model_name_or_path: meta-llama/Llama-3.2-3B

### method
stage: pt
do_train: true
finetuning_type: full

### dataset
dataset: domain
cutoff_len: 2048
overwrite_cache: true
preprocessing_num_workers: 8

### output
output_dir: saves/llama3.2_domain/pretrain
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 1800
gradient_checkpointing: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
It would be great if someone could help me understand what's wrong with my approach.
Others
No response