openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0

Loss reaches 0 when finetuning 7B model using 1xA100 80G #75

Open · rootally opened this issue 12 months ago

rootally commented 12 months ago

I'm using the config below, and I load the base model as torch.float16.

```
--model_name_or_path llama_model \
--data_path data.json \
--bf16 True \
--num_train_epochs $3 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to tensorboard
```
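For reference, a minimal sketch of how I load the base model, assuming the Hugging Face transformers Auto classes (`llama_model` is the local path passed as `--model_name_or_path` above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base OpenLLaMA 7B weights, loaded in float16 as described above.
# Note that the training command itself passes --bf16 True.
model = AutoModelForCausalLM.from_pretrained(
    "llama_model",              # local path from --model_name_or_path
    torch_dtype=torch.float16,
)
# use_fast=False to avoid the fast-tokenizer issue noted in the OpenLLaMA README.
tokenizer = AutoTokenizer.from_pretrained("llama_model", use_fast=False)
```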

gjmulder commented 12 months ago
  1. Are you talking about eval set loss or training loss?
  2. Plot both as a function of epoch, similar to #63, to see whether you are overfitting or underfitting (see the plotting sketch after this list)
  3. How large is your data set?
  4. How many epochs is $3 set to?
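A minimal sketch of one way to produce such a plot, assuming your training script uses the Hugging Face Trainer (which writes a trainer_state.json into each checkpoint directory); the checkpoint path below is a placeholder:

```python
import json
import matplotlib.pyplot as plt

# trainer_state.json is written by the Hugging Face Trainer into each
# checkpoint directory; point this at one of your actual checkpoints.
with open("output_dir/checkpoint-1200/trainer_state.json") as f:
    state = json.load(f)

# With --logging_steps 1 there is one training-loss entry per step.
train_logs = [e for e in state["log_history"] if "loss" in e]
epochs = [e["epoch"] for e in train_logs]
losses = [e["loss"] for e in train_logs]

plt.plot(epochs, losses, label="training loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_vs_epoch.png")
```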
rootally commented 11 months ago

@gjmulder thanks for getting back to me.

  1. Training loss.
  2. The training loss actually drops to 0 at the second step and never recovers.
  3. The dataset is around 100 MB.
  4. 3 epochs.
gjmulder commented 11 months ago

Without a plot it is difficult to say for certain, but you are probably overfitting. Don't train for more than one epoch.