rootally opened this issue 12 months ago
I'm using the config below, and I load the base model as torch.float16:
```
--model_name_or_path llama_model \
    --data_path data.json \
    --bf16 True \
    --num_train_epochs $3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to tensorboard
```
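For reference, a minimal sketch of the float16 load, assuming the base model is loaded with Hugging Face transformers (`llama_model` here is just the path passed to `--model_name_or_path`):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the base weights in half precision; the path matches
# --model_name_or_path in the config above.
model = AutoModelForCausalLM.from_pretrained(
    "llama_model",
    torch_dtype=torch.float16,
)
```

Note that the config also passes `--bf16 True`, so the Trainer runs mixed precision in bfloat16 on top of whatever dtype the weights were loaded in.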
@gjmulder thanks for getting back.
Without a plot it is difficult to say for certain, but you are probably overfitting. Don't train for more than one epoch.
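If you want to produce that plot, here is a minimal sketch for pulling the training-loss curve out of the TensorBoard event files written via `--report_to tensorboard` (the log directory path and the `train/loss` tag are assumptions; the tag can differ between Trainer versions):

```python
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point at the run's TensorBoard log directory (placeholder path).
acc = EventAccumulator("output/runs/your_run")
acc.Reload()

# The Trainer usually logs training loss under "train/loss";
# check acc.Tags()["scalars"] if the tag differs in your version.
events = acc.Scalars("train/loss")
steps = [e.step for e in events]
values = [e.value for e in events]

plt.plot(steps, values)
plt.xlabel("step")
plt.ylabel("training loss")
plt.savefig("loss_curve.png")
```

A loss that keeps dropping toward zero past the first epoch is the usual sign of overfitting.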