[Open] lkluo opened this issue 1 year ago
I got a tokenization mismatch: 1 vs. 1612, and the loss is always zero too.
I got a similar issue. The loss (in my case the eval loss) is always NaN when using train_lora.
Similar problem here. Using train_lora, the train loss is fine but the eval loss is always NaN.
You don't need to change the conversation template. You should probably be fine-tuning the base model, so the chat template will not matter.
You are right. Any prompt template should work; I'm just curious why the llama-2 template doesn't.
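For what it's worth, a plausible explanation (a guess from reading the code, not a confirmed root cause): the label-masking logic in `fastchat/train/train.py` splits each rendered conversation on the vicuna template's separators to work out which tokens belong to the assistant turns. The llama-2 template uses different role markers and separators, so the computed lengths stop lining up, the "tokenization mismatch" warning fires, every label gets masked, and the loss collapses to 0. A minimal sketch to compare how the two templates render the same turns, assuming the `get_conv_template` helper in `fastchat.conversation` at the commit linked below:

```python
# Compare how the vicuna and llama-2 templates render the same conversation.
# Assumes fastchat is installed and both template names are registered in
# fastchat.conversation (true at the commit linked below, as far as I can tell).
from fastchat.conversation import get_conv_template

for name in ("vicuna_v1.1", "llama-2"):
    conv = get_conv_template(name)
    conv.append_message(conv.roles[0], "Hello")
    conv.append_message(conv.roles[1], "Hi there!")
    print(f"--- {name} ---")
    print("sep:", repr(conv.sep), "sep2:", repr(conv.sep2))
    print(conv.get_prompt())
```

The separators printed here are what the masking code keys on, which is why swapping the template name alone is not enough; the preprocessing in `train.py` would have to change with it.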
Two questions regarding Llama 2 fine-tuning:
It seems the prompt template defaults to `vicuna` and cannot be overwritten, according to the following code: https://github.com/lm-sys/FastChat/blob/cfc73bf3e13c22ded81e89675e0d7b228cf4b342/fastchat/train/train.py#L85. When I hard-code it to `llama-2`, the training loss is 0 forever, while the loss is normal when switching back to `vicuna`. Could this be related to the `llama-2` prompt template?

The code to reproduce the result:
```shell
python3 train_lora.py \
    --model_name_or_path meta-llama/Llama-2-7b \
    --lora_r 16 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --data_path data/dummy_conversation.json \
    --output_dir /llama-2-output \
    --num_train_epochs 4 \
    --fp16 True \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --eval_steps 100 \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_strategy "steps" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --q_lora False \
    --gradient_checkpointing True \
    --flash_attn False \
    --lazy_preprocess True
```
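If anyone wants to confirm that the zero loss comes from fully masked labels rather than from the optimizer or the LoRA config, here is a rough check. It is only a sketch: it assumes the `preprocess()` function and `IGNORE_TOKEN_ID` constant defined in `fastchat/train/train.py` at the commit above, and it uses the HF-format `meta-llama/Llama-2-7b-hf` checkpoint for the tokenizer.

```python
# Rough sanity check: run one conversation from the training data through
# FastChat's preprocess() and count how many label positions survive masking.
# If the count is 0, the cross-entropy loss for that sample will be exactly 0.
# Assumes preprocess() and IGNORE_TOKEN_ID as defined in fastchat/train/train.py.
import json

from transformers import AutoTokenizer
from fastchat.train.train import preprocess, IGNORE_TOKEN_ID

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # HF-format checkpoint; adjust to your local path
    model_max_length=2048,
    padding_side="right",
    use_fast=False,
)
tokenizer.pad_token = tokenizer.unk_token  # mirrors what train.py does

with open("data/dummy_conversation.json") as f:
    raw = json.load(f)

batch = preprocess([raw[0]["conversations"]], tokenizer)
labels = batch["labels"][0]
unmasked = (labels != IGNORE_TOKEN_ID).sum().item()
print(f"unmasked label tokens: {unmasked}")
```

With the default vicuna template this should print a non-zero count; if hard-coding llama-2 drives it to 0, that points at the masking/preprocessing logic rather than the LoRA setup.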