Alack1 opened this issue 6 months ago
Which model? What is the learning rate?
If the learning rate is too high, you may need to reduce it.
I met the same problem when using a Llama2-based Vicuna model (vicuna-7b-v1.5). My learning-rate settings are as follows: lr_sched: "linear_warmup_cosine_lr", init_lr: 1e-4, min_lr: 8e-5, warmup_lr: 1e-5.
However, when I use the Llama1-based Vicuna models (v1.1 and v1.3), training runs successfully. Are there any settings in the code that only work for Llama1 and are incompatible with Llama2?
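For reference, here is a minimal sketch of how a `linear_warmup_cosine_lr` schedule with these values typically behaves; the repo's actual implementation may differ, and `warmup_steps`/`total_steps` below are purely illustrative:

```python
import math

# Assumed semantics: linear warmup from warmup_lr to init_lr, then cosine decay to min_lr.
def lr_at(step, total_steps, warmup_steps, init_lr=1e-4, min_lr=8e-5, warmup_lr=1e-5):
    if step < warmup_steps:
        # Linear warmup from warmup_lr to init_lr.
        return warmup_lr + (init_lr - warmup_lr) * step / max(1, warmup_steps)
    # Cosine decay from init_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (init_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(0, 10_000, 500))       # ~1e-5 at the start of warmup
print(lr_at(500, 10_000, 500))     # ~1e-4 right after warmup
print(lr_at(10_000, 10_000, 500))  # ~8e-5 at the end of training
```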
I haven't experimented with Llama 2 yet, so I'm unsure of the potential reasons. We might need to adjust the code or settings from Llama 1 to make them compatible with Llama 2.
Have you successfully resolved the issue?
Thanks for your reply. I have resolved this problem by changing the tokenizer's padding side to "right" (as specified in Vicuna's config files) and adjusting the corresponding code.
I think the padding side is a common pitfall when tuning LLMs: I hit a similar NaN problem when tuning other LLMs that require right padding while left padding was used.
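For anyone hitting the same NaNs, here is a minimal sketch of the fix using Hugging Face Transformers; the model name and the surrounding setup are illustrative, not the repo's exact code:

```python
from transformers import AutoTokenizer

# Illustrative: load the vicuna-7b-v1.5 tokenizer and force right padding,
# matching the padding_side given in Vicuna's tokenizer config.
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5", use_fast=False)
tokenizer.padding_side = "right"
if tokenizer.pad_token is None:
    # Llama-family tokenizers ship without a pad token; reuse an existing special token.
    tokenizer.pad_token = tokenizer.unk_token

batch = tokenizer(
    ["example prompt one", "a longer example prompt number two"],
    padding=True,
    return_tensors="pt",
)
# With right padding, pad tokens sit at the end of each sequence, so the attention
# mask and labels stay aligned during fine-tuning.
```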
Following your README step by step and using the dataset directly from your preprocessed ml-1m file, why do I get the error "Input contains NaN"?