zyang1580 / CoLLM

The implementation for the work "CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation".
BSD 3-Clause "New" or "Revised" License

Input contains NaN. #6

Open Alack1 opened 6 months ago

Alack1 commented 6 months ago

I followed your README step by step and used the dataset directly from your preprocessed ml-1m file. Why do I get the error "Input contains NaN"?

zyang1580 commented 5 months ago

Which model? What is the learning rate?

zyang1580 commented 5 months ago

The learning rate may be too high; try reducing it.
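For reference, a generic PyTorch sketch (not CoLLM's actual training loop, and assuming a Hugging Face-style model that returns a `.loss`) of how one might catch a loss that has gone non-finite and keep a high learning rate from destabilizing training:

```python
import torch

def training_step(model, batch, optimizer):
    # Generic sketch, not CoLLM's code: fail fast when the loss stops being
    # finite, so the offending step and learning rate are easy to identify.
    optimizer.zero_grad()
    loss = model(**batch).loss
    if not torch.isfinite(loss):
        raise RuntimeError(
            f"Non-finite loss at lr={optimizer.param_groups[0]['lr']:.2e}"
        )
    loss.backward()
    # Gradient clipping is a common guard when a high learning rate causes spikes.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```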

XiyuChangSJTU commented 2 months ago

I met the same problem when using the Llama 2-based Vicuna model (vicuna-7b-v1.5), with the following learning-rate settings:

```yaml
lr_sched: "linear_warmup_cosine_lr"
init_lr: 1e-4
min_lr: 8e-5
warmup_lr: 1e-5
```

However, when I use the Llama 1-based Vicunas (v1.1 and v1.3), it runs successfully. Are there any settings in the code that work only with Llama 1 and are incompatible with Llama 2?

> Which model? What is the learning rate?

zyang1580 commented 1 month ago

I haven't experimented with Llama 2 yet, so I'm unsure of the potential reasons. We might need to adjust the code or settings from Llama 1 to make them compatible with Llama 2.

Have you successfully resolved the issue?

XiyuChangSJTU commented 1 month ago

> I haven't experimented with Llama 2 yet, so I'm unsure of the potential reasons. We might need to adjust the code or settings from Llama 1 to make them compatible with Llama 2.
>
> Have you successfully resolved the issue?

Thanks for your reply. I resolved the problem by changing the tokenizer's padding side to "right" (as specified in Vicuna's config files) and adjusting the corresponding code.
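For anyone else hitting this, the fix described above looks roughly like the following (a sketch using the Hugging Face tokenizer API; `lmsys/vicuna-7b-v1.5` is the checkpoint mentioned earlier, not necessarily CoLLM's exact loading path):

```python
from transformers import AutoTokenizer

# Sketch of the fix described above, not CoLLM's exact code.
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5", use_fast=False)
tokenizer.padding_side = "right"  # matches Vicuna's config; left padding caused the NaNs
# Llama tokenizers ship without a pad token; reusing EOS is a common choice.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```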

I think the padding side is a common pitfall when tuning LLMs: I hit a similar NaN problem when I tried tuning other LLMs (which should use right padding) with left padding.
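Relatedly, a common pattern (illustrative, not CoLLM's exact code) is to mask pad positions out of the labels so they never enter the loss, whichever side the padding is on:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5", use_fast=False)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
tok.padding_side = "right"  # training-time padding side, per Vicuna's config

enc = tok(["short example", "a somewhat longer example"],
          padding=True, return_tensors="pt")
labels = enc["input_ids"].clone()
# -100 is the default ignore_index for Hugging Face causal-LM losses,
# so padded positions contribute nothing to the loss.
labels[enc["attention_mask"] == 0] = -100
```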