s1ghhh opened this issue 1 year ago
Same issue here. Have you solved it?
I did not solve this problem, but I recently made some new discoveries: when I use FastChat's default code, the loss curve descends smoothly; when I modify the data processing function, the loss curve descends in steps. I checked all the parameters and they are identical in both cases, so I think the data processing is what causes the step-down. It is worth mentioning that the models obtained in both cases work very well, but I'm still curious what causes the step-down. Have you solved this problem yet? Maybe we can discuss it.
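One way to compare the two runs more precisely is to overlay their loss curves step by step. Below is a minimal sketch; it assumes each run wrote a Hugging Face Trainer-style `trainer_state.json` whose `log_history` contains `{"step", "loss"}` entries, and the `run_default` / `run_modified` paths are placeholders, so adapt the parsing to whatever your runs actually log.

```python
# Sketch: overlay the training loss of the two runs
# (default FastChat data code vs. modified data processing).
import json
import matplotlib.pyplot as plt

def load_losses(path):
    """Return (steps, losses) from a trainer_state.json-style log."""
    with open(path) as f:
        state = json.load(f)
    entries = [e for e in state["log_history"] if "loss" in e]
    return [e["step"] for e in entries], [e["loss"] for e in entries]

for label, path in [("default data code", "run_default/trainer_state.json"),
                    ("modified data code", "run_modified/trainer_state.json")]:
    steps, losses = load_losses(path)
    plt.plot(steps, losses, label=label)

plt.xlabel("step")
plt.ylabel("training loss")
plt.legend()
plt.savefig("loss_comparison.png")
```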
@FHL1998
Hi, has anyone else noticed that the eval loss diverges? I have had many runs and most of them diverge. In some cases, the overfitted checkpoint produces better responses (e.g. dulcet-shape-11 below: epoch 10 performs better than the best epoch for some responses, and is actually the best model out of all the runs).
I used the following settings to train on my own dataset with LoRA, but I found that the loss curve exhibits a stair-step pattern of descent: the loss undergoes a significant drop at the end/start of each epoch. Furthermore, this phenomenon seems to be common, as I encountered the same issue when using the 7B model as the base model. Is this a problem with my parameter settings? Where should I start investigating this issue?
Here are my parameter settings:
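(For context, a minimal sketch of this kind of LoRA fine-tune is shown below, written with PEFT and the Hugging Face Trainer rather than FastChat's own launcher; the base model name, dataset path, and every hyperparameter value in it are illustrative assumptions, not the exact settings used in this run.)

```python
# Minimal sketch of a LoRA fine-tune with PEFT + Hugging Face Trainer.
# All values below are illustrative assumptions; "my_dataset.json" and
# the base model name are placeholders.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model

model_name = "lmsys/vicuna-7b-v1.5"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # LLaMA-style tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# LoRA adapter configuration (values are assumptions for illustration).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hypothetical JSON dataset with a "text" column.
dataset = load_dataset("json", data_files="my_dataset.json")["train"]

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=2048)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="./lora-out",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    logging_steps=10,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```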
Many thanks!