[Open] zhikaili opened this issue 3 years ago
Hi,
I find that as training progresses (beyond 20 epochs), the loss gradually becomes negative. May I ask if this is harmful to downstream tasks?
Thank you!

Me too! I want to know the reason and whether it is harmful to downstream tasks.
I suspect this may be due to structurally consistent samples appearing in the same batch, but I don't have time to verify this at the moment. If anyone does, please let me know the results. Thanks.
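For anyone wondering how the loss can go below zero at all, here is a minimal sketch. It assumes the training objective is an NT-Xent/InfoNCE variant whose denominator contains only the negative pairs, which several contrastive codebases use; the exact loss in this repository may differ, and the function name below is purely illustrative.

```python
import torch
import torch.nn.functional as F

def nt_xent_no_pos_in_denominator(z1, z2, temperature=0.2):
    """loss_i = -log( exp(s_ii / t) / sum_{j != i} exp(s_ij / t) ).

    Because the positive term is *excluded* from the denominator, the
    ratio can exceed 1 once positives dominate the negatives, so the
    loss can dip below zero. (Textbook InfoNCE, which keeps the positive
    in the denominator, is bounded below by 0.)
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature                  # (N, N) cosine similarities
    pos = sim.diag()                                 # positives on the diagonal
    diag_mask = torch.eye(len(sim), dtype=torch.bool)
    neg = sim.masked_fill(diag_mask, float("-inf"))  # keep only negatives
    return (torch.logsumexp(neg, dim=1) - pos).mean()

torch.manual_seed(0)
n, d = 8, 16
z = torch.randn(n, d)

# Early training: the two views are nearly independent -> loss > 0.
print(nt_xent_no_pos_in_denominator(z, torch.randn(n, d)))

# Late training: the two views of each sample align closely -> loss < 0,
# even though nothing is wrong with the optimization itself.
print(nt_xent_no_pos_in_denominator(z, z + 0.01 * torch.randn(n, d)))
```

Under this formulation, a negative loss simply means each positive pair has become more similar than all of its in-batch negatives combined, so by itself it need not hurt downstream tasks. The snippet can also be used to probe the batch-composition hypothesis above, e.g. by duplicating a row of `z` and observing how the value shifts.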