WoBuChiTang closed this issue 7 months ago
use AdamW with smaller learning rate
see finetuning section in readme
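A minimal sketch of that suggestion, assuming a standard PyTorch training setup; `model` and the exact learning rate here are placeholders, not values from this repo or its readme:

```python
import torch

model = torch.nn.Linear(512, 512)  # stand-in for the actual model

# AdamW with a reduced learning rate and default weight decay; if the loss
# still diverges, try lowering the learning rate further (e.g. 1e-5).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
```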
Thank you!
Hi @WoBuChiTang! Have you fixed that error? Did the suggestions from Jason (reducing the learning rate and using AdamW) help? If so, could you please share the hyperparameters you used?
Hello, I am training on Chinese data with the 830M weights. At the beginning the loss is normal, but after 2000 steps the loss becomes NaN on a large scale. Did you encounter this when training on GigaSpeech?
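One common mitigation sketch for this kind of mid-training NaN (a generic PyTorch pattern, not from this repo): skip the optimizer step when the loss is non-finite and clip gradients so a single bad batch cannot blow up the weights.

```python
import torch

def training_step(model, optimizer, loss_fn, batch, targets):
    optimizer.zero_grad()
    loss = loss_fn(model(batch), targets)
    if not torch.isfinite(loss):
        # Skip this batch instead of propagating NaN into the weights.
        return None
    loss.backward()
    # Clip gradients to limit blow-ups from outlier batches.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```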