jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License

Question about training #32

Open handsomelys opened 3 weeks ago

handsomelys commented 3 weeks ago

I'm doing continued training from the WavTokenizer-medium checkpoint on audio data from my own domain. However, the model seems to get worse as training progresses: at about 135,000 steps the generator loss suddenly jumped to around 50, after having steadily declined to a low of about 43. The discriminator loss is relatively stable at 10.7~10.8 with a slight downward trend. Is this normal, and how should I address it? (My training config is the official one from GitHub; I haven't adjusted any hyperparameters such as the learning rate.)
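
For reference, this is roughly how I spot the jump in my logs. A minimal sketch, not the WavTokenizer training code: it just compares the current generator loss against a running average, and the window size and threshold are arbitrary illustrative values.

```python
# Minimal sketch: flag a sudden generator-loss jump (e.g. ~43 -> ~50 around
# step 135k) against a running average. Not part of the WavTokenizer code;
# window and ratio are arbitrary illustrative values.
from collections import deque


class LossSpikeDetector:
    def __init__(self, window: int = 1000, ratio: float = 1.1):
        self.history = deque(maxlen=window)  # recent loss values
        self.ratio = ratio                   # flag if loss > ratio * running mean

    def update(self, loss: float) -> bool:
        spiked = bool(self.history) and loss > self.ratio * (sum(self.history) / len(self.history))
        self.history.append(loss)
        return spiked


# Usage inside a training loop (hypothetical variable names):
# detector = LossSpikeDetector()
# if detector.update(gen_loss.item()):
#     print(f"Generator loss spike at step {step}: {gen_loss.item():.2f}")
```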

jishengpeng commented 3 weeks ago

This phenomenon is normal; some fluctuations in the total loss may occur during training. Extending the training duration can help address this issue.
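
In practice that just means resuming from the latest checkpoint with a larger step budget. A minimal sketch, assuming a PyTorch Lightning-style trainer (the actual WavTokenizer entry point and config may differ; the checkpoint path, step count, and `model` are placeholders):

```python
# Minimal sketch, assuming a PyTorch Lightning-style training setup.
# The idea is simply to resume from the last checkpoint with a larger
# step budget so the spiked generator loss has time to come back down.
import pytorch_lightning as pl

trainer = pl.Trainer(
    max_steps=400_000,      # extend beyond the original budget (illustrative value)
    accelerator="gpu",
    devices=1,
)

# `model` and the checkpoint path are placeholders for your own setup:
# trainer.fit(model, ckpt_path="path/to/last.ckpt")
```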

handsomelys commented 3 weeks ago

Even if the loss does come back down eventually, waiting for the spiked loss to recover can take a long time. Did you run into this during your own training, and how did you handle it? Is simply extending the training time the solution?