Closed by Imtinan1996 4 years ago
UPDATE: This problem wasn't resolved even after further training up to 400k steps. Generated wavs for some inputs would be fine; for others they would be of length 0.
So I went back to the source and cleaned up my data: I removed multiple speakers and kept data from only one speaker. Some audio files had the person singing songs; those were eliminated as well. Silences were also trimmed, and now, after 100k steps of retraining, the NaN issue seems to be resolved. I am now getting echoes at the ends of audio files, but I guess that is another issue, so I'm closing this issue for now.
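For anyone wondering what the silence trimming step can look like, here is a minimal sketch. It is not the tool used above (that isn't stated here); it assumes the audio is already loaded as a list of float samples in the range -1 to 1, and both the function name `trim_silence` and the `threshold` value are made up for illustration.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose absolute value is below threshold.

    `threshold` is an assumed amplitude cutoff, not a value from this issue.
    Quiet samples in the middle of the clip are left untouched.
    """
    start = 0
    end = len(samples)
    # Advance past quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back past quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


# Made-up clip: quiet head and tail around a short "voiced" region.
clip = [0.0, 0.001, 0.2, -0.3, 0.0, 0.25, 0.002, 0.0]
trimmed = trim_silence(clip)
print(trimmed)
```

A real pipeline would more likely work on frames of energy rather than single samples, but the idea is the same: only the edges are cut, so pauses inside the utterance survive.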
I have been training the network and evaluating each checkpoint very closely as I go: after every 1000 iterations, when a checkpoint is generated, I serve the model and check the output. Up to 11000 steps of training everything was fine, but all of a sudden at 12000 steps my output started containing NaNs. Has anyone encountered this problem? To be specific, I am replicating this for Arabic. Additionally, here are my alignment images.
NaNs were observed by doing this
OUTPUT:
UPDATE:
After further testing I observed the following:
Some checkpoints give NaNs, some do not
Some outputs give NaNs, some do not
One output may give NaN on one checkpoint, but might not give NaN on the others
I am currently at 110000 steps, but still don't know why this is happening. Hopefully this changes with further training.
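Since the NaNs depend on both the checkpoint and the input, it may help to tabulate the results per checkpoint rather than eyeballing them. The following is only a sketch: it assumes each generated waveform is available as a list of floats keyed by `(checkpoint_step, input_text)`, and the names `classify_output` and `summarize`, along with the example data, are hypothetical.

```python
import math


def classify_output(samples):
    """Return 'empty', 'nan', or 'ok' for one generated waveform."""
    if len(samples) == 0:
        return "empty"  # the zero-length wav case
    if any(math.isnan(s) for s in samples):
        return "nan"
    return "ok"


def summarize(results):
    """Group classifications by checkpoint.

    `results` maps (checkpoint_step, input_text) -> list of float samples.
    Returns {checkpoint_step: {input_text: status}}.
    """
    by_checkpoint = {}
    for (step, text), samples in results.items():
        by_checkpoint.setdefault(step, {})[text] = classify_output(samples)
    return by_checkpoint


# Made-up data, for illustration only.
results = {
    (11000, "a"): [0.1, -0.2],
    (12000, "a"): [float("nan"), 0.0],
    (12000, "b"): [0.3, 0.1],
    (13000, "b"): [],
}
report = summarize(results)
for step in sorted(report):
    print(step, report[step])
```

Keeping a table like this across checkpoints makes it easier to see whether the same inputs fail repeatedly, which would point at the data rather than at a particular checkpoint.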