rishikksh20 / vae_tacotron2

VAE Tacotron 2, an alternative to GST Tacotron
MIT License
85 stars 29 forks

Loss exploded??? #2

Open WhiteFu opened 5 years ago

WhiteFu commented 5 years ago

I get the "loss exploded" error during the training stage! I haven't modified the original hyperparameters, and I want to know how to solve this problem.

adimukewar commented 5 years ago

I am facing the same issue. Please let me know if you have resolved it.

rishikksh20 commented 5 years ago

Actually, this is a common occurrence when dealing with a variational autoencoder. There are two ways to resolve it: 1) Restart training from a checkpoint saved 3 or 4 checkpoints back (not the most recent one). Be prepared for the loss to explode again after running for a while; if it does, repeat the same process. 2) In the file train.py, on line 133, change the loss threshold value. @adimukewar @WhiteFu
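For reference, the check around train.py line 133 in Tacotron-style training loops is usually a simple guard on the loss value. The sketch below is an assumption about its general shape (the threshold of 100.0 and the names are illustrative, not the exact code in this repository); raising the threshold, or catching the exception and restoring an earlier checkpoint, is the kind of change being suggested.

```python
import math

# Hypothetical sketch of a loss-explosion guard, similar to what many
# Tacotron-style training loops place around train.py line 133.
# The threshold value and variable names are assumptions, not the
# exact code in this repository.
LOSS_EXPLOSION_THRESHOLD = 100.0

def check_loss(loss, step):
    """Abort training when the loss blows up or turns NaN."""
    if math.isnan(loss) or loss > LOSS_EXPLOSION_THRESHOLD:
        raise RuntimeError(f"Loss exploded to {loss} at step {step}")

# Raising LOSS_EXPLOSION_THRESHOLD (or catching this exception and
# restoring an earlier checkpoint) keeps training alive instead of aborting.
```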

WhiteFu commented 5 years ago

Thanks for your reply, I will try it immediately!

WhiteFu commented 5 years ago

> I am facing the same issue. Please let me know if you have resolved it.

Sorry, I didn't reply to you in time. I have been working on other things recently, so I haven't solved this problem yet.

rishikksh20 commented 5 years ago

@WhiteFu if you are using this code, then use a large (more than 50 hours), expressive dataset like Blizzard to get a decent result.

MisakaMikoto96 commented 5 years ago

> I am facing the same issue. Please let me know if you have resolved it.
>
> Sorry, I didn't reply to you in time. I have been working on other things recently, so I haven't solved this problem yet.

Hi, I have the same problem. I tried modifying some hparams, but it still doesn't work. Please let me know if you have solved this. Thanks 😄

WhiteFu commented 5 years ago

The loss is not stable, so you can modify the upper limit of the loss threshold in the file train.py on line 133.

MisakaMikoto96 commented 5 years ago

> The loss is not stable, so you can modify the upper limit of the loss threshold in the file train.py on line 133.

Hi, but it seems my loss goes to NaN (every time at the same step during training). I have tried modifying the batch size and the learning rate, but it still doesn't work.

rishikksh20 commented 5 years ago

@MisakaMikoto96 Be aware that a NaN loss means your variational autoencoder (VAE) is unable to learn the latent representation. This is a common problem when dealing with a variational autoencoder, but the sad part is that there is no simple solution for it. One thing you can try is to go to that line and manipulate w1 and w2. Before that, make sure you have an adequate quantity of expressive voice data. Also, sometimes restarting training from a checkpoint saved 2 checkpoints back works fine for me after the error; if you get the error again at the same point, restart from a checkpoint 3 back, and so on. If you keep getting NaN at the same step count, try the solution above. You can also read the variational autoencoder paper for more understanding; otherwise, feel free to ask here.
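One common way to tune weights like these is KL-weight annealing: keep the KL term's weight small at the start of training and ramp it up slowly so the latent posterior does not collapse or blow up. A minimal sketch, assuming the total loss is combined as w1 * reconstruction + w2 * KL (the names w1/w2 follow the comment above; the schedule values are illustrative, not this repository's defaults):

```python
def kl_anneal_weight(step, warmup_steps=10000, max_weight=1.0):
    """Linearly ramp the KL weight from 0 to max_weight over warmup_steps.

    Illustrative schedule only; the actual w1/w2 handling in this repo
    may differ.
    """
    return min(max_weight, step / float(warmup_steps) * max_weight)

def total_loss(recon_loss, kl_loss, step, w1=1.0):
    # w1 scales the reconstruction term; w2 (annealed) scales the KL term.
    w2 = kl_anneal_weight(step)
    return w1 * recon_loss + w2 * kl_loss
```

Lowering the KL weight (or slowing its ramp-up) often prevents the loss from diverging at a fixed step, at the cost of a weaker latent representation early in training.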