Hey, looking for some help in training SoundStream.
I'm training SoundStream from version 0.15.8 and my results sound really bad after 20K steps (attached below). I also noticed a few things that I would like to share, to hear whether they have happened to anyone else:
The EMA output during training is pure noise, while the model output is really bad but still sounds somewhat like speech.
The loss is very noisy, at around 5000-8000. In previous results posted here I saw that the loss was much higher. I'm attaching TensorBoard graphs of the loss and would like to hear whether these losses look all right.
I trained the model for more than 100K steps and the loss exploded (17M++). Has this happened to anyone?
Some technical details on my training: LibriTTS (24000 sample rate, train-clean-360), model strides: (3, 4, 5, 8), batch_size=4, grad_accum_every=8 and data_max_length_seconds=1.
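For anyone comparing against their own runs, here is a rough sketch of what these settings work out to. It is plain arithmetic, not a call into the library, and it assumes that the encoder's total downsampling factor is the product of the strides and that one optimizer step happens every grad_accum_every micro-batches. The variable names are just for illustration.

```python
from math import prod

# Settings quoted from the post above.
sample_rate = 24_000
strides = (3, 4, 5, 8)
batch_size = 4
grad_accum_every = 8
data_max_length_seconds = 1

# Assumption: total downsampling factor is the product of the strides.
hop = prod(strides)

# Latent frames produced per second of audio and per training clip.
frames_per_second = sample_rate / hop
samples_per_clip = sample_rate * data_max_length_seconds
frames_per_clip = samples_per_clip // hop

# Assumption: gradients are accumulated over grad_accum_every micro-batches
# before each optimizer step.
effective_batch = batch_size * grad_accum_every
audio_seconds_per_step = effective_batch * data_max_length_seconds

print(f"hop size: {hop} samples")
print(f"frame rate: {frames_per_second} frames/s")
print(f"frames per clip: {frames_per_clip}")
print(f"effective batch: {effective_batch} clips "
      f"({audio_seconds_per_step} s of audio per optimizer step)")
```

Under these assumptions, each optimizer step sees 32 one-second clips, and each clip is only 50 latent frames long.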
EMA result: https://user-images.githubusercontent.com/113421133/221514146-271b2c5f-6fb1-4f1d-be40-19637107f691.mp4
Model result: https://user-images.githubusercontent.com/113421133/221514348-4055a652-521f-4621-bc72-bbf60a0ac637.mp4