lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License
2.39k stars 255 forks

Something wrong when i use the “soundstream” repo #184

Open wangyuxuan11 opened 1 year ago

wangyuxuan11 commented 1 year ago

Thank you for your excellent code! When I trained SoundStream from scratch on my own data, the training loss became NaN at around 4k steps. After I reduced the initial learning rate by a factor of 10, the "soundstream total loss" dropped from 40 to 13 by 17k steps, but the generated ".flac" file still contains obvious noise and the voice is not clear at all, as in the picture below. I have no idea what is wrong, since I otherwise keep the same parameters as you. Do I need to train for more steps? Roughly how many steps does it take to get good audio? Do I need to adjust any parameters? Thanks!

image
Makiyuyuko commented 1 year ago

Hi, I'm also training SoundStream, and I think we may have run into a similar problem. Have you resolved it? What are your batch_size and audio_sample_rate? Thank you.