lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License

Training SoundStream doesn't result in proper audio #112

Closed: amitaie closed this issue 1 year ago

amitaie commented 1 year ago

Hey, looking for some help in training SoundStream.

I'm training SoundStream from version 0.15.8, and my results sound really bad after 20K steps (attached below). I also noticed a few things that I would like to share, to hear whether they have happened to anyone else:

  1. The EMA results during training are pure noise, while the model output is really bad but sounds somewhat like speech.
  2. The loss is very noisy, at around 5000-8000. In previous results posted here, I saw that the loss was much higher. I'm attaching TensorBoard graphs of the loss and would like to hear whether those losses look all right.
  3. I trained the model for more than 100K steps and the loss exploded (17M++). Has this happened to anyone?
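For context on item 1: "EMA" here refers to an exponential moving average of the model weights, kept alongside the weights being optimized. The following is only a minimal, library-independent sketch of that idea; the function name `ema_update` and the decay value `0.99` are hypothetical and are not taken from audiolm-pytorch. It illustrates that an EMA copy starting far from the live weights lags behind them for many steps.

```python
# Minimal sketch of an exponential moving average (EMA) over weights.
# `ema_update` and decay=0.99 are illustrative assumptions, not the
# trainer's actual implementation or settings.

def ema_update(ema_weights, model_weights, decay=0.99):
    """Move each EMA weight a small step toward the current model weight."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]

# Toy example: the "model" weights stay fixed while the EMA starts at zero.
model = [1.0, -2.0]
ema = [0.0, 0.0]
for _ in range(100):
    ema = ema_update(ema, model)

# Fraction of the distance to the model weights covered after 100 updates.
fraction_covered = ema[0] / model[0]
print(f"fraction covered after 100 steps: {fraction_covered:.3f}")
```

With these toy numbers the EMA copy has covered only part of the distance to the live weights after 100 updates, so the two can sound quite different early in training.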

EMA result: https://user-images.githubusercontent.com/113421133/221514146-271b2c5f-6fb1-4f1d-be40-19637107f691.mp4

Model result: https://user-images.githubusercontent.com/113421133/221514348-4055a652-521f-4621-bc72-bbf60a0ac637.mp4

[6 attached images: TensorBoard graphs]

Some technical details on my training: LibriTTS (24000 Hz sample rate, train-clean-360), model strides (3, 4, 5, 8), batch_size=4, grad_accum_every=8, and data_max_length_seconds=1.
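The quantities implied by these settings can be worked out with a little arithmetic. This is a rough sketch under two assumptions not stated in the post: that the encoder's total downsampling factor is the product of the strides, and that every training clip is cropped to exactly `data_max_length_seconds`. The derived variable names are illustrative only.

```python
from math import prod

# Values quoted in the post.
sample_rate_hz = 24_000
strides = (3, 4, 5, 8)
batch_size = 4
grad_accum_every = 8
data_max_length_seconds = 1

# Assumption: total downsampling is the product of the strides.
downsample_factor = prod(strides)

# Latent frames produced per second of audio.
frames_per_second = sample_rate_hz / downsample_factor

# Assumption: each clip is exactly data_max_length_seconds long.
samples_per_clip = sample_rate_hz * data_max_length_seconds

# Clips contributing to one optimizer step with gradient accumulation.
effective_batch_size = batch_size * grad_accum_every

# Seconds of audio seen per optimizer step.
audio_seconds_per_step = effective_batch_size * data_max_length_seconds

print(downsample_factor, frames_per_second, samples_per_clip,
      effective_batch_size, audio_seconds_per_step)
```

Under those assumptions, each optimizer step sees 32 one-second clips, and each clip is encoded into 50 latent frames.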

lucidrains commented 1 year ago

could you do this in the discussions?