Training SoundStream doesn't produce proper audio

Hey, I'm looking for some help with training SoundStream.

I'm training SoundStream with audiolm-pytorch version 0.15.8, and my results sound really bad after 20K steps (samples attached below). I've also noticed a few things that I'd like to share, and I'd like to hear whether anyone else has run into them:

The EMA samples produced during training are pure noise, while the model output is really bad but at least sounds somewhat like speech.
The loss is very noisy and sits around 5000-8000, whereas in results others have posted here the loss was much higher. I'm attaching TensorBoard graphs of the loss and would like to hear whether these values look alright.
When I trained the model past 100K steps, the loss exploded (to 17M and beyond). Has this happened to anyone else?
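On the first point: an EMA copy of the weights is typically updated as ema = decay * ema + (1 - decay) * param, so after n updates a share of decay**n of the EMA still comes from the weights it was initialized with. The minimal sketch below uses plain Python floats as stand-ins for parameter tensors and a hypothetical decay of 0.995; it is not audiolm-pytorch's actual EMA code, and the decay, warm-up and update frequency used by its trainer may differ.

```python
import math


def ema_update(ema, params, decay):
    """One EMA step; plain floats stand in for parameter tensors."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


def start_weight_share(decay, num_updates):
    """Share of the EMA still coming from the weights it was initialized with."""
    return decay ** num_updates


def updates_until_share(decay, share):
    """Smallest number of updates n with start_weight_share(decay, n) <= share."""
    return math.ceil(math.log(share) / math.log(decay))


# Toy run: the EMA starts at 0.0 while the "online" parameter sits at 1.0.
DECAY = 0.995  # hypothetical value, not taken from audiolm-pytorch
ema = [0.0]
for _ in range(100):
    ema = ema_update(ema, [1.0], DECAY)

print(ema[0], start_weight_share(DECAY, 100), updates_until_share(DECAY, 0.01))
```

With decay=0.995, roughly 61% of the EMA still comes from the starting weights after 100 updates, and about 919 updates are needed to push that share below 1%. If the EMA is only updated every few steps or after a warm-up period, the lag measured in training steps grows accordingly, which is worth keeping in mind when comparing EMA samples with the online model.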

EMA result: https://user-images.githubusercontent.com/113421133/221514146-271b2c5f-6fb1-4f1d-be40-19637107f691.mp4

Model result: https://user-images.githubusercontent.com/113421133/221514348-4055a652-521f-4621-bc72-bbf60a0ac637.mp4

Some technical details on my training setup: LibriTTS (train-clean-360, 24000 Hz sample rate), model strides (3, 4, 5, 8), batch_size=4, grad_accum_every=8 and data_max_length_seconds=1.
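The settings above translate into a few derived quantities that are handy when comparing runs. The sketch below assumes that the encoder's total downsampling factor is the product of the strides and that one training step is one optimizer update covering grad_accum_every batches; since clips can be shorter than data_max_length_seconds, the per-clip and total-audio figures are upper bounds.

```python
from math import prod

# Values reported in the issue.
SAMPLE_RATE = 24_000        # LibriTTS, Hz
STRIDES = (3, 4, 5, 8)      # SoundStream encoder strides
BATCH_SIZE = 4
GRAD_ACCUM_EVERY = 8
MAX_LENGTH_SECONDS = 1      # data_max_length_seconds
STEPS = 20_000              # steps reached before listening to the samples

# Assumption: total downsampling factor = product of the strides.
hop_length = prod(STRIDES)                     # audio samples per latent frame
frame_rate = SAMPLE_RATE / hop_length          # latent frames per second of audio
max_samples_per_clip = SAMPLE_RATE * MAX_LENGTH_SECONDS
max_frames_per_clip = max_samples_per_clip // hop_length

# Assumption: one "step" = one optimizer update over GRAD_ACCUM_EVERY batches.
effective_batch = BATCH_SIZE * GRAD_ACCUM_EVERY
max_audio_hours_seen = STEPS * effective_batch * MAX_LENGTH_SECONDS / 3600

print(f"hop={hop_length} samples, {frame_rate:.0f} frames/s, "
      f"<= {max_frames_per_clip} frames per clip")
print(f"effective batch={effective_batch}, "
      f"<= {max_audio_hours_seen:.1f} h of audio in {STEPS} steps")
```

Under these assumptions, strides (3, 4, 5, 8) give a hop of 480 samples, i.e. 50 latent frames per second at 24 kHz, so each 1-second training clip is at most 50 frames long, and 20K steps with an effective batch of 32 cover at most about 178 hours of audio.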

lucidrains / audiolm-pytorch #112