lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

Consider adding a loss balancer? #60

Open · turian opened this issue 1 year ago

turian commented 1 year ago

Although this is departing a bit from the original AudioLM and SoundStream work, I see that you also like to combine the best ideas from multiple sources.

Perhaps consider adding an EMA-based loss balancer?

https://github.com/facebookresearch/encodec/blob/main/encodec/balancer.py

Then, the generative loss can mix a variety of spectral losses and the waveform loss:

[image: generative loss equation mixing multi-scale spectral losses with the waveform loss]
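For concreteness, here is a minimal sketch of the idea (not the actual Encodec `Balancer` from the link above; the class name, weight keys, and defaults are made up for illustration): each loss is differentiated with respect to the decoder output, its gradient is rescaled using an EMA of that loss's gradient norm so the relative contributions track the chosen weights, and the combined gradient is then backpropagated through the rest of the model.

```python
import torch

class EMALossBalancer:
    """Illustrative EMA-based loss balancer (sketch, not the Encodec implementation).

    Each loss is differentiated w.r.t. the model output, its gradient is rescaled
    by an EMA of that loss's gradient norm so the effective contributions follow
    the given weights, and the combined gradient is backpropagated.
    """
    def __init__(self, weights: dict, ema_decay: float = 0.999,
                 total_norm: float = 1.0, eps: float = 1e-12):
        self.weights = weights              # e.g. {'wave': 0.1, 'spec': 1.0} (hypothetical keys)
        self.ema_decay = ema_decay
        self.total_norm = total_norm
        self.eps = eps
        self.ema_norms = {name: 0.0 for name in weights}
        self.step = 0

    def backward(self, losses: dict, output: torch.Tensor):
        self.step += 1
        grads = {}
        for name, loss in losses.items():
            # gradient of this loss w.r.t. the model output only
            (grad,) = torch.autograd.grad(loss, [output], retain_graph=True)
            # track an EMA of this loss's gradient norm
            self.ema_norms[name] = (self.ema_decay * self.ema_norms[name]
                                    + (1 - self.ema_decay) * grad.norm().item())
            grads[name] = grad

        total_weight = sum(self.weights.values())
        combined = torch.zeros_like(output)
        for name, grad in grads.items():
            ema = self.ema_norms[name] / (1 - self.ema_decay ** self.step)  # bias-corrected EMA
            scale = (self.weights[name] / total_weight) * self.total_norm / (ema + self.eps)
            combined = combined + scale * grad

        # push the rescaled, combined gradient back through the model
        output.backward(combined)
```

Usage would be something like `EMALossBalancer(weights={'wave': 0.1, 'spec': 1.0}).backward({'wave': waveform_loss, 'spec': spectral_loss}, decoded)`, with the optimizer step done afterwards as usual (loss names and weight values here are placeholders).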
lucidrains commented 1 year ago

@turian loss balancer is quite cool! I'll take a look at it next week, thanks!

turian commented 1 year ago

It seems like multiscale spectral losses are here to stay. (If you're curious, check out my paper "I'm Sorry For Your Loss" for why I still think we need audio losses that better match human perception and lead to better optimization.)

Anyway, there's also been some debate about whether to use L1 or L2 and about the choice of n_fft. I guess using them all is a good idea. I'm a little surprised they didn't also bother to mix in the log-spectrogram, like many works (e.g. HiFi-GAN) do.
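As a rough sketch of what "using them all" could look like (the function name, FFT sizes, and hop choices below are placeholders, not what Encodec or HiFi-GAN do verbatim):

```python
import torch
import torch.nn.functional as F

def multiscale_spectral_loss(pred: torch.Tensor, target: torch.Tensor,
                             n_ffts=(512, 1024, 2048), log_eps: float = 1e-5):
    """Sum L1 + L2 on linear magnitudes and L1 on log magnitudes over several FFT sizes.

    `pred` and `target` are assumed to be waveforms of shape (batch, time).
    """
    total = 0.0
    for n_fft in n_ffts:
        window = torch.hann_window(n_fft, device=pred.device)
        mag = lambda x: torch.stft(x, n_fft, hop_length=n_fft // 4,
                                   window=window, return_complex=True).abs()
        s_pred, s_target = mag(pred), mag(target)
        total = total + F.l1_loss(s_pred, s_target)               # L1 on linear magnitudes
        total = total + F.mse_loss(s_pred, s_target)              # L2 on linear magnitudes
        total = total + F.l1_loss(torch.log(s_pred + log_eps),    # L1 on log magnitudes,
                                  torch.log(s_target + log_eps))  # HiFi-GAN-style log-spec term
    return total
```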

lucidrains commented 1 year ago

@turian ha! nice title 😂

turian commented 1 year ago

Still interested in this :)

turian commented 1 year ago

@lucidrains Now that Encodec is MIT licensed, I think this would be great to adopt.