Open turian opened 1 year ago
@turian loss balancer is quite cool! I'll take a look at it next week, thanks!
It seems like multiscale spectral losses are here to stay. (If you're curious, check out my paper "I'm Sorry For Your Loss" for why I still think we need audio losses that match human perception and lead to better optimization.)
Anyway, there's also been a debate about whether to use L1 or L2, and about the choice of n_fft. I guess using them all is a good idea. I'm a little surprised they didn't also bother to mix in a log-spectrogram term, like many works (e.g. HiFi-GAN) do.
@turian ha! nice title 😂
Still interested in this :)
@lucidrains Now that Encodec is MIT licensed, I think this would be great to adopt.
Although this is departing a bit from the original AudioLM and SoundStream work, I see that you also like to combine the best ideas from multiple sources.
Perhaps consider adding an EMA-based loss balancer?
https://github.com/facebookresearch/encodec/blob/main/encodec/balancer.py
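For anyone skimming, here is a rough sketch of the idea as I understand it, not the Encodec code itself. It assumes you have already measured the gradient norm of each loss with respect to the model output, and that every loss is reported at every step. The class name `EmaLossBalancer` and its parameters are hypothetical, made up for illustration.

```python
class EmaLossBalancer:
    """Hypothetical, simplified EMA-based loss balancer (not the Encodec API).

    Tracks an exponential moving average of each loss's gradient norm and
    returns a scale per loss, so that each loss contributes a share of the
    total gradient norm proportional to its weight.
    """

    def __init__(self, weights, decay=0.999, total_norm=1.0, eps=1e-12):
        self.weights = dict(weights)   # desired relative importance per loss
        self.decay = decay             # EMA decay factor
        self.total_norm = total_norm   # target norm of the combined gradient
        self.eps = eps                 # avoids division by zero
        self._ema = {}                 # running EMA of gradient norms
        self._steps = 0

    def update(self, grad_norms):
        """grad_norms: dict of loss name -> current gradient norm (float).

        Assumes every loss appears at every step.
        """
        self._steps += 1
        # Bias correction so early steps are not dominated by the zero init.
        correction = 1.0 - self.decay ** self._steps
        total_weight = sum(self.weights[name] for name in grad_norms)
        scales = {}
        for name, norm in grad_norms.items():
            prev = self._ema.get(name, 0.0)
            self._ema[name] = self.decay * prev + (1.0 - self.decay) * norm
            avg_norm = self._ema[name] / correction
            ratio = self.weights[name] / total_weight
            # Multiply this loss's gradient by the returned scale.
            scales[name] = ratio * self.total_norm / (avg_norm + self.eps)
        return scales
```

In a training loop you would multiply each loss's gradient (with respect to the decoder output) by its scale before backpropagating the sum through the model. The point is that the weights then express what fraction of the gradient each loss gets, regardless of the raw magnitude of each loss.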
Then, the generative loss can mix a variety of spectral losses and the waveform loss:
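A minimal NumPy sketch of what such a mix could look like, under my own assumptions: the function names, the `n_ffts` defaults, the hop of `n_fft // 4`, the Hann window, and the equal weighting of the L1, L2, and log-magnitude terms are all illustrative choices, not taken from Encodec or from this repo. A real training setup would use a differentiable STFT in the training framework instead.

```python
import numpy as np


def stft_magnitude(signal, n_fft):
    """Magnitude STFT with a Hann window and hop of n_fft // 4 (assumed)."""
    hop = n_fft // 4
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))


def multiscale_spectral_loss(pred, target, n_ffts=(256, 512, 1024), eps=1e-7):
    """Average over several n_fft of L1 + L2 + log-magnitude L1 terms."""
    total = 0.0
    for n_fft in n_ffts:
        s_pred = stft_magnitude(pred, n_fft)
        s_tgt = stft_magnitude(target, n_fft)
        l1 = np.mean(np.abs(s_pred - s_tgt))
        l2 = np.mean((s_pred - s_tgt) ** 2)
        # Log-magnitude term; eps keeps log() finite for silent bins.
        log_l1 = np.mean(np.abs(np.log(s_pred + eps) - np.log(s_tgt + eps)))
        total += l1 + l2 + log_l1
    return total / len(n_ffts)


def reconstruction_loss(pred, target, wave_weight=1.0, spec_weight=1.0):
    """Waveform L1 plus the multiscale spectral loss (weights are assumed)."""
    wave = np.mean(np.abs(pred - target))
    spec = multiscale_spectral_loss(pred, target)
    return wave_weight * wave + spec_weight * spec
```

Both signals need to be 1-D arrays at least as long as the largest `n_fft`. With a loss balancer in place, the fixed `wave_weight` and `spec_weight` would be replaced by the balancer's per-loss scales.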