lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License

`MultiScaleDiscriminator` differs from paper #194

Closed: haydenshively closed this issue 1 year ago

haydenshively commented 1 year ago

I spent some time comparing this `MultiScaleDiscriminator` with the one described in the SoundStream paper, as well as the official MelGAN implementation (cited in SoundStream). A few small differences stood out:

I don't think any of these are a big deal, but wanted to share for the sake of completeness.

lucidrains commented 1 year ago

hey Hayden, thanks for raising this

why are you comparing the discriminator with the one from MelGAN? SoundStream has no relationship with that paper, afaict?

haydenshively commented 1 year ago

SoundStream Section III.D

For the wave-based discriminator, we use the same multiresolution convolutional discriminator proposed in [15] and adopted in [45]

[15] is MelGAN and [45] is SEANet. SEANet refers readers back to MelGAN for discriminator architecture details, so I went with that.
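For context, the MelGAN-style multi-scale discriminator referenced above can be sketched roughly as below. This is a hedged illustration, not the exact official MelGAN or audiolm-pytorch configuration: the layer widths, kernel sizes, and group counts are illustrative, but the overall shape (several identical waveform discriminators, with average pooling downsampling the raw audio between scales, and intermediate features returned for a feature-matching loss) follows the MelGAN design.

```python
# Hedged sketch of a MelGAN-style multi-scale waveform discriminator.
# Channel widths / kernel sizes are illustrative assumptions, not the
# official hyperparameters.
import torch
from torch import nn

class DiscriminatorBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv1d(1, 16, 15, stride=1, padding=7),
            # grouped, strided convs give a large receptive field cheaply
            nn.Conv1d(16, 64, 41, stride=4, padding=20, groups=4),
            nn.Conv1d(64, 256, 41, stride=4, padding=20, groups=16),
            nn.Conv1d(256, 256, 5, stride=1, padding=2),
        ])
        self.to_logits = nn.Conv1d(256, 1, 3, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        feats = []
        for layer in self.layers:
            x = self.act(layer(x))
            feats.append(x)  # kept for the feature-matching loss
        return self.to_logits(x), feats

class MultiScaleDiscriminator(nn.Module):
    def __init__(self, scales=3):
        super().__init__()
        self.discs = nn.ModuleList(DiscriminatorBlock() for _ in range(scales))
        # MelGAN downsamples the raw waveform between scales via avg pooling
        self.pool = nn.AvgPool1d(4, stride=2, padding=1,
                                 count_include_pad=False)

    def forward(self, x):
        outputs = []
        for disc in self.discs:
            outputs.append(disc(x))
            x = self.pool(x)  # next discriminator sees a coarser resolution
        return outputs

wave = torch.randn(2, 1, 16384)          # (batch, channels, samples)
outs = MultiScaleDiscriminator()(wave)   # one (logits, features) per scale
```

The point of contention in this issue is exactly these details (pooling between scales, grouped convolutions, which features are collected), so any real comparison should be made against the official MelGAN repository rather than this sketch.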

lucidrains commented 1 year ago

@haydenshively i believe you are right

thank you! i've updated it in 1.1.0; do let me know if you see any other discrepancies