lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License

Residual connection after decoder attention? #104

Closed alexdemartos closed 1 year ago

alexdemartos commented 1 year ago

In the decode_from_codebook_indices method there is a residual connection after the decoder attention:

https://github.com/lucidrains/audiolm-pytorch/blob/d244a6c90b33627bad1e38ae2e20e821e6818253/audiolm_pytorch/soundstream.py#L494

Which is not present in the forward method:

https://github.com/lucidrains/audiolm-pytorch/blob/d244a6c90b33627bad1e38ae2e20e821e6818253/audiolm_pytorch/soundstream.py#L567

Is this correct?

In my case, I found significantly higher audio quality during inference after removing this residual connection (matching training conditions).
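The mismatch described above can be illustrated with a toy sketch. This is not the repository's code: `toy_decoder_attn`, `forward_path`, `decode_with_residual`, and `decode_fixed` are hypothetical stand-ins, and the "attention" is just a simple mixing function. It only shows that adding a residual at inference, when training did not use one, changes what the decoder receives.

```python
# Toy illustration only; all names here are hypothetical stand-ins,
# not functions from audiolm_pytorch/soundstream.py.

def toy_decoder_attn(x):
    # Stand-in for the decoder attention block: mix each value with the mean.
    mean = sum(x) / len(x)
    return [0.5 * v + 0.5 * mean for v in x]

def forward_path(x):
    # Training path as described in the issue: attention output used directly.
    return toy_decoder_attn(x)

def decode_with_residual(x):
    # Inference path as reported: an extra residual added after attention.
    return [a + v for a, v in zip(toy_decoder_attn(x), x)]

def decode_fixed(x):
    # Inference path with the residual removed, matching the training path.
    return toy_decoder_attn(x)

latents = [1.0, 2.0, 3.0]
train_out = forward_path(latents)
buggy_out = decode_with_residual(latents)
fixed_out = decode_fixed(latents)

# Largest elementwise gap between what training saw and what inference produces.
mismatch_buggy = max(abs(a - b) for a, b in zip(train_out, buggy_out))
mismatch_fixed = max(abs(a - b) for a, b in zip(train_out, fixed_out))
print(mismatch_buggy, mismatch_fixed)
```

With the residual in place, the downstream decoder sees inputs it was never trained on, which would be consistent with the quality drop reported here.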

lucidrains commented 1 year ago

@alexdemartos oh gosh yes :man_facepalming: thank you for catching this

https://github.com/lucidrains/audiolm-pytorch/commit/e8ee51f1efb5bb1bba26ec691191763dbece0ecc

lucidrains commented 1 year ago

@alexdemartos does this mean you have already gotten to the stage of sampling from the coarse and fine transformers?

cyanbx commented 1 year ago

Thanks for detecting this bug! It helps me a lot.

lucidrains commented 1 year ago

ok, i'm guessing he's slinking back off into the darkness to pen his next paper :laughing:

i'll leave him be