lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
MIT License

Missing softmax after Linear layer #263

Closed: biendltb closed this issue 8 months ago

biendltb commented 8 months ago

Hi,

Thanks for the amazing work.

When training with this code, I observed very large output values for coarse_logits in CoarseTransformer. According to the transformer architecture in the original paper, a softmax follows the final linear layer, but the softmax is missing here, which results in large-magnitude output logits: https://github.com/lucidrains/audiolm-pytorch/blob/1a888d2f462384baf5dc8b4782f39a40f59593b7/audiolm_pytorch/audiolm_pytorch.py#L924

These unnormalized logits effectively disable gumbel_sample(), since the unit-scale noise the function adds is tiny relative to the logits themselves: https://github.com/lucidrains/audiolm-pytorch/blob/1a888d2f462384baf5dc8b4782f39a40f59593b7/audiolm_pytorch/audiolm_pytorch.py#L1655
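
For context, gumbel_sample() in the repository follows the usual Gumbel-max pattern. Roughly (a simplified sketch, not the exact source; see the link above for the real implementation):

```python
import torch

def log(t, eps=1e-20):
    # numerically safe log
    return torch.log(t.clamp(min=eps))

def gumbel_noise(t):
    # standard Gumbel(0, 1) noise: -log(-log(U)), U ~ Uniform(0, 1)
    noise = torch.zeros_like(t).uniform_(0, 1)
    return -log(-log(noise))

def gumbel_sample(t, temperature=1., dim=-1):
    # add unit-scale Gumbel noise to the temperature-scaled logits,
    # then take the argmax
    return ((t / max(temperature, 1e-10)) + gumbel_noise(t)).argmax(dim=dim)
```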

Is the softmax layer missing here?

lucidrains commented 8 months ago

@biendltb no, I don't think so.

Gumbel noise acts on the raw logits, afaik.
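
For what it's worth, this is the Gumbel-max trick: taking argmax(logits + Gumbel noise) with unit-scale noise draws an exact sample from Categorical(softmax(logits)), so no explicit softmax is needed before sampling. A quick empirical check (a minimal sketch; the logit values here are arbitrary):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# arbitrary, unnormalized logits (values chosen only for illustration)
logits = torch.tensor([4.0, 1.0, -2.0, 6.0])
num_classes = logits.numel()
n_samples = 200_000

# unit-scale Gumbel(0, 1) noise: -log(-log(U)), U ~ Uniform(0, 1)
u = torch.rand(n_samples, num_classes)
gumbel = -torch.log(-torch.log(u.clamp_min(1e-20)).clamp_min(1e-20))

# Gumbel-max trick: argmax over (logits + noise) samples exactly from
# Categorical(softmax(logits)), with no explicit softmax applied
samples = (logits + gumbel).argmax(dim=-1)
empirical = torch.bincount(samples, minlength=num_classes).float() / n_samples

print(empirical)                  # empirical sampling frequencies
print(F.softmax(logits, dim=-1))  # should closely match the line above
```

Note that large-magnitude logits simply mean the implied softmax distribution is sharply peaked, so near-deterministic sampling is the mathematically correct behavior rather than a bug.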