Closed biendltb closed 8 months ago
Hi,
Thanks for the amazing work.
When training with the code, I got very large output values for coarse_logits in CoarseTransformer. According to the transformer architecture in the original paper, there is a softmax after the final linear layer, but the softmax is missing here, which results in large output logits: https://github.com/lucidrains/audiolm-pytorch/blob/1a888d2f462384baf5dc8b4782f39a40f59593b7/audiolm_pytorch/audiolm_pytorch.py#L924
These unnormalized logits will effectively disable gumbel_sample(), since that function adds unit-scale Gumbel noise to the logits, which is negligible relative to such large values. https://github.com/lucidrains/audiolm-pytorch/blob/1a888d2f462384baf5dc8b4782f39a40f59593b7/audiolm_pytorch/audiolm_pytorch.py#L1655
Is the softmax layer missing here?
@biendltb No, I don't think so.
Gumbel noise acts on the raw logits, afaik.
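To expand on that: by the Gumbel-max trick, adding Gumbel(0, 1) noise to the raw logits and taking the argmax is equivalent to sampling from softmax(logits / temperature). Applying log-softmax first only subtracts a constant (the log-sum-exp) from every logit, which cannot change the argmax, so no explicit softmax is needed before sampling. Here is a minimal pure-Python sketch of this equivalence (not the repo's actual implementation, just an illustration of the idea):

```python
import math
import random

def gumbel_sample(logits, temperature=1.0):
    # Gumbel-max trick: add Gumbel(0, 1) noise to the scaled raw logits
    # and take the argmax. This samples from softmax(logits / temperature).
    noisy = [l / temperature - math.log(-math.log(max(random.random(), 1e-12)))
             for l in logits]
    return max(range(len(noisy)), key=lambda i: noisy[i])

# Large, unnormalized logits, as reported in the issue.
logits = [2.0, 10.0, 100.0]

# log-softmax subtracts the same constant (log-sum-exp) from every logit.
m = max(logits)
lse = m + math.log(sum(math.exp(l - m) for l in logits))
log_probs = [l - lse for l in logits]

# Under identical noise, raw logits and log-probabilities give the same sample.
random.seed(0)
a = gumbel_sample(logits)
random.seed(0)
b = gumbel_sample(log_probs)
assert a == b
```

Note the separate point the issue raises still stands for magnitudes: when one logit dwarfs the others by ~90 nats, unit-scale Gumbel noise will essentially never flip the argmax, so sampling becomes nearly deterministic. That is a property of the (very peaked) implied distribution, not a bug in skipping softmax.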