lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License

skip the eos when adding offset to avoid overlapping #266

Closed biendltb closed 8 months ago

biendltb commented 8 months ago

A minor bug: the EOS token id overlaps with the first token id of the following quantizer.

Context: CoarseTransformer

For example, with Encodec and 2 coarse quantizers, we have (1024 + 1) * 2 = 2050 embeddings.

So token id 0 of the second quantizer, after adding the offset, becomes 0 + 1024 = 1024, which overlaps with the EOS token id.

This fix counts the EOS when adding the offset to the coarse token ids.

(The problem does not happen in the FineTransformer, as the EOS is not used there: the length of the generated fine token ids is fixed by the length of the pre-generated coarse token ids.)
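To make the arithmetic concrete, here is a minimal sketch of the collision and of the proposed fix. The variable names (`buggy_offsets`, `fixed_offsets`, etc.) are illustrative only, not the actual audiolm-pytorch internals:

```python
import torch

codebook_size = 1024            # Encodec codebook size per quantizer
num_coarse_quantizers = 2
eos_id = codebook_size          # EOS id sits right after the codebook, i.e. 1024

# per-quantizer offsets that ignore the EOS slot (the buggy behaviour)
buggy_offsets = torch.arange(num_coarse_quantizers) * codebook_size        # tensor([0, 1024])
# token 0 of the second quantizer becomes 0 + 1024, colliding with the EOS id
print((0 + buggy_offsets[1]).item() == eos_id)                             # True

# counting the EOS slot per quantizer keeps the id ranges disjoint
fixed_offsets = torch.arange(num_coarse_quantizers) * (codebook_size + 1)  # tensor([0, 1025])
print((0 + fixed_offsets[1]).item())                                       # 1025, no overlap
# quantizer 0 -> ids [0, 1023], EOS -> 1024, quantizer 1 -> ids [1025, 2048]
# total embeddings: (1024 + 1) * 2 == 2050, matching the count above
```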

lucidrains commented 8 months ago

@biendltb hey, thanks for raising this

i believe i did reason through this edge case and concluded it did not need to be addressed somehow (but it has been a while)

let me dedicate a morning to this next week and get back to you on this

tldr: you could be right, but i vaguely recall spending some time on this issue

biendltb commented 8 months ago

> @biendltb hey, thanks for raising this
>
> i believe i did reason through this edge case and concluded it did not need to be addressed somehow (but it has been a while)
>
> let me dedicate a morning to this next week and get back to you on this
>
> tldr: you could be right, but i vaguely recall spending some time on this issue

Yeah, you might be right. The EOS is stripped from the target tokens at training time and only appears in the labels, so the embedding table does not strictly need an EOS embedding, since it will never be looked up. With 2050 embeddings as in the example, if we only add an offset of 1024, we end up with two redundant embeddings. So we can either add the offset with the EOS counted, or create the embedding table without the EOS. Either way, having redundant EOS embeddings here should not affect the result.
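A minimal sketch of the two options described above (the embedding dimension `dim = 512` and the variable names are hypothetical, not the repo's actual code):

```python
import torch
from torch import nn

codebook_size = 1024
num_coarse_quantizers = 2
dim = 512  # hypothetical embedding dimension

# option A (this PR): keep an EOS slot per quantizer and offset by (codebook_size + 1)
emb_with_eos = nn.Embedding(num_coarse_quantizers * (codebook_size + 1), dim)  # 2050 rows
offset_with_eos = codebook_size + 1

# option B: EOS only ever appears in the labels (never as an input token), so the
# table could drop the EOS rows entirely and offset by codebook_size
emb_without_eos = nn.Embedding(num_coarse_quantizers * codebook_size, dim)     # 2048 rows
offset_without_eos = codebook_size

# with option A, the EOS rows are never looked up during training, so they are
# merely redundant parameters; either option should train to the same result
```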

biendltb commented 8 months ago

> @biendltb if i had to redo audiolm-pytorch, i would go for the RQ-transformer design, successfully applied here
>
> it would save a lot of headache

Thanks for sharing this. I haven't checked out that repo of yours or the RQ-Transformer yet. Good things for me to read next.

However, regarding the codebase, I tend to shy away from large, untestable files. I've actually broken your code down into smaller files in a private repo, using PyTorch Lightning to organize things. I know it's very time-consuming, but it saves a lot of headaches as the code grows and gives you some confidence. I'm seeing a performance gap between my code based on nanoGPT components and yours, and I'm still trying to figure out why :exploding_head: (good news: yours is the better one :D )

lucidrains commented 8 months ago

@biendltb sounds good, you do your own development style

lucidrains commented 8 months ago

@biendltb as for my code, you are free to copy / paste, it is all MIT