biendltb closed this issue 8 months ago
@biendltb hey, thanks for raising this
i believe i did reason through this edge case and concluded it did not need to be addressed somehow (but it has been a while)
let me dedicate a morning to this next week and get back to you on this
tldr: you could be right, but i vaguely recall spending some time on this issue
yeah, you might be right. EOS is stripped from the target tokens at training time and only appears in the labels, so the embedding table does not necessarily need to include an eos embedding, since it will never be looked up. With 2050 embeddings as in the example, if we only add an offset of 1024, we end up with two redundant embeddings. So we can either add the offset with eos counted, or create the embedding table without eos. Either way, keeping the redundant eos embeddings should not affect the result.
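To make the size bookkeeping concrete, here is a rough sketch of the arithmetic (the variable names and the toy embedding table are mine, for illustration only, not the library's actual code):

```python
import torch
import torch.nn as nn

codebook_size = 1024            # tokens 0..1023 per quantizer
num_coarse_quantizers = 2
eos_id = codebook_size          # eos is assigned id 1024

# embedding table built with (codebook_size + 1) entries per quantizer
# -> (1024 + 1) * 2 = 2050 rows, as in the example above
embedding = nn.Embedding((codebook_size + 1) * num_coarse_quantizers, 512)

# if the per-quantizer offset only counts codebook_size (1024),
# the highest index ever looked up is 1024 + 1023 = 2047,
# so rows 2048 and 2049 (one "eos" row per quantizer) are never used
offsets = torch.arange(num_coarse_quantizers) * codebook_size
max_used_index = offsets[-1] + (codebook_size - 1)
print(max_used_index.item())    # 2047 < 2050 -> two redundant rows
```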
@biendltb if i had to redo audiolm-pytorch, i would go for the RQ-transformer design, successfully applied here. it would save a lot of headache
Thanks for sharing this. I haven't checked out this repo of yours or the RQ-Transformer yet. Good things for me to read next.
However, regarding the codebase, I tend to shy away from large, untestable files. I've actually broken your code down into smaller files in a private repo, using PyTorch Lightning to organize things. I know it's very time-consuming, but it saves a lot of headaches as the code grows and gives you some confidence. I'm seeing a performance gap between my code based on Nano-GPT components and yours, and I'm still trying to figure it out :exploding_head: (good news: yours is the better one :D )
@biendltb sounds good, you do your own development style
@biendltb as for my code, you are free to copy / paste, it is all MIT
A minor bug: overlap between the EOS id and the first token id of the following quantizer.
Context: CoarseTransformer
For example, with Encodec and 2 coarse quantizers, we have (1024 + 1) * 2 = 2050 embeddings
EOS id is assigned to the codebook_size (eos_id=1024 in our example): https://github.com/lucidrains/audiolm-pytorch/blob/42da76b644eb3e16559382333488fd0fdd719611/audiolm_pytorch/audiolm_pytorch.py#L722
When adding offset to quantizer, we use the codebook size without counting EOS: https://github.com/lucidrains/audiolm-pytorch/blob/42da76b644eb3e16559382333488fd0fdd719611/audiolm_pytorch/audiolm_pytorch.py#L848
So token id 0 of the second quantizer, after adding the offset, becomes 0 + 1024 = 1024, which overlaps with the eos token id.
The fix is to count the eos when adding the offset to the coarse token ids (see the sketch below).
(The problem does not happen in FineTransformer as the eos is not used there; the length of the generated fine token ids is fixed by the length of the pre-generated coarse token ids.)
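A minimal standalone sketch of the overlap and the proposed fix, assuming the Encodec numbers above (the tensor names are hypothetical and the offset logic is simplified relative to the actual audiolm_pytorch code):

```python
import torch

codebook_size = 1024
eos_id = codebook_size              # 1024, as assigned in CoarseTransformer
num_coarse_quantizers = 2

# (seq, quantizers): raw per-quantizer ids before any offset is applied
coarse_ids = torch.tensor([[0, 5],
                           [7, 0]])

# current behaviour: offset each quantizer by codebook_size only
offsets_buggy = torch.arange(num_coarse_quantizers) * codebook_size
buggy = coarse_ids + offsets_buggy
# token id 0 of the second quantizer becomes 0 + 1024 = 1024 == eos_id
assert (buggy == eos_id).any()

# proposed fix: count the eos slot when computing the offset
offsets_fixed = torch.arange(num_coarse_quantizers) * (codebook_size + 1)
fixed = coarse_ids + offsets_fixed
# the second quantizer's ids now start at 1025, so nothing collides with eos
assert not (fixed == eos_id).any()
```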