lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License

Bug in generation when generating with Encodec #236

Closed FrancescoVV closed 9 months ago

FrancescoVV commented 9 months ago

Encodec doesn't support the "-1" value that is used to mask tokens after EOS.

In particular, the coarse_token_ids here contain some trailing -1s, so the line `coarse_and_fine_ids = torch.cat((coarse_token_ids, sampled_fine_token_ids), dim = -1)`

will still contain some -1s that will not be recognised by Encodec.
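A minimal sketch of the failure mode and one possible workaround, using hypothetical token values (the tensor contents below are made up for illustration; only the `torch.cat` line mirrors the library code quoted above):

```python
import torch

# Hypothetical token ids of shape (batch, seq_len), where -1 is used
# as post-EOS padding by the transformer sampling loops.
coarse_token_ids = torch.tensor([[12, 7, 99, -1, -1]])
sampled_fine_token_ids = torch.tensor([[3, 41, 8, -1, -1]])

# This is the concatenation from the library; the -1 padding survives it.
coarse_and_fine_ids = torch.cat((coarse_token_ids, sampled_fine_token_ids), dim=-1)

# Encodec has no codebook entry for -1, so the padding must be removed
# (or replaced with a valid id) before decoding. For batch size 1,
# dropping every -1 is enough:
valid = coarse_and_fine_ids != -1
trimmed = coarse_and_fine_ids[valid].reshape(1, -1)
```

For batch sizes greater than 1, per-sample trimming (or masked replacement with a valid token id before decode) would be needed, since each sample may have a different number of padded positions.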

lucidrains commented 9 months ago

ah I see, you've reached the fine transformer stage

are you sampling more than one audio at a time?

lucidrains commented 9 months ago

yeah, I can get a naive solution for this issue this am

lucidrains commented 9 months ago

@FrancescoVV ok, try 1.5.7

FrancescoVV commented 9 months ago

I can try on Friday or the weekend, but unfortunately not before that. I will close the issue myself if it's fixed.

In any case, the issue exists during generation with a batch size of 1 or more.

lucidrains commented 9 months ago

@FrancescoVV sounds good, i think it is fixed, but you can do the honors

FrancescoVV commented 9 months ago

The issue is fixed now!

lucidrains commented 9 months ago

noice