lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License

Question on discrepancy between original data and reconstructed data sizes #238

Open tysonjordan opened 11 months ago

tysonjordan commented 11 months ago

I'm trying to reconstruct a single tensor using encodec's forward() and decode() methods. I'm getting different tensor lengths for my original tensor (161856) and my reconstructed tensor (161920). Is there some kind of padding that occurs during the encoding process? If so, can I simply trim the excess from my reconstructed tensor, or should I make some adjustment before encoding?

I apologize if this is trivial!

lucidrains commented 11 months ago

@tysonjordan encodec is processed in frames, so they must be padding it to the next frame length
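The numbers in the question are consistent with that explanation. A minimal sketch of the arithmetic, assuming a frame (hop) length of 320 samples, which is the product of the 8, 5, 4, 2 strides of the 24 kHz EnCodec model; the hop length is not stated in this thread, and `HOP_LENGTH` and `padded_length` are names made up for illustration:

```python
# Assumption: 320 samples per frame (8 * 5 * 4 * 2), not confirmed in this thread.
HOP_LENGTH = 320


def padded_length(num_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Round num_samples up to the next whole multiple of hop_length."""
    num_frames = -(-num_samples // hop_length)  # integer ceiling division
    return num_frames * hop_length


original = 161856
reconstructed = padded_length(original)

print(reconstructed)             # 161920
print(reconstructed - original)  # 64 extra samples from padding
```

Under that assumption, 161856 samples fill 505.8 frames, which rounds up to 506 frames, or 161920 samples.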

i think you should be able to safely trim it
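A rough sketch of the trim, using plain Python lists as stand-ins for the audio so it runs without any model loaded. `trim_to_length` is a hypothetical helper, not part of audiolm-pytorch or encodec, and it assumes the padding is appended at the end of the signal:

```python
def trim_to_length(reconstructed, original_length):
    """Drop trailing samples so the output matches the input length."""
    if len(reconstructed) < original_length:
        raise ValueError("reconstruction is shorter than the original")
    return reconstructed[:original_length]


original = [0.0] * 161856       # stand-in for the input waveform
reconstructed = [0.0] * 161920  # stand-in for the decoder output

trimmed = trim_to_length(reconstructed, len(original))
print(len(trimmed) == len(original))  # True
```

With PyTorch tensors the same idea is usually written as a slice on the last dimension, for example `recon[..., :wave.shape[-1]]`, where `recon` and `wave` are placeholder names for the decoded and original tensors.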