lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License

Inconsistent samples for multiple targets in SoundDataset #224

Closed by ilya16 11 months ago

ilya16 commented 11 months ago

When an audio file is longer than max_length and multiple target sample rates are used, SoundDataset crops each target at a different start position, so the targets no longer cover the same part of the audio: https://github.com/lucidrains/audiolm-pytorch/blob/c65bb97662a1ef29ec6359d25bf4022c2cb82a27/audiolm_pytorch/data.py#L86-L97

This affects the training data for CoarseTransformer.
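To illustrate the problem, here is a minimal standalone sketch, not the repository's code. It assumes the crop start is drawn independently for each target sample rate; `crop_independently`, the 16 kHz / 24 kHz rates and the 2-second limit are hypothetical names and values chosen for the example.

```python
import random

def crop_independently(num_samples_by_rate, max_seconds, rng):
    """Pick a separate random crop start (in seconds) for each target rate."""
    starts = {}
    for rate, num_samples in num_samples_by_rate.items():
        max_len = int(max_seconds * rate)
        # A fresh random draw per target is what lets the crops drift apart.
        start = rng.randrange(0, num_samples - max_len + 1)
        starts[rate] = start / rate  # convert the sample offset to seconds
    return starts

rng = random.Random(0)
# A 10-second clip already resampled to two hypothetical target rates.
lengths = {16_000: 160_000, 24_000: 240_000}
starts = crop_independently(lengths, max_seconds=2.0, rng=rng)
print(starts)  # start time in seconds for each target rate
```

Each start is valid on its own, but nothing ties the two draws together, so the segments generally begin at different times in the source audio.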

lucidrains commented 11 months ago

@ilya16 yes indeed that does not seem right :disappointed:

decided to do all the resampling + curtail / pad at the highest target sample freq first, and only then resample to the rest of the target sample freqs
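A rough sketch of that strategy, under stated assumptions: this is not the actual patch, `crop_then_resample` and `naive_resample` are hypothetical helpers, the limit is given in seconds for simplicity, and the nearest-sample resampler is only a stand-in for a proper one such as `torchaudio.functional.resample`.

```python
import random

def naive_resample(samples, orig_rate, new_rate):
    """Nearest-sample resampling; a stand-in with no anti-aliasing filter."""
    new_len = len(samples) * new_rate // orig_rate
    last = len(samples) - 1
    return [samples[min(i * orig_rate // new_rate, last)] for i in range(new_len)]

def crop_then_resample(samples, rate, target_rates, max_seconds, rng):
    """Crop or pad once at the highest target rate, then derive the other targets."""
    top_rate = max(target_rates)
    audio = naive_resample(samples, rate, top_rate)
    max_len = int(max_seconds * top_rate)
    if len(audio) > max_len:
        # One random start shared by every target.
        start = rng.randrange(0, len(audio) - max_len + 1)
        audio = audio[start:start + max_len]
    else:
        # Zero-pad short clips up to the fixed length.
        audio = audio + [0.0] * (max_len - len(audio))
    return {r: naive_resample(audio, top_rate, r) for r in target_rates}

rng = random.Random(0)
clip = [float(i) for i in range(48_000)]  # 2-second ramp at 24 kHz
out = crop_then_resample(clip, 24_000, [16_000, 24_000], max_seconds=1.0, rng=rng)
```

Because every target is derived from the same cropped segment, all of them start at the same point in the source audio and span the same duration.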

want to see if that addresses the issue?

ilya16 commented 11 months ago

@lucidrains looks good!