iver56 / audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
https://iver56.github.io/audiomentations/
MIT License
1.76k stars, 183 forks

Add parameter for ensuring mp3 compression output length is the same as the input length #314

Open iver56 opened 5 months ago

atamazian commented 4 months ago

Can you elaborate on how the code should react if the mp3 compression output length differs from the original input length?

iver56 commented 4 months ago

I haven't given it a lot of thought, but I created this issue because I guess most people would like the compressed audio from this transform to have the same length as the input. I've seen some people write their own workarounds. A round trip via mp3 sometimes adds a bit of silence, sometimes on the order of 40-50 ms, to the start of the audio snippet, and I guess this is what makes the output longer. Because the "silent prefix" can vary in length, aligning it is not always trivial. A simple solution would be to use something like AdjustDuration (padding and/or trimming) to make sure the output length is the same as the input length. A more complicated solution would be to find the ideal alignment and chop away the excess, maybe with https://github.com/nomonosound/fast-align-audio
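To make the simple option concrete, here is a rough sketch of the pad-or-trim idea in plain NumPy. The helper name `match_length` is hypothetical and is not part of audiomentations; it only illustrates the kind of post-processing that something like AdjustDuration would perform, assuming samples are laid out with time on the last axis.

```python
import numpy as np


def match_length(samples: np.ndarray, target_length: int) -> np.ndarray:
    """Trim or zero-pad the end of `samples` (time on the last axis).

    Hypothetical helper for illustration only, not an audiomentations API.
    """
    current_length = samples.shape[-1]
    if current_length > target_length:
        # Output got longer after the mp3 round trip: drop the excess tail.
        return samples[..., :target_length]
    if current_length < target_length:
        # Output got shorter: append silence (zeros) at the end.
        missing = target_length - current_length
        pad_width = [(0, 0)] * (samples.ndim - 1) + [(0, missing)]
        return np.pad(samples, pad_width, mode="constant")
    return samples


# Example: pretend the round trip added 5 samples to a 100-sample input.
original = np.zeros(100, dtype=np.float32)
compressed = np.zeros(105, dtype=np.float32)
restored = match_length(compressed, original.shape[-1])
```

Note that this only fixes the length. If the extra samples are silence at the start, trimming the end keeps that delay and cuts off the tail of the actual audio, which is why the alignment-based option may still be worth considering.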

What do you think?

atamazian commented 4 months ago

I think we should implement AdjustDuration for a start. Later, if necessary, a more complicated solution may be added.
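If the more complicated solution is ever needed, one possible shape for it is a brute-force cross-correlation search over candidate delays. The sketch below is an assumption-laden illustration in plain NumPy: it does not use fast-align-audio, the function names `estimate_delay` and `align_and_trim` are hypothetical, and it assumes mono 1-D float arrays where the processed signal is only delayed (silence added at the start), never advanced. Real mp3 output also differs from the input waveform, so the best-matching lag would be an estimate rather than an exact answer.

```python
import numpy as np


def estimate_delay(reference: np.ndarray, processed: np.ndarray, max_delay: int) -> int:
    """Return the lag in [0, max_delay] where `processed` best matches `reference`.

    Hypothetical helper: scores each lag with a plain dot product.
    """
    best_lag, best_score = 0, -np.inf
    for lag in range(max_delay + 1):
        overlap = min(len(reference), len(processed) - lag)
        if overlap <= 0:
            break
        score = float(np.dot(reference[:overlap], processed[lag:lag + overlap]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_and_trim(reference: np.ndarray, processed: np.ndarray, max_delay: int) -> np.ndarray:
    """Drop the estimated silent prefix, then force the reference length."""
    lag = estimate_delay(reference, processed, max_delay)
    aligned = processed[lag:lag + len(reference)]
    if len(aligned) < len(reference):
        # Not enough samples left after removing the prefix: pad with silence.
        aligned = np.pad(aligned, (0, len(reference) - len(aligned)))
    return aligned


# Example: simulate a 37-sample silent prefix on a noise signal.
rng = np.random.default_rng(0)
reference = rng.standard_normal(1000)
processed = np.concatenate([np.zeros(37), reference])
aligned = align_and_trim(reference, processed, max_delay=100)
```

For the 40-50 ms prefix mentioned above, `max_delay` would be roughly 0.05 times the sample rate, with some margin. The loop is O(max_delay × length), which is fine for a sketch but slow for long clips; a dedicated library would be the better choice there.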