iver56 / audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
https://iver56.github.io/audiomentations/
MIT License

Avoid picking silent part of noise file #337

Closed Laubeee closed 1 month ago

Laubeee commented 2 months ago

I have a bunch of noise files that end in silence, and I am adding them to my inputs. As the inputs can be as short as one second, the randomized parameters often pick a noise file segment that is completely silent. This triggers a warning that the file is too quiet, and the original input is returned unchanged.

I would suggest avoiding that when picking the min and max offset of the noise file, e.g. by trimming leading and trailing zeros from the noise file first.
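The idea proposed above could be sketched roughly as follows. This is not code from audiomentations: `nonsilent_bounds` and `pick_noise_segment` are hypothetical helper names, and the sketch assumes a mono 1-D NumPy array in which "silence" means samples that are exactly zero.

```python
import numpy as np


def nonsilent_bounds(noise: np.ndarray) -> tuple[int, int]:
    """Return (start, end) so that noise[start:end] has no leading/trailing zeros.

    Hypothetical helper; assumes a 1-D array where silence is exactly 0.
    """
    nonzero = np.flatnonzero(noise)
    if nonzero.size == 0:
        return 0, 0  # the whole file is digital silence
    return int(nonzero[0]), int(nonzero[-1]) + 1


def pick_noise_segment(
    noise: np.ndarray, num_samples: int, rng: np.random.Generator
) -> np.ndarray:
    """Pick a random segment whose offset range excludes edge silence."""
    start, end = nonsilent_bounds(noise)
    usable = end - start
    if usable == 0:
        raise ValueError("noise contains only digital silence")
    if usable <= num_samples:
        # Too short after trimming: return all usable samples and let the
        # caller decide whether to loop or pad them.
        return noise[start:end]
    # The upper bound of rng.integers is exclusive, hence the +1.
    offset = int(rng.integers(start, end - num_samples + 1))
    return noise[offset:offset + num_samples]
```

Note that this only restricts the offset range to the region between the first and last non-zero sample. Silence in the middle of a file can still be picked, and near-silent (non-zero) tails would need a threshold instead of an exact zero check.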

iver56 commented 2 months ago

Thanks for sharing!

If I were you, I would "clean" the data, i.e. remove the silent part from the files, which would fix the observed issue. If you have good reasons for keeping the digital silence at the end of the files, or you don't want to or cannot edit them, another option, at least in the short term, is to fork the project and implement the functionality you propose. The part you would need to change is here: https://github.com/iver56/audiomentations/blob/21dbc4372248198149c21abbd1ef5c34e34be968/audiomentations/augmentations/add_background_noise.py#L184 You could use audiomentations' Trim class to remove the silence from the start and end before randomizing the offset.

My hunch is that most people do not have this issue, so I will not work on it right now, but I will leave it open in case anyone else has this issue too and wants to thumb up your post. If it turns out many people have this issue, a trimming option can be added.

Laubeee commented 2 months ago

Yes, I agree, cleaning the data is usually the best solution. However, in this case some trimmed noise files might no longer be long enough for some of the longer input sequences. I could consider using "short noises" instead and making the delay long enough that a noise never occurs twice, but in some cases the result will be unnatural without a fade-in if the noise doesn't start right at the beginning.

Surely a bit of an edge case, but (hopefully) the options might still be useful to others... it's okay for me to proceed as you proposed, thanks

iver56 commented 2 months ago

Good point. AddShortNoises may indeed be better suited to your needs. You could set noise_transform to Trim(), and it would trim your noise sound before finding an offset for it and mixing it in. It also supports fade in/out.

Laubeee commented 1 month ago

btw I can confirm that using short noises did the trick for my use case, thanks :) Feel free to close the issue depending on whether or not you still consider adding this.