iver56 / audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
https://iver56.github.io/audiomentations/
MIT License
1.89k stars, 192 forks

A transform that speeds up and slows down different segments of the audio #169

Open iver56 opened 2 years ago

iver56 commented 2 years ago

Also add a parameter called leave_length_unchanged that, when set to True, ensures that the output length equals the input length. In that case, some of the audio has to be sped up and some of it has to be slowed down. Maybe it can have two modes: time_stretch and speed? Or just make two different classes...
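To make the length constraint concrete, here is a minimal sketch of how per-segment rates could be chosen. It is not code from audiomentations: the function names `random_anchors` and `segment_rates`, and the parameters `num_anchors` and `max_shift`, are hypothetical and only used for illustration. The assumption is that anchors are placed at evenly spaced positions and each one is shifted by a random amount, while the first and last boundary stay fixed so the total length cannot change.

```python
import random


def random_anchors(num_samples, num_anchors, max_shift, rng):
    """Pick interior anchor positions and randomly shifted targets for them.

    Returns (source, target): two increasing lists of sample indices that
    both start at 0 and end at num_samples, so the total length is unchanged.
    All names here are hypothetical, not part of audiomentations.
    """
    step = num_samples / (num_anchors + 1)
    # Keep each shift below half a step so shifted anchors never cross.
    limit = min(max_shift, int(step / 2) - 1)
    source = [0]
    target = [0]
    for i in range(1, num_anchors + 1):
        pos = round(i * step)
        shift = rng.randint(-limit, limit) if limit > 0 else 0
        source.append(pos)
        target.append(pos + shift)
    source.append(num_samples)
    target.append(num_samples)
    return source, target


def segment_rates(source, target):
    """Rate > 1 means the segment is sped up, rate < 1 means slowed down."""
    return [
        (s1 - s0) / (t1 - t0)
        for s0, s1, t0, t1 in zip(source, source[1:], target, target[1:])
    ]


rng = random.Random(0)
source, target = random_anchors(16000, 3, 1000, rng)
rates = segment_rates(source, target)
```

Because the segment lengths in `target` sum to the input length, any segment with a rate above 1 is necessarily balanced by another segment with a rate below 1. In a time_stretch mode, each rate could be handed to a time stretcher for its segment. In a speed mode, the segment would be resampled instead.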

This is based on the spectrogram time stretching idea in the popular SpecAugment paper, but instead here we apply it directly to the waveform. I also saw a GitHub repo recently where they stretched individual phones or words, and it helped improve their metrics.

mmxgn commented 1 year ago

Isn't this already implemented?

On that note, I would recommend using pyrubberband instead of librosa's time_stretch, since the former preserves the transients.

From librosa's documentation of phase_vocoder:

This is a simplified implementation, intended primarily for reference and pedagogical purposes. It makes no attempt to handle transients, and is likely to produce many audible artifacts. For a higher quality implementation, we recommend the RubberBand library [2](https://librosa.org/doc/0.10.0/generated/librosa.phase_vocoder.html#id4) and its Python wrapper pyrubberband.

iver56 commented 1 year ago

This idea is not implemented in audiomentations yet. The idea in this issue is to add "anchors" and allow the output to have the same length as the input. Different parts of the waveform are sped up or slowed down. This idea is inspired by

I'll try to illustrate: [image]
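As a rough sketch of the anchor idea in its "speed" variant (plain resampling, so pitch shifts along with tempo), the function below moves each anchor from a source position to a target position and linearly interpolates in between. The name `warp_with_anchors` and its arguments are hypothetical, not part of audiomentations. It is written in pure Python for clarity. With NumPy, the same mapping could be expressed with two `np.interp` calls. A pitch-preserving variant would need a real time stretcher for each segment instead.

```python
def warp_with_anchors(samples, source, target):
    """Resample so that samples[source[i]] lands at output index target[i].

    source and target are strictly increasing lists of sample indices that
    start at 0 and end at len(samples), so the output has the same length
    as the input. This is an illustrative sketch, not a library function.
    """
    n = len(samples)
    out = []
    seg = 0
    for j in range(n):
        # Advance to the target segment that contains output index j.
        while seg < len(target) - 2 and j >= target[seg + 1]:
            seg += 1
        t0, t1 = target[seg], target[seg + 1]
        s0, s1 = source[seg], source[seg + 1]
        # Piecewise-linear map from output time to input time.
        pos = s0 + (j - t0) * (s1 - s0) / (t1 - t0)
        # Linear interpolation between the two neighbouring input samples.
        i = min(int(pos), n - 1)
        frac = pos - i
        nxt = samples[min(i + 1, n - 1)]
        out.append((1 - frac) * samples[i] + frac * nxt)
    return out


# A ramp makes the warp easy to inspect: each output value is the input
# position it was read from.
ramp = [float(i) for i in range(8)]
warped = warp_with_anchors(ramp, source=[0, 4, 8], target=[0, 2, 8])
```

In this usage example the anchor at input sample 4 is moved to output sample 2, so the first half of the input is played twice as fast and the second half is slowed down to fill the remaining six samples.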

iver56 commented 1 year ago

* and another repo where they added anchors not randomly but at the start/end of phones or words. I don't remember the name of the repo right now, but I can look for it.

I went looking for the repo, but couldn't find it. However, I found this paper that mentions a similar idea: https://ieeexplore.ieee.org/document/9003741