jameslyons / python_speech_features

This library provides common speech features for ASR including MFCCs and filterbank energies.
MIT License
2.37k stars, 617 forks

Data augmentation using VTLP #58

Open bernardohenz opened 6 years ago

bernardohenz commented 6 years ago

Hi,

I am using your library to compute MFCC features that will be used to train a neural network for speech recognition. When I searched for data augmentation options, one very popular technique I found is VTLP (vocal tract length perturbation), which basically consists of warping the frequency axis by a random factor.

I am wondering how difficult it would be to implement this augmentation in your code. (I am assuming that the warping should be done right before the MFCC extraction, but I am still not sure.)

xnio94 commented 4 years ago

You can use the nlpaug package:

    pip install nlpaug numpy matplotlib python-dotenv

    import nlpaug.augmenter.audio as ag

    # data: 1-D audio waveform; sampling_rate: its sample rate in Hz
    aug = ag.VtlpAug(sampling_rate)
    augmented_data = aug.augment(data)
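
On the question of where the warp belongs: in the original VTLP formulation (Jaitly and Hinton, 2013), the random factor is applied to the center frequencies of the mel filterbank, not to the raw waveform. Below is a minimal sketch of that piecewise-linear warp in plain Python. The function names `vtlp_warp` and `random_warp_factor` and the example frequencies are hypothetical, chosen for illustration. The boundary frequency of 4800 Hz and the factor range 0.9 to 1.1 follow the commonly cited settings. None of this is part of python_speech_features or nlpaug.

```python
import random


def vtlp_warp(freq_hz, alpha, sample_rate, f_hi=4800.0):
    """Warp one frequency (Hz) by factor alpha, keeping Nyquist fixed.

    Below a boundary frequency the axis is scaled linearly by alpha;
    above it, a second linear segment maps the rest of the band so that
    the Nyquist frequency (sample_rate / 2) maps onto itself.
    Assumes f_hi is below the Nyquist frequency.
    """
    nyquist = sample_rate / 2.0
    scale = min(alpha, 1.0)
    boundary = f_hi * scale / alpha
    if freq_hz <= boundary:
        return freq_hz * alpha
    # Slope of the upper segment, chosen so both segments meet at the boundary.
    slope = (nyquist - f_hi * scale) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq_hz)


def random_warp_factor(low=0.9, high=1.1, rng=random):
    """Draw one warp factor per utterance (illustrative range)."""
    return rng.uniform(low, high)


# Example: warp a few hypothetical filterbank center frequencies at 16 kHz.
centers = [250.0, 1000.0, 4000.0, 7000.0]
alpha = random_warp_factor()
warped = [vtlp_warp(f, alpha, 16000) for f in centers]
```

To use this with an MFCC pipeline, the warped center frequencies would replace the unwarped ones when the triangular filters are built, so that one warp factor is drawn per utterance and the rest of the feature extraction stays unchanged. The nlpaug approach above differs in that it returns augmented audio, which can then be passed to the MFCC function as usual.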