I am using your library to compute MFCC features that will be used to train a neural network for speech recognition. While searching for data augmentation options, I found that a very popular one is VTLP (vocal tract length perturbation), which basically consists of warping the frequency axis by a random factor.
How difficult would it be to implement this augmentation in your code? I assume the warping should be done right before the MFCC extraction, but I am not sure.
You can use the nlpaug package:
pip install nlpaug numpy matplotlib python-dotenv
import nlpaug.augmenter.audio as ag
aug = ag.VtlpAug(sampling_rate)
augmented_data = aug.augment(data)
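If you would rather do the warping yourself, here is a minimal sketch of the piecewise-linear frequency warp usually associated with VTLP. It is not taken from this library or from nlpaug: the function name `vtlp_warp`, the `f_hi=4800.0` default, and the 0.9 to 1.1 range for the random factor are assumptions chosen for illustration. As far as I know, the warp is typically applied to the center frequencies of the mel filterbank, so it happens inside the feature extraction rather than on the raw waveform, but treat that as something to verify for your setup.

```python
import numpy as np

def vtlp_warp(freqs, alpha, sample_rate, f_hi=4800.0):
    """Piecewise-linear VTLP warp of frequencies in Hz (hypothetical helper).

    Frequencies below a boundary are scaled by alpha; frequencies above it are
    mapped linearly so that the Nyquist frequency stays fixed.
    Assumes alpha > 0 and f_hi < sample_rate / 2.
    """
    freqs = np.asarray(freqs, dtype=float)
    nyquist = sample_rate / 2.0
    # Boundary between the two linear segments.
    boundary = f_hi * min(alpha, 1.0) / alpha
    # Lower segment: plain scaling by the warp factor.
    low = freqs * alpha
    # Upper segment: straight line that meets the lower one at the boundary
    # and passes through (nyquist, nyquist).
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    high = nyquist - slope * (nyquist - freqs)
    return np.where(freqs <= boundary, low, high)

# Draw one random warp factor per utterance (the range is an assumption).
rng = np.random.default_rng(0)
alpha = rng.uniform(0.9, 1.1)

# Example: warp a few evenly spaced frequencies for 16 kHz audio.
centers = np.linspace(0.0, 8000.0, 9)
warped = vtlp_warp(centers, alpha, sample_rate=16000)
```

You would then build the triangular filters around the warped center frequencies instead of the original ones and compute the MFCCs as usual.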