SpecAugment is a state-of-the-art data augmentation approach for speech recognition.
The paper's authors did not publish code that I could find, and their implementation was in TensorFlow. We implemented all three SpecAugment transforms using PyTorch, torchaudio, and fastai / fastai-audio.
install.sh
(I recommend using a unique conda env for the project.)
After the install script runs, you should have a torchaudio folder in your project folder.
Time Warp
Time Mask
Frequency Mask
Combined:
The Time Warp augmentation relies on TensorFlow-specific functionality not supported in PyTorch. We implemented supporting functions for this augmentation in SparseImageWarp.ipynb. You do not need to look at this notebook to use the augmentations, but the Time Warp augmentation depends on code exposed in the SparseImageWarp notebook.
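For a rough idea of what the two masking transforms do, here is a minimal sketch in plain PyTorch. It is not the code from this repository: the function names `time_mask` and `freq_mask`, the parameters `max_width` and `value`, and the assumed spectrogram layout `(..., n_freq_bins, n_time_steps)` are all illustrative assumptions. Each function picks a random width up to `max_width` and a random start position, then overwrites that band with a constant. Time Warp is not sketched here because it needs the sparse image warp helpers described above.

```python
import torch


def time_mask(spec: torch.Tensor, max_width: int = 20, value: float = 0.0) -> torch.Tensor:
    """Overwrite a random block of consecutive time steps (last dim) with `value`.

    Hypothetical helper for illustration; assumes spec is (..., n_freq_bins, n_time_steps).
    """
    spec = spec.clone()  # leave the caller's tensor untouched
    n_steps = spec.shape[-1]
    # Width is drawn uniformly from 0..max_width, capped at the spectrogram length.
    width = min(int(torch.randint(0, max_width + 1, (1,))), n_steps)
    # Start is drawn so that the whole block fits inside the spectrogram.
    start = int(torch.randint(0, n_steps - width + 1, (1,)))
    spec[..., start:start + width] = value
    return spec


def freq_mask(spec: torch.Tensor, max_width: int = 10, value: float = 0.0) -> torch.Tensor:
    """Overwrite a random block of consecutive frequency bins (second-to-last dim)."""
    spec = spec.clone()
    n_bins = spec.shape[-2]
    width = min(int(torch.randint(0, max_width + 1, (1,))), n_bins)
    start = int(torch.randint(0, n_bins - width + 1, (1,)))
    spec[..., start:start + width, :] = value
    return spec


if __name__ == "__main__":
    # Dummy "spectrogram": 1 channel, 80 mel bins, 100 time steps.
    spec = torch.ones(1, 80, 100)
    augmented = freq_mask(time_mask(spec))
    print(augmented.shape)
```

Applying both functions in sequence, as in the last lines, corresponds to the "Combined" case without the warp. The fill value of 0.0 is an assumption; filling with the spectrogram mean is another common choice.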