SpecAugment is a state-of-the-art data augmentation approach for speech recognition.
SpecAugment constructs an augmentation policy that acts directly on the log mel spectrogram of the input audio. The paper's authors propose three transformations: time warping, frequency masking, and time masking.
We implemented the frequency and time masking transforms in Kaldi, a state-of-the-art open-source automatic speech recognition (ASR) toolkit that implements most of the algorithms commonly used in ASR systems.
Replace the files feature-common.h, feature-common-inl.h, feature-mfcc.cc, and feature-mfcc.h in the kaldi/src/feat directory with those in our repository.
Copy the src/featbin-mask folder into the kaldi/src directory.
Add featbin-mask to the SUBDIRS list in the kaldi/src/Makefile file.
Example:
SUBDIRS = featbin feat featbin-mask
After recompiling Kaldi (running make in kaldi/src), add the kaldi/src/featbin-mask directory to your PATH variable.
Time Mask
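The sketch below illustrates the time-masking idea from the SpecAugment paper in plain Python: one block of t consecutive frames is set to zero, with t drawn uniformly from 0 up to a maximum width T. It is not the C++ code in this repository. The function name `time_mask`, the list-of-frames layout (rows are frames, as in a Kaldi feature matrix), and the assumption that the --time-mask value acts as T, with 0 disabling the mask, are all hypothetical.

```python
import random
from typing import List

# A feature matrix stored as a list of frames (rows), each a list of feature bins.
Matrix = List[List[float]]


def time_mask(feats: Matrix, max_width: int, rng: random.Random) -> Matrix:
    """Return a copy of `feats` with one block of consecutive frames set to zero.

    `max_width` plays the role of T in the SpecAugment paper (hypothetically the
    value passed as --time-mask); a value of 0 leaves the features unchanged.
    """
    out = [frame[:] for frame in feats]  # copy so the caller's matrix is untouched
    num_frames = len(out)
    if max_width <= 0 or num_frames == 0:
        return out
    # Mask width t drawn uniformly from {0, ..., T}, clamped to the utterance length.
    t = rng.randint(0, min(max_width, num_frames))
    # First masked frame t0, chosen so that frames [t0, t0 + t) stay in range.
    t0 = rng.randint(0, num_frames - t)
    for i in range(t0, t0 + t):
        out[i] = [0.0] * len(out[i])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[1.0] * 3 for _ in range(8)]  # 8 frames x 3 bins of dummy features
    for frame in time_mask(feats, max_width=3, rng=rng):
        print(frame)
```

Setting masked values to zero follows the paper, which normalizes the log mel spectrograms to zero mean so that zero equals the mean value.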
Frequency Mask
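Frequency masking applies the same idea along the feature axis: f consecutive bins, with f drawn uniformly from 0 up to a maximum width F, are set to zero in every frame. This is again a hypothetical Python sketch rather than this repository's C++ implementation; the name `frequency_mask` and the reading of --frequency-mask as F, with 0 disabling the mask, are assumptions.

```python
import random
from typing import List

# A feature matrix stored as a list of frames (rows), each a list of feature bins.
Matrix = List[List[float]]


def frequency_mask(feats: Matrix, max_width: int, rng: random.Random) -> Matrix:
    """Return a copy of `feats` with one block of consecutive bins zeroed in every frame.

    `max_width` plays the role of F in the SpecAugment paper (hypothetically the
    value passed as --frequency-mask); a value of 0 leaves the features unchanged.
    """
    out = [frame[:] for frame in feats]  # copy so the caller's matrix is untouched
    if max_width <= 0 or not out:
        return out
    num_bins = len(out[0])  # assumes every frame has the same dimension
    # Mask width f drawn uniformly from {0, ..., F}, clamped to the feature dimension.
    f = rng.randint(0, min(max_width, num_bins))
    # First masked bin f0, chosen so that bins [f0, f0 + f) stay in range.
    f0 = rng.randint(0, num_bins - f)
    for frame in out:
        for j in range(f0, f0 + f):
            frame[j] = 0.0
    return out
```

The paper's policies can apply several masks of each type to one utterance; calling the function repeatedly on its own output is one simple way to reproduce that.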
To disable either mask, set the corresponding option to zero, e.g., --time-mask=0 or --frequency-mask=0; a nonzero value enables it.