irebai / SpecAugment_KALDI

A KALDI/C++ implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment with KALDI

A C++ implementation, within the KALDI toolkit, of the paper "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition".

SpecAugment is a state-of-the-art data augmentation approach for speech recognition.

SpecAugment constructs an augmentation policy that acts directly on the log mel spectrogram. The paper's authors propose three transformations: time warping, frequency masking, and time masking.

We implemented the frequency and time masking transforms in Kaldi, a state-of-the-art automatic speech recognition (ASR) toolkit that contains almost every algorithm currently used in ASR systems.
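To illustrate what the two masking transforms do, here is a minimal standalone sketch in plain C++. It is not the code from this repository and does not use Kaldi types: the `Spectrogram` alias, the function names, the `max_width` parameter, and the `fill_value` default of zero are all assumptions made for illustration. Each function picks a random mask width up to `max_width` and a random start position, then overwrites that band of mel channels (frequency mask) or frames (time mask).

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Assumed layout: spec[t][f], where t indexes frames (time) and
// f indexes mel channels (frequency). All frames have the same size.
using Spectrogram = std::vector<std::vector<float>>;

// Frequency masking (illustrative): draw a width in [0, max_width] and a
// start channel so the band fits, then overwrite that band in every frame.
// A max_width of 0 leaves the spectrogram unchanged.
void ApplyFrequencyMask(Spectrogram &spec, int max_width, std::mt19937 &rng,
                        float fill_value = 0.0f) {
  assert(max_width >= 0);
  if (spec.empty() || max_width == 0) return;
  const int num_bins = static_cast<int>(spec[0].size());
  if (num_bins == 0) return;
  const int width_cap = std::min(max_width, num_bins);
  const int width = std::uniform_int_distribution<int>(0, width_cap)(rng);
  if (width == 0) return;
  const int start =
      std::uniform_int_distribution<int>(0, num_bins - width)(rng);
  for (auto &frame : spec) {
    std::fill(frame.begin() + start, frame.begin() + start + width,
              fill_value);
  }
}

// Time masking (illustrative): draw a width in [0, max_width] and a start
// frame so the span fits, then overwrite every channel of those frames.
// A max_width of 0 leaves the spectrogram unchanged.
void ApplyTimeMask(Spectrogram &spec, int max_width, std::mt19937 &rng,
                   float fill_value = 0.0f) {
  assert(max_width >= 0);
  const int num_frames = static_cast<int>(spec.size());
  if (num_frames == 0 || max_width == 0) return;
  const int width_cap = std::min(max_width, num_frames);
  const int width = std::uniform_int_distribution<int>(0, width_cap)(rng);
  if (width == 0) return;
  const int start =
      std::uniform_int_distribution<int>(0, num_frames - width)(rng);
  for (int t = start; t < start + width; ++t) {
    std::fill(spec[t].begin(), spec[t].end(), fill_value);
  }
}
```

Calling `ApplyFrequencyMask(spec, 3, rng)` followed by `ApplyTimeMask(spec, 5, rng)` on a 20-frame, 8-channel spectrogram masks at most 3 channels and at most 5 frames. Filling with zero assumes mean-normalized features, where zero corresponds to the mean; other pipelines may prefer a different fill value.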

To use:

  1. Replace the files "feature-common.h, feature-common-inl.h, feature-mfcc.h" in the kaldi/src/feat directory with those from our repository

  2. Copy the folder src/featbin-mask into the kaldi/src directory

  3. Add featbin-mask to the "SUBDIRS" list in the kaldi/src/Makefile file


     SUBDIRS = featbin feat featbin-mask

  4. Run make to re-compile the modified functions

After the install step runs, you should add the kaldi/src/featbin-mask directory to the PATH variable.

  5. See compute-mfcc-feats-masks for the added parameters.

Augmentation parameters

Time Mask

Frequency Mask


To deactivate a transform, set the corresponding mask parameter to zero, e.g., --time-mask=0 or --frequency-mask=0. Any non-zero value activates it.
