Closed wanglong001 closed 4 years ago
Dear @wanglong001, I don't understand where exactly we would use torchaudio.transforms.SlidingWindowCmn.
Is it meant to normalize the output of STFT/MFCC at the window level?
Could you explain this? I am not familiar with Kaldi.
Yes, it normalizes the cepstral output (STFT/MFCC...) at the window level, mainly to reduce the impact of environmental noise.
🚀 Feature
Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.
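To make the idea concrete, here is a minimal sketch of sliding-window mean normalization in plain Python. It is not Kaldi's or torchaudio's implementation: the function name `sliding_window_cmn`, the `window` parameter, and the centered window that is clipped at the utterance edges are all assumptions made for illustration, and variance normalization is left out.

```python
from typing import List


def sliding_window_cmn(frames: List[List[float]], window: int = 3) -> List[List[float]]:
    """Subtract from each frame the per-coefficient mean of a centered window.

    `frames` is a list of feature vectors (one per time step). The window is
    clipped at the start and end of the utterance (an illustrative choice).
    """
    num_frames = len(frames)
    half = window // 2
    normalized = []
    for t in range(num_frames):
        # Frames that fall inside the window around time step t.
        start = max(0, t - half)
        end = min(num_frames, t + half + 1)
        span = frames[start:end]
        dim = len(frames[t])
        # Mean of each coefficient over the window.
        means = [sum(f[d] for f in span) / len(span) for d in range(dim)]
        normalized.append([frames[t][d] - means[d] for d in range(dim)])
    return normalized


# Toy "cepstral" features: 4 frames, 2 coefficients each.
features = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]]
result = sliding_window_cmn(features, window=3)

# The same features with a constant offset added to every coefficient.
shifted = [[x + 5.0 for x in frame] for frame in features]
shifted_result = sliding_window_cmn(shifted, window=3)
```

In this toy example `result` and `shifted_result` are identical, because a constant offset in the features cancels when the local mean is subtracted. That is the sense in which the normalization reduces slowly varying channel or environmental effects.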
Motivation
My acoustic features are currently extracted with Kaldi. I want to use torchaudio instead, but it has no CMVN, so I wrote a torch version of CMVN based on Kaldi's.