KALDI: apply-cmvn-sliding

pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch

https://pytorch.org/audio

BSD 2-Clause "Simplified" License


KALDI: apply-cmvn-sliding #535

wanglong001 closed this issue 4 years ago

wanglong001 commented 4 years ago

🚀 Feature

Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

Motivation

My acoustic features are currently extracted with Kaldi. I want to use torchaudio instead, but it has no CMVN, so I wrote a PyTorch version of CMVN based on Kaldi's implementation.

stonelazy commented 3 years ago

Dear @wanglong001, I fail to understand where exactly we would use torchaudio.transforms.SlidingWindowCmn. Is it meant to normalize the output of STFT/MFCC at the window level?
Could you explain this? I am not familiar with Kaldi.

wanglong001 commented 3 years ago


Yes, it normalizes the cepstral feature output (STFT/MFCC, etc.) at the window level, mainly to reduce the impact of environmental noise.

https://kaldi-asr.org/doc/apply-cmvn-sliding_8cc.html
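
For readers who do not know Kaldi, the window-level normalization described above can be illustrated with a small dependency-free Python sketch. It is not Kaldi's or torchaudio's actual implementation: the function name `sliding_cmn`, its parameters, and the choice to simply truncate the window at the utterance edges are assumptions made for illustration.

```python
import math
from typing import List


def sliding_cmn(
    frames: List[List[float]],
    window: int = 3,
    norm_vars: bool = False,
) -> List[List[float]]:
    """Normalize each frame by statistics from a window centered on it.

    frames: T frames, each a list of D feature values (e.g. MFCCs).
    window: number of frames in the window (hypothetical parameter).
    norm_vars: if True, also divide by the window's standard deviation.
    """
    num_frames = len(frames)
    half = window // 2
    output = []
    for t in range(num_frames):
        # Assumption: the window is truncated at the utterance edges.
        start = max(0, t - half)
        end = min(num_frames, t + half + 1)
        span = frames[start:end]
        count = len(span)

        normalized = []
        for d in range(len(frames[t])):
            mean = sum(frame[d] for frame in span) / count
            value = frames[t][d] - mean
            if norm_vars:
                var = sum((frame[d] - mean) ** 2 for frame in span) / count
                # Floor the variance to avoid division by zero.
                value /= math.sqrt(max(var, 1e-10))
            normalized.append(value)
        output.append(normalized)
    return output


features = [[1.0], [2.0], [3.0], [4.0]]
print(sliding_cmn(features, window=3))
```

Because the mean is taken over a local window, a constant offset in the features (for example, from a stationary channel or background condition) cancels out, which is the noise-robustness effect mentioned in the reply.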