pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

KALDI:apply-cmvn-sliding #535

Closed wanglong001 closed 4 years ago

wanglong001 commented 4 years ago

🚀 Feature

Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

Motivation

My acoustic features are currently extracted with Kaldi. I want to use torchaudio instead, but it has no CMVN, so I wrote a torch version of CMVN based on Kaldi's.

stonelazy commented 3 years ago

Dear @wanglong001, I don't understand where exactly torchaudio.transforms.SlidingWindowCmn would be used. Is it meant to normalize the output of STFT/MFCC at the window level?
Could you explain this? I am not familiar with Kaldi.

wanglong001 commented 3 years ago

Yes, it normalizes cepstral features (the output of STFT/MFCC, etc.) at the window level, mainly to reduce the impact of environmental noise.

https://kaldi-asr.org/doc/apply-cmvn-sliding_8cc.html
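
For readers unfamiliar with Kaldi, the idea can be sketched in plain Python. This is only an illustration, not the torchaudio or Kaldi implementation: the function name, the `window` and `norm_vars` parameters, and the centered window that is simply clipped at the utterance edges are all assumptions made for the example, and Kaldi's actual boundary handling differs. For each frame, the mean (and optionally the variance) is computed over the neighbouring frames and used to normalize that frame.

```python
from typing import List


def sliding_window_cmn(
    frames: List[List[float]],
    window: int = 3,
    norm_vars: bool = False,
    eps: float = 1e-10,
) -> List[List[float]]:
    """Hypothetical sketch of sliding-window mean/variance normalization.

    frames: one utterance as a list of feature frames (time x feature_dim).
    window: number of frames in the centered window (clipped at the edges).
    norm_vars: if True, also divide by the standard deviation in the window.
    """
    num_frames = len(frames)
    if num_frames == 0:
        return []
    dim = len(frames[0])
    half = window // 2

    out = []
    for t in range(num_frames):
        # Centered window around frame t, clipped to the utterance boundaries.
        start = max(0, t - half)
        end = min(num_frames, t + half + 1)
        n = end - start

        normalized = []
        for d in range(dim):
            values = [frames[i][d] for i in range(start, end)]
            mean = sum(values) / n
            x = frames[t][d] - mean
            if norm_vars:
                var = sum((v - mean) ** 2 for v in values) / n
                # eps guards against division by zero for constant features.
                x /= max(var, eps) ** 0.5
            normalized.append(x)
        out.append(normalized)
    return out


# Toy "cepstral" features: 4 frames, 2 coefficients each.
features = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]]
normalized = sliding_window_cmn(features, window=3)

# The same features with a constant per-coefficient offset added, standing in
# for a stationary channel or noise effect in the cepstral domain.
shifted = [[a + 5.0, b - 3.0] for a, b in features]
normalized_shifted = sliding_window_cmn(shifted, window=3)
```

In this toy case `normalized` and `normalized_shifted` come out the same, because subtracting the local mean cancels any offset that is constant over the window. That is the sense in which the normalization reduces the impact of the recording environment.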