Hi @mstfc, thanks for your interest. torch.stft and scipy.signal.stft do not use the same window normalization. If we didn't apply this factor before the inverse transform, the amplitude of the output would be too large.
In the next version of open-unmix (dev branch) we will use torchaudio.istft, which doesn't require such a factor. See https://github.com/sigsep/open-unmix-pytorch/blob/dev/openunmix/model.py
Also see https://github.com/faroit/stft-istft-experiments for more comparisons of STFT/ISTFT across frameworks.
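As a rough illustration of where a factor of n_fft / 2 can come from: assuming scipy.signal.stft with its default scaling divides each windowed frame by the sum of the window, and torch.stft with normalized=False applies no scaling, the two outputs would differ by sum(window). For a periodic Hann window that sum is n_fft / 2. The helper periodic_hann below is hypothetical and written in plain Python so the sketch has no dependencies.

```python
import math


def periodic_hann(n):
    """Periodic Hann window of length n (hypothetical helper, not library code)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


n_fft = 4096
window = periodic_hann(n_fft)

# Sum of the window coefficients. Under the assumption stated above,
# this is the ratio between an unscaled STFT and a window-sum-scaled STFT.
win_sum = math.fsum(window)

print(win_sum, n_fft / 2)
```

If that assumption holds, dividing an unscaled STFT by n_fft / 2 puts it on the scale that scipy.signal.istft expects before inversion.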
Hi @faroit, thank you so much for the quick response. Do you mean that the Hann window generation is different, or something else? I don't understand what "do not use the same window normalization" means.
I can't remember the exact difference but I think it was the same as scipy vs librosa https://gist.github.com/746e572232be36f3bd462749fb1796da
@bmcfee
If it's just the constant factor n_fft/2, that's an artifact of the DFT itself, not the windowing. The default FFT implementation in numpy/scipy (and, upstream, librosa) does not normalize the forward transform, so all amplitudes scale up by N. The inverse transform then needs to divide by N to preserve energy. It's possible that scipy's stft/istft pair normalizes differently, but this is separate from window normalization, which is used to undo modulation effects induced by overlapping windows in the istft.
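The DFT scaling described above can be checked with a minimal sketch. The dft and idft helpers are hypothetical plain-Python stand-ins that follow the unnormalized-forward, 1/N-inverse convention (the default convention of numpy.fft); they are not library code.

```python
import cmath


def dft(x):
    """Forward DFT with no normalization (hypothetical helper)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Inverse DFT that divides by N to undo the forward scaling."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


N = 8
x = [1.0] * N                      # constant signal with amplitude 1

X = dft(x)                         # the DC bin grows with the transform length
x_rec = [v.real for v in idft(X)]  # dividing by N restores the original scale

print(abs(X[0]))                   # magnitude of the DC bin
print(max(abs(a - b) for a, b in zip(x, x_rec)))  # round-trip error
```

The DC bin of the forward transform has magnitude N rather than 1, and the round trip only recovers the input because the inverse divides by N.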
Hi @faroit, @bmcfee, very clear, thanks a lot!
Hi sirs,
Sorry to bother you. This is not a bug, but I don't know whom else to ask.
I have a question about using istft() in test.py.
def istft(X, rate=44100, n_fft=4096, n_hopsize=1024):
    t, audio = scipy.signal.istft(
        X / (n_fft / 2),
        rate,
        nperseg=n_fft,
        noverlap=n_fft - n_hopsize,
        boundary=True
    )
    return audio
Why does the input data "X" need to be divided by "(n_fft / 2)"? What is the purpose of this factor?
Thanks for your help. mstfc