sigsep / open-unmix-pytorch

Open-Unmix - Music Source Separation for PyTorch
https://sigsep.github.io/open-unmix/
MIT License

A little confused while using istft in test.py #64

Closed mstfc closed 4 years ago

mstfc commented 4 years ago

Hi sirs,

Sorry to bother you. This is not a bug, but I don't know whom else to ask.

I have a question about using istft() in test.py:

    def istft(X, rate=44100, n_fft=4096, n_hopsize=1024):
        t, audio = scipy.signal.istft(
            X / (n_fft / 2),
            rate,
            nperseg=n_fft,
            noverlap=n_fft - n_hopsize,
            boundary=True
        )
        return audio

Why does the input data "X" need to be divided by "(n_fft / 2)"? What is the purpose of this factor?

Thanks for your help. mstfc

faroit commented 4 years ago

Hi @mstfc, thanks for your interest. torch.stft and scipy.signal.stft do not use the same window normalization. If we didn't apply this factor before the inverse transform, the amplitude of the output would be too large.

In the next version of open-unmix (dev-branch) we will use torchaudio.istft which doesn't require such a factor. See https://github.com/sigsep/open-unmix-pytorch/blob/dev/openunmix/model.py

Also see https://github.com/faroit/stft-istft-experiments for more STFT/ISTFT comparisons between frameworks.
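
To make the scaling concrete, here is a minimal sketch (an editorial illustration, not code from open-unmix). It assumes scipy's default Hann window and default scaling, and it stands in for an unnormalized STFT with a plain `np.fft.rfft` of a windowed frame. The random test signal and all variable names are illustrative assumptions.

```python
import numpy as np
from scipy.signal import get_window, stft, istft

n_fft, n_hopsize, rate = 4096, 1024, 44100
noverlap = n_fft - n_hopsize

# Periodic Hann window, the default window of scipy.signal.stft.
window = get_window("hann", n_fft)
window_sum = window.sum()  # equals n_fft / 2 for a periodic Hann window

# Illustrative test signal (an assumption, not taken from the thread).
rng = np.random.default_rng(0)
x = rng.standard_normal(rate)

# scipy STFT without boundary extension or padding, so frame 0 is x[:n_fft].
_, _, X_scipy = stft(x, rate, nperseg=n_fft, noverlap=noverlap,
                     boundary=None, padded=False)

# The same frame without any window-sum scaling.
frame0_unnormalized = np.fft.rfft(window * x[:n_fft])

# scipy's frame is the unnormalized frame divided by the window sum.
scale_matches = np.allclose(X_scipy[:, 0] * window_sum, frame0_unnormalized)

# Round trip with scipy defaults, feeding istft an unnormalized spectrogram.
_, _, X = stft(x, rate, nperseg=n_fft, noverlap=noverlap)
X_unnormalized = X * window_sum
_, too_loud = istft(X_unnormalized, rate, nperseg=n_fft, noverlap=noverlap)
_, rescaled = istft(X_unnormalized / (n_fft / 2), rate,
                    nperseg=n_fft, noverlap=noverlap)

print(window_sum, scale_matches)
```

Under these assumptions, `too_loud` is the input scaled up by `n_fft / 2`, while `rescaled` matches the input again. This is consistent with the factor applied in test.py.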

mstfc commented 4 years ago

Hi @faroit, thank you so much for the quick response. Do you mean that the Hann window generation is different, or something else? I don't understand the phrase "do not use the same window normalization".

faroit commented 4 years ago

I can't remember the exact difference, but I think it was the same as scipy vs. librosa: https://gist.github.com/746e572232be36f3bd462749fb1796da

@bmcfee

bmcfee commented 4 years ago

If it's just the constant factor n_fft/2, that's an artifact of the DFT itself, not the windowing. The default FFT implementation in numpy/scipy (and, upstream, librosa) does not normalize the forward transform, so all amplitudes scale up by N. The inverse transform then needs to divide by N to preserve energy. It's possible that scipy's stft/istft pair normalizes differently, but this is separate from window normalization, which is used to undo modulation effects induced by overlapping windows in the istft.
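
The DFT normalization described above can be checked with a short sketch using numpy's FFT. The signal and variable names are illustrative assumptions. With the default settings the forward transform is unscaled and the inverse divides by N; `norm="ortho"` is shown as the symmetric alternative.

```python
import numpy as np

N = 8
x = np.ones(N)  # constant signal: all energy lands in the DC bin

# Default convention: the forward transform is not normalized,
# so the DC bin equals the sum of the samples, i.e. N.
X = np.fft.fft(x)

# The inverse transform divides by N, which recovers the original signal.
x_back = np.fft.ifft(X)

# Orthonormal convention: both directions are scaled by 1 / sqrt(N).
X_ortho = np.fft.fft(x, norm="ortho")

print(X[0].real, X_ortho[0].real)
```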

mstfc commented 4 years ago

Hi @faroit, @bmcfee, very clear, thanks a lot!