Closed: maxammann closed this issue 5 years ago.
Yeah, I used overlap-add (OLA) to reconstruct the signal. I used 0.5016 only because, without it, the reconstructed signal would be 1 / 0.5016 times larger than the original.
Okay, I understand. Do you remember where that constant comes from? I would imagine it depends on the window. In the OLA method you normalize by the window: depending on the window (and hop size), the overlapping windows sum to a constant C, and you divide the output by C.
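A minimal sketch of how such a window constant C can be computed, assuming NumPy/SciPy and a periodic Hann window of length 256 (illustrative parameters, not necessarily what audio_test.py uses). It folds the window modulo the hop size to get the steady-state sum of all shifted copies; if that sum is flat, the window satisfies COLA at that hop and its value is the C to divide by. The helper name `cola_constant` is hypothetical.

```python
import numpy as np
from scipy.signal import get_window


def cola_constant(window, hop, squared=False):
    """Fold a window modulo the hop size to get its steady-state overlap sum.

    Returns (C, deviation): C is the mean overlap sum and deviation is
    max - min across one hop period. deviation ~ 0 means the window
    satisfies COLA at this hop and C is the constant to divide by.
    Set squared=True when the same window is applied at analysis and
    synthesis (weighted overlap-add), where w**2 must sum to a constant.
    """
    w = np.asarray(window, dtype=float)
    if squared:
        w = w ** 2
    acc = np.zeros(hop)
    # acc[k] = sum over m of w[k + m*hop], i.e. all shifted copies stacked up
    for start in range(0, len(w), hop):
        seg = w[start:start + hop]
        acc[:len(seg)] += seg
    return acc.mean(), acc.max() - acc.min()


hann = get_window("hann", 256)  # periodic Hann (scipy's default fftbins=True)
print(cola_constant(hann, 128))        # C ~= 1.0 at 50% overlap
print(cola_constant(hann, 64))         # C ~= 2.0 at 75% overlap
print(cola_constant(hann, 64, True))   # C ~= 1.5 for Hann squared at 75% overlap
```

The same window gives different constants at different hops, which is why a hard-coded factor only holds for one window/hop combination.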
I didn't derive the constant from a formula. First, I reconstructed a sample signal using OLA without any processing. Then I determined the scaling factor by comparing the input and output signals directly. There is no other magic: compute the constant C such that C * output == input for one audio sample, and hopefully the same constant works for other samples.
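A minimal sketch of that calibration procedure, assuming a periodic Hann window of length 256 and a hop of 64 (illustrative values, not taken from the repo; the helpers `ola_reconstruct` and `estimate_scale` are hypothetical). It runs OLA with no processing (FFT then inverse FFT of each windowed frame) and fits the single C that best satisfies C * output == input in a least-squares sense, skipping the edges that are not fully overlapped.

```python
import numpy as np
from scipy.signal import get_window


def ola_reconstruct(x, window, hop):
    """Window each frame, FFT -> IFFT with no processing, overlap-add, no normalization."""
    n = len(window)
    n_frames = 1 + (len(x) - n) // hop
    y = np.zeros(len(x))
    for i in range(n_frames):
        start = i * hop
        spec = np.fft.rfft(x[start:start + n] * window)
        y[start:start + n] += np.fft.irfft(spec, n)
    return y


def estimate_scale(x, y, margin):
    """Least-squares C minimizing ||x - C*y||^2, ignoring `margin` samples at each edge."""
    xs, ys = x[margin:-margin], y[margin:-margin]
    return np.dot(xs, ys) / np.dot(ys, ys)


rng = np.random.default_rng(0)
x = rng.standard_normal(16000)   # stand-in for one audio sample
win = get_window("hann", 256)    # assumed window; not taken from audio_test.py
y = ola_reconstruct(x, win, hop=64)
C = estimate_scale(x, y, margin=256)
print(C)  # ~0.5 for this window/hop, so C * y ~= x away from the edges
```

For a window/hop pair that satisfies COLA, the measured C is simply the reciprocal of the overlap sum, so the empirical and analytic approaches agree.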
Okidoki :)
I decided to use this istft instead, which also worked well: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.istft.html
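For comparison, a minimal round trip with `scipy.signal.stft` and the linked `scipy.signal.istft`. The parameters (Hann window, 256-sample segments, 192-sample overlap, fs=16000) are assumptions for illustration, and `stft_roundtrip` is a hypothetical helper. `istft` divides the overlap-added output by the overlap-added squared window itself, so no hand-tuned scale factor is needed as long as the window/overlap pair satisfies NOLA.

```python
import numpy as np
from scipy import signal


def stft_roundtrip(x, fs=16000, window="hann", nperseg=256, noverlap=192):
    """STFT then inverse STFT with scipy.signal; no hand-tuned scale factor."""
    # istft normalizes by the overlap-added squared window, which requires NOLA
    if not signal.check_NOLA(window, nperseg, noverlap):
        raise ValueError("window/overlap combination violates NOLA")
    _, _, Zxx = signal.stft(x, fs=fs, window=window,
                            nperseg=nperseg, noverlap=noverlap)
    # (any processing of Zxx would go here)
    _, x_rec = signal.istft(Zxx, fs=fs, window=window,
                            nperseg=nperseg, noverlap=noverlap)
    return x_rec[:len(x)]  # istft output can be padded past the input length


rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
x_rec = stft_roundtrip(x)
print(np.allclose(x, x_rec))  # True
```

Passing the same `window`, `nperseg`, and `noverlap` to both calls matters: the normalization inside `istft` is computed from its own arguments, not from the STFT that produced `Zxx`.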
I noticed that you scale the windows in the STFT with 0.5016: https://github.com/zhr1201/deep-clustering/blob/master/audio_test.py#L229
Is this related to COLA (https://gauss256.github.io/blog/cola.html)? If not, how did you decide on this constant?
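For reference, the standard COLA condition for an analysis window w and hop H, and why it produces a single fixed scale factor (a worked derivation, not taken from the repo). If the condition holds, an unprocessed OLA output y is exactly C times the input x:

```latex
\sum_{m=-\infty}^{\infty} w[n - mH] = C \quad \text{for all } n
\qquad\Longrightarrow\qquad
y[n] = \sum_{m} x[n]\, w[n - mH] = C\, x[n],
\qquad
x[n] = \frac{y[n]}{C}.
```

When the same window is also applied at synthesis (weighted overlap-add), the condition applies to w^2 instead of w. In either case, a constant measured empirically by comparing input and output would match 1/C whenever the window/hop pair satisfies the condition.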