zhr1201 / deep-clustering

A TensorFlow implementation of "Deep clustering: Discriminative embeddings for segmentation and separation"

Scaling in STFT #22

Closed: maxammann closed this issue 5 years ago

maxammann commented 5 years ago

I noticed that you scale the windows in the STFT by 0.5016: https://github.com/zhr1201/deep-clustering/blob/master/audio_test.py#L229

Is this related to COLA (https://gauss256.github.io/blog/cola.html)? If not, how did you decide on this constant?
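For reference, COLA (constant overlap-add) means that shifted copies of the window, spaced one hop apart, sum to a constant. SciPy can test this directly with `scipy.signal.check_COLA`; the window lengths and overlaps below are illustrative choices taken from SciPy's own examples, not the settings used in `audio_test.py`.

```python
from scipy import signal

# (window, nperseg, noverlap) combinations, chosen only for illustration
cases = {
    "periodic Hann, 50% overlap": (signal.windows.hann(120, sym=False), 120, 60),
    "periodic Hann, 75% overlap": (signal.windows.hann(120, sym=False), 120, 90),
    "symmetric Hann, 50% overlap": (signal.windows.hann(120, sym=True), 120, 60),
    "rectangular, 25% overlap": (signal.windows.boxcar(100), 100, 25),
}

for name, (win, nperseg, noverlap) in cases.items():
    # check_COLA tests whether hop-shifted copies of `win` sum to a constant
    print(f"{name}: COLA = {signal.check_COLA(win, nperseg, noverlap)}")
```

The first two cases report `True` and the last two `False`: the symmetric Hann window (intended for filter design) is not COLA, and a rectangular window with only 25% overlap gives an overlap sum that alternates between 1 and 2.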

zhr1201 commented 5 years ago

Yeah, I used overlap-add (OLA) to reconstruct the signal. I used 0.5016 simply because without it, the reconstructed signal would be 1 / 0.5016 times larger than the original.

maxammann commented 5 years ago

OK, I understand. Do you remember where that constant comes from? I imagine it depends on the window. In the OLA method you normalize by the overlap-added windows, and for a COLA window and hop that sum is a constant C, so you just divide by C.
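A small sketch of the point that C depends on the window and the hop: overlap-add shifted copies of a window and read off the steady-state sum. The helper `ola_constant` and the window/hop pairs are hypothetical choices for illustration; they are not a derivation of the 0.5016 used in the repository, which may use a different window or hop.

```python
import numpy as np
from scipy import signal

def ola_constant(window, hop, n_frames=32):
    """Overlap-add hop-shifted copies of `window`; return the steady-state sum.

    For a COLA window/hop pair this interior sum is a single constant C,
    and dividing the OLA output by C undoes the gain.
    """
    n = len(window)
    total = np.zeros(hop * (n_frames - 1) + n)
    for k in range(n_frames):
        total[k * hop : k * hop + n] += window
    interior = total[n:-n]  # skip the ramp-up/ramp-down at both edges
    assert np.allclose(interior, interior[0]), "window/hop pair is not COLA"
    return interior[0]

n = 512
hann = signal.windows.hann(n, sym=False)
hamming = signal.windows.hamming(n, sym=False)

for label, win, hop in [
    ("Hann, hop n/2", hann, n // 2),
    ("Hann, hop n/4", hann, n // 4),
    ("Hamming, hop n/2", hamming, n // 2),
    ("Hann**2 (same window at analysis and synthesis), hop n/4", hann**2, n // 4),
]:
    c = ola_constant(win, hop)
    # For a COLA pair, C also equals sum(window) / hop
    print(f"{label}: C = {c:.4f}, sum(w)/hop = {win.sum() / hop:.4f}")
```

For these pairs C comes out at 1.0, 2.0, 1.08 and 1.5 respectively, and dividing the OLA output by C restores the original amplitude. The last case matters when the same window is applied at both analysis and synthesis, since the effective overlap-added window is then w².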

zhr1201 commented 5 years ago

I didn't use a formula to compute the constant. First, I reconstructed a sample signal using OLA without any processing. Then I determined the scaling factor by comparing the input and output signals directly. So there is no other magic in it: just compute the constant C such that C * output == input for one audio sample, and hopefully the same constant works for other samples.
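The empirical procedure described here can be sketched as follows: run an unprocessed STFT/OLA round trip on one signal, then fit the single constant C in `C * output == input`. The helper `ola_roundtrip`, the white-noise test signal, the 512-sample periodic Hann window and the 128-sample hop are all assumptions for illustration, not the repository's code, and C is fitted by least squares over the fully overlapped interior.

```python
import numpy as np
from scipy import signal

def ola_roundtrip(x, window, hop):
    """Frame x, window each frame, FFT/IFFT it unchanged, and overlap-add.

    No normalization is applied, so the output carries the OLA gain.
    Hypothetical helper for illustration only.
    """
    n = len(window)
    y = np.zeros(len(x))
    for start in range(0, len(x) - n + 1, hop):
        spectrum = np.fft.rfft(x[start:start + n] * window)
        y[start:start + n] += np.fft.irfft(spectrum, n)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)  # stand-in for "one audio sample"
n, hop = 512, 128
y = ola_roundtrip(x, signal.windows.hann(n, sym=False), hop)

# Compare only the region covered by a full set of overlapping frames
core = slice(n, len(x) - n)
# Least-squares estimate of C in C * output == input
C = np.dot(x[core], y[core]) / np.dot(y[core], y[core])
print(f"C = {C:.4f}")
```

With these settings C comes out at 0.5, i.e. 1 / 2, the overlap-added sum of a periodic Hann window at 75% overlap. Because the gain depends only on the window and hop, not on the signal, a constant calibrated on one sample carries over to others (away from the edges) as long as those settings stay fixed.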

maxammann commented 5 years ago

Okidoki :)

I decided to use this istft instead, which also worked well: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.istft.html
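A minimal round trip with `scipy.signal.stft` and `scipy.signal.istft`, assuming a 16 kHz sample rate, a 512-sample Hann window and 75% overlap (none of which come from this thread). `istft` normalizes by the overlap-added (squared) window itself, so no hand-tuned scaling constant is needed.

```python
import numpy as np
from scipy import signal

fs = 16000  # assumed sample rate, for illustration only
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)  # one second of noise as a stand-in signal

# Forward STFT: Hann window, 512-sample frames, 75% overlap
f, t, Zxx = signal.stft(x, fs=fs, window="hann", nperseg=512, noverlap=384)

# (a separation mask would be applied to Zxx at this point)

# Inverse STFT with matching parameters; istft handles window normalization
_, x_rec = signal.istft(Zxx, fs=fs, window="hann", nperseg=512, noverlap=384)

# istft may return extra padded samples, so trim before comparing
print(np.allclose(x, x_rec[: len(x)]))
```

Passing the same `window`, `nperseg` and `noverlap` to both calls is what makes the reconstruction exact; the window/overlap pair must also satisfy the NOLA condition, which `scipy.signal.check_NOLA` can test.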