Open · keunwoochoi opened 5 years ago
Great. Did you test the reconstruction error?
I think lines 33-40 can be replaced with a transpose convolution. (I've done that in my private repository, but also didn't make a PR to PyTorch since it's not written in C like `torch.stft`.)
I could probably open-source my private implementation of inverse STFT after some code refactoring. It is also based on librosa's istft implementation. Would that be helpful for us?
> reconstruction error
@faroit Could you possibly test it yourself? I did, and the error was 0. Not sure if it's for real..
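A "zero" error is plausible: a well-matched STFT/ISTFT round trip reconstructs the signal down to float precision. As a hedged sanity check (using scipy here, not the torch code under discussion), something like this shows the expected magnitude of the residual:

```python
import numpy as np
from scipy.signal import stft, istft

# Round-trip check: STFT then ISTFT with matching parameters should
# reconstruct the input up to floating-point noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(16384)
_, _, Z = stft(x, nperseg=1024, noverlap=768)
_, x_hat = istft(Z, nperseg=1024, noverlap=768)
rmse = np.sqrt(np.mean((x - x_hat[:len(x)]) ** 2))
print(rmse)  # tiny, but not exactly 0
```

So an error that prints as 0 likely just means it fell below display precision.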
> transpose convolution
@seungwonpark Yes, though I think there are pros and cons. `torch.rfft` has O(n log n) computational complexity, while a transpose convolution is O(n^2). But the transpose-conv layer might still be faster due to better parallelization -- although the current approach could also be reframed (with reshaping) to remove the loop first, so that `torch.rfft` is only executed a couple of times.
That said, we can simply time them and compare :) So yes, please share it! Even better if you can measure them.
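To make the trade-off concrete, here is a small sketch (my own illustration, not either repo's code) of the two per-frame inversion paths: the O(n log n) inverse real FFT versus the O(n^2) "deconvolution" view, where the frame is synthesized by multiplying the spectrum with a fixed inverse-DFT basis matrix, which is one big matmul and therefore parallelizes well across frames:

```python
import numpy as np

n_fft = 512
rng = np.random.default_rng(0)
spec = np.fft.rfft(rng.standard_normal(n_fft))

# Path 1: inverse real FFT, O(n log n)
frame_fft = np.fft.irfft(spec, n=n_fft)

# Path 2: fixed inverse-DFT basis applied as a matrix product, O(n^2)
k = np.arange(n_fft)[:, None]             # output sample index
f = np.arange(n_fft // 2 + 1)[None, :]    # rfft bin index
cos_b = np.cos(2 * np.pi * k * f / n_fft)
sin_b = np.sin(2 * np.pi * k * f / n_fft)
w = np.full(n_fft // 2 + 1, 2.0)          # double the one-sided bins...
w[0] = w[-1] = 1.0                        # ...except DC and Nyquist
frame_mat = (cos_b @ (w * spec.real) - sin_b @ (w * spec.imag)) / n_fft

print(np.allclose(frame_fft, frame_mat))  # both paths agree
```

Which one wins in wall-clock time then depends on frame count and hardware, which is exactly why measuring is the right call.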
@keunwoochoi Thanks for pointing out the difference! I'll let you know when I'm ready to share.
Here it is. I've also measured the time performance, and it looks like your implementation is much faster! https://github.com/seungwonpark/istft-pytorch
By the way, I can't figure out the actual difference between them. The major difference is that mine uses `ifft` and yours uses `irfft`. So I tried using `irfft` instead of `ifft` in my implementation, but found no significant difference.
@keunwoochoi cool, I tested yours in a small experiment where I ran the STFT of one library (librosa, scipy, tf, torch) against the ISTFT of another and measured the reconstruction RMSE.
```
STFT --> ISTFT                   RMSE
=====================================
test_torch   --> test_torch      0.21650636
test_torch   --> test_scipy      1.08139886e-07
test_torch   --> test_librosa    1.1001805855710549e-07
test_torch   --> test_tf         1.9412305e-06
test_scipy   --> test_torch      0.21650635
test_scipy   --> test_scipy      7.398381e-08
test_scipy   --> test_librosa    7.647041059862848e-08
test_scipy   --> test_tf         1.9329398e-06
test_librosa --> test_torch      0.21650635
test_librosa --> test_scipy      2.0918270288343244e-16
test_librosa --> test_librosa    2.294981695912998e-16
test_librosa --> test_tf         1.9328936e-06
test_tf      --> test_torch      0.21650644
test_tf      --> test_scipy      4.216027e-06
test_tf      --> test_librosa    4.220019682546531e-06
test_tf      --> test_tf         4.694827e-06
```
I think this is because of the combination of the overlap (you used //4), the Hann window, and the missing normalization. Anyway, this is easy to fix...
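For reference, the usual fix is to normalize the overlap-add by the accumulated *squared* window, as librosa does. A minimal numpy sketch (my own illustration, assuming the ISTFT input frames were windowed once at analysis time):

```python
import numpy as np

n_fft, hop = 256, 64
window = np.hanning(n_fft)
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)

# analysis: windowed frames (stand-in for the frames an ISTFT receives)
starts = range(0, len(x) - n_fft + 1, hop)
frames = np.stack([x[s:s + n_fft] * window for s in starts])

# synthesis: overlap-add with squared-window normalization
out = np.zeros(len(x))
norm = np.zeros(len(x))
for i, frame in enumerate(frames):
    s = i * hop
    out[s:s + n_fft] += frame * window   # apply the synthesis window
    norm[s:s + n_fft] += window ** 2     # accumulate squared-window sum
ok = norm > 1e-10
out[ok] /= norm[ok]
# interior samples now reconstruct exactly; only samples where the
# accumulated window energy is ~0 (the very edges) are left untouched
```

Dividing by the summed squared window removes the Hann/overlap bias regardless of the hop size, which is why the "//4" choice stops mattering.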
@f0k btw, do you know a good way to write a unit test for STFT/ISTFT errors due to windowing when masking was applied before the ISTFT, e.g. normalizing with the window instead of the squared window? I've always wondered how to catch these...
It's probably due to the boundary. I don't pre-pad; if you ignore the first and last half-windows in the eval, it should be much better for the moment.
It would be interesting to compare with https://github.com/wslihgt/pyfasst/tree/master/pyfasst/tftransforms. I used JL Durrieu's STFT implementation, which uses numpy, as a baseline. I think madmom (the JKU Linz audio framework) also uses something similar. Oh, by the way, you can find a tentative implementation of CQT using NSGT there, but it's not reliable, and I don't know how you could port it to PyTorch.
> it would be interesting to compare with https://github.com/wslihgt/pyfasst/tree/master/pyfasst/tftransforms
Cool. There's also https://github.com/nils-werner/stft, which I used for quite a while. I think we should stick to scipy.signal and librosa since they are heavily tested. However, I'm super interested in spotting the differences and discussing which one has the best reconstruction performance.. but maybe that's out of scope for this repo ;-)
Hi, we recently found out that the time profiling code in my repo comparing Choi's implementation (rfft) and mine (deconvolution with parallel ops) was wrong, and we've fixed it. The new conclusion: deconvolution is much faster! Please check https://github.com/seungwonpark/istft-pytorch.
I think we should close this then and move this initiative over to https://github.com/pytorch/pytorch/issues/3775
@seungwonpark are you willing to help out over there?
Yeah, we could close it, maybe re-open once we wanna work on a wrapper for it.
@faroit Hey, after closing it I think we sort of lost track. I don't think it's actually necessary to wait for the PyTorch istft. There are also some implementation details to settle (e.g., deconv like @seungwonpark's vs. rfft as I've done), but maybe we could have our own tentative implementation with a less tentative API? It seems the critical issues were resolved (https://github.com/faroit/stft-istft-experiments/issues/1), and we can follow the API of `STFT`: https://github.com/keunwoochoi/torchaudio-contrib/blob/master/torchaudio_contrib/layers.py#L35.
> InverseSTFT

We need more tests for `STFT`, including checking whether the stft result is correct. Okay, what about opening up a PR with an API proposal to address ISTFT in https://github.com/pytorch/pytorch/issues/3775? Then we could
> opening up a PR
You mean a PR here? (Yes, let's do it.) Or in pytorch? (Would they care about accepting an API without an implementation? Also, we can easily change it later, at least in this repo.)
On pytorch.
Yes, we could just add this ISTFT, but then it's rather hacky (using conv layers), which means it won't be easy to make it clean and readable. And once we reached that point, someone would have to start from scratch using libtorch, so it wouldn't be particularly sustainable ;-)
Ok, I'm not really sure if they'd care about an API only. Given the history, they've been adopting what was implemented outside of PyTorch quite actively (in other words, somewhat blindly if it seems good), at least in the case of audio processing. So, how about we do it here, maybe
```python
def istft(fft, hop, etc etc, mode='conv'):
    """wrapper for istft"""

def conv_istft(fft, hop, etc etc):
    """one of the options"""
```

?
It wouldn't block follow-up development while we refine it.
(continued)
..Or, BOTH?! Since we have control and flexibility here -- and the API for `ISTFT` would be pretty much the same as that of `STFT` -- let me add a tentative one (or persuade @seungwonpark to add one here :) while asking PyTorch. @faroit could you open a PR for that?
Minor clarification question: if the STFT coefficients are allowed to evolve & train via autograd (I assume they are, yes?), then is the idea for this ISTFT module to be either
(a) a sort of 'trainable' ISTFT which may not remain a perfect inverse of the trainable STFT, i.e. one that is independent from the STFT routine (@seungwonpark's deconv version looks that way), or (b) a routine that is somehow mathematically tied to the (trainable) STFT and always guaranteed to invert it 'perfectly'?
I'm guessing (a). Yes? Mainly because I have no idea how you'd do (b), and I can think of plenty of uses for (a); proper training could enforce arbitrary levels of inversion quality if you wanted it.
I'd also guess (a) in this case, i.e., the provided ISTFT is a static, non-trained module.
Ohhh, static & non-trained. Interesting. Thanks for the clarification.
ISTFT is being developed in the main repo https://github.com/pytorch/audio/pull/130
https://gist.github.com/keunwoochoi/2f349e72cc941f6f10d4adf9b0d3f37e
It works, but I'm probably not gonna make a PR to PyTorch (https://github.com/pytorch/pytorch/issues/3775) because it's written in Python (`torch.stft` is written in C). Just for future reference/usage.