Closed unemployed-denizen closed 2 years ago
Hi @unemployed-denizen, Thanks for this investigation. I haven't learned more about Pytorch/TensorFlow for years, so I'm not sure about this. Just checked Spectrogram in Audacity and spectrogram differences. Here is what I found.
From top to bottom: Old result, reference result, new result.
The shape of the new output matches the shape of the original one but there is still a large enough difference. Do you have any ideas to improve this?
Hi @unemployed-denizen, Thanks for this investigation. I haven't learned more about Pytorch/TensorFlow for years, so I'm not sure about this. Just checked Spectrogram in Audacity and spectrogram differences. Here is what I found.
- Spectrogram in Audacity:
From top to bottom: Old result, reference result, new result.
The shape of output:
- Old result: [2, 481280]
- Reference & new result: [2, 479832]
AVG Spec difference between each result and the original output:
- Old result: 1.8341962099075317
- New result: 0.0863308310508728
The shape of the new output matches the shape of the original one but there is still a large enough difference. Do you have any ideas to improve this?
For shape issue, it's all about how the padding of stft and istft works. I changed both of them to 'centre' so that the padding works in the right way. For computation accuracy, it might depend on the stft lib and framwork itself. Even if the procedure is 100% same, same input will give different output (the difference could be small but would not be 100% same). It would be hard to figure out what happends in the underlying computation process. By the way, I would recommend Demucs v3.0.4, since it is SOTA in 2022.
Hi @unemployed-denizen,
Sorry for the late reply. Actually, I don't have much passion for ML nowadays because ML is now more about scaling computing power. Yes, I'm using the latest version of Demucs. My evaluation shows that your result is better, but checking the spectrogram shows that my old result seems better at higher frequencies.
However, I'm going to merge this. Anyone who uses this library should be aware of this difference.
Again, thanks for your time.
Ref issue: https://github.com/tuan3w/spleeter-pytorch/issues/2
Main changes are the paddings in convolution layers (I have done the experiments on tf to make sure that the padding works). I also rewiring the last down conv layer, since it is not used in the tf version. Please check the examples to see if the quality of output is increased. I also edited test_estimator.py so that it fits the new version of pytorch, and the inference is also supporting GPU.