Closed: dorpxam closed this issue 1 year ago
Sorry for that, this was a misunderstanding of how the GPU works internally. By calling a GPU synchronization at the end of each stage, I get a much more coherent benchmark report! This issue can be closed.
-------------------- Training --------------------
Time: 0.022357 seconds in "STFT" stage.
Time: 0.624558 seconds in "Band Split" stage.
Time: 12.844367 seconds in "Transformers" stage.
Time: 0.244779 seconds in "Mask Estimators" stage.
Time: 0.006015 seconds in "ISTFT" stage.
--------------------------------------------------
Time: 0.290139 seconds in "Loss Function" stage.
Time: 50.656937 seconds in "Loss Backward" stage.
--------------------------------------------------
Time: 65.572889 seconds for the whole process.
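For reference, here is a minimal sketch of the kind of per-stage synchronized timing described above. The helper name, shapes, and STFT parameters are mine for illustration, not the repository's code; it assumes PyTorch with CUDA available:

```python
import time
from contextlib import contextmanager

import torch


@contextmanager
def timed_stage(name: str, device: str = 'cuda'):
    # Drain kernels already queued by earlier stages before starting the clock.
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    try:
        yield
    finally:
        # Wait for the kernels launched inside this block to finish; without this,
        # their cost would be billed to whichever later call happens to block.
        if device == 'cuda':
            torch.cuda.synchronize()
        print(f'Time: {time.perf_counter() - start:.6f} seconds in "{name}" stage.')


# Example usage: the audio shape here is a placeholder, not the model's real input.
audio = torch.randn(2, 131072, device='cuda')
with timed_stage('STFT'):
    spec = torch.stft(audio, n_fft=2048, hop_length=512,
                      window=torch.hann_window(2048, device='cuda'),
                      return_complex=True)
```

Synchronizing both before starting and before stopping the clock keeps each stage's number independent of whatever work the previous stage left queued on the device.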
Hi, let me introduce this little benchmark of the model running on GPU (device='cuda'). The model specifications follow the original paper, and the test tensors are initialized accordingly. The benchmark does not include the einops operations or the other tensor manipulations; it only bounds the main stages of the model (STFT, Band Split, Transformers, Mask Estimators, ISTFT).
The Benchmark class is a pretty trivial one; a sketch of the idea is shown below.
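The original class is not reproduced in this extract. As a rough illustration only, a "trivial" host-side timer of the kind the follow-up comment corrects might look like this (the body is hypothetical; note that it never synchronizes the GPU):

```python
import time


class Benchmark:
    """Naive host-side timer: wall-clock time between start() and stop().

    On CUDA this mostly measures kernel *launches*, because calls such as
    torch.stft return before the GPU work has finished, so the per-stage
    numbers can be misleading.
    """

    def __init__(self, name: str):
        self.name = name

    def start(self):
        self._t0 = time.perf_counter()

    def stop(self):
        print(f'Time: {time.perf_counter() - self._t0:.6f} seconds in "{self.name}" stage.')
```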
Conclusion: 66% of the time in the model is lost in the torch.istft process, while torch.stft is not slow at all. Am I the only one to notice this?
Edit: wrong conclusion, see the follow-up message.
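For context on why the original conclusion was off: CUDA kernels are launched asynchronously, so a host-side timestamp taken right after an operation measures only the launch, and the queued GPU work gets attributed to whichever later call happens to block. A tiny self-contained demonstration, unrelated to the model itself:

```python
import time

import torch

x = torch.randn(4096, 4096, device='cuda')

torch.cuda.synchronize()
t0 = time.perf_counter()
y = x @ x                     # the matmul is only *queued* here
t1 = time.perf_counter()
torch.cuda.synchronize()      # now the GPU has actually finished the work
t2 = time.perf_counter()

print(f'launch only: {t1 - t0:.6f} s, with synchronize: {t2 - t0:.6f} s')
```

The first number is tiny regardless of how expensive the operation is; the second reflects the real GPU cost, which is why the synchronized report above looks so different from the naive one.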