mys007 opened 4 years ago
This is not entirely unexpected; scripting preserves the actual loop semantics, while tracing unrolls all loops (since tracing merely observes what happened when you ran your model on the example inputs). Generally the overhead from scripting is negligible, since tensor operations dominate wall time.
We do have work in progress to close this gap even in cases where overhead is important (models with lots of small tensor ops, models with lots of scalar math, etc.), but it's not fully landed yet.
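To make the difference concrete, here is a minimal sketch (not from the thread; the module is illustrative) showing that tracing bakes a constant-trip-count loop into straight-line code, while scripting keeps it as control flow:

```python
import torch
import torch.nn as nn

class Loop(nn.Module):
    def forward(self, x):
        for _ in range(3):   # constant trip count
            x = x + 1
        return x

example = torch.zeros(2)

# Tracing only records the ops that actually ran, so the
# loop body shows up three times with no loop around it.
traced = torch.jit.trace(Loop(), example)
print(traced.code)

# Scripting compiles the Python source, so the `for` loop
# survives as a real control-flow construct in the graph.
scripted = torch.jit.script(Loop())
print(scripted.code)
```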
Thanks for the reply! Indeed, the scripted module doesn't have the loop unrolled. But I'm wondering where the ~7% running-time overhead comes from, since the branch condition is deterministic and doesn't involve any Tensor operations one would have to synchronize on. I'm not sure the operator fusion in the upcoming work you've suggested will address this?
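One way to see where the extra time goes (my own sketch, not something from the thread) is to profile the scripted call and compare the time spent outside the CUDA kernels, e.g. with the autograd profiler available in 1.4:

```python
import torch

# Assumes `scripted` and `x` are defined as in the
# reproduction sketch below.
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    for _ in range(100):
        scripted(x)
print(prof.key_averages().table(sort_by="cuda_time_total"))
```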
🐛 Bug
Calling a traced module in a for-loop with a constant number of iterations from a scripted module is slower than tracing the whole loop, at least with CUDA.
To Reproduce
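The original snippet isn't shown here; a minimal sketch of the setup described above might look like the following (shapes, iteration count, and the timing loop are illustrative assumptions, not the reporter's code):

```python
import time
import torch
import torch.nn as nn

class Inner(nn.Module):
    def forward(self, x):
        return torch.relu(x @ x)

class Loop(nn.Module):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, x):
        # Constant trip count: tracing unrolls this loop,
        # scripting keeps it as real control flow.
        for _ in range(10):
            x = self.inner(x)
        return x

x = torch.randn(256, 256, device="cuda")
inner = torch.jit.trace(Inner(), x)

traced = torch.jit.trace(Loop(inner), x)   # loop unrolled
scripted = torch.jit.script(Loop(inner))   # loop preserved

def bench(m, iters=100):
    for _ in range(10):          # warm-up, lets the JIT optimize
        m(x)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        m(x)
    torch.cuda.synchronize()     # wait for queued CUDA kernels
    return (time.time() - t0) / iters

print("traced:  ", bench(traced))
print("scripted:", bench(scripted))
```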
Running it on an RTX 2080 Ti, the scripted model is slower and has a less uniform running time than the traced one.
Expected behavior
Tracing and scripting should produce comparable running times.
Environment
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 18.04.2 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
GPU 3: GeForce RTX 2080 Ti
Nvidia driver version: 440.33.01
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] inferno-pytorch==0.3.1
[pip] numpy==1.16.2
[pip] pytorch-memlab==0.0.4
[pip] robust-loss-pytorch==0.0.2
[pip] torch==1.4.0
[pip] torch-dct==0.1.5
[pip] torchfile==0.1.0
[pip] torchvision==0.5.0
[conda] blas 1.0 mkl
[conda] cuda100 1.0 0 pytorch
[conda] inferno-pytorch 0.3.1 dev_0
[conda] mkl 2019.1 144
[conda] mkl_fft 1.0.10 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch 1.4.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] pytorch-memlab 0.0.4 pypi_0 pypi
[conda] robust-loss-pytorch 0.0.2 pypi_0 pypi
[conda] torch-dct 0.1.5 pypi_0 pypi
[conda] torchfile 0.1.0 pypi_0 pypi
[conda] torchvision 0.5.0 py37_cu101 pytorch
cc @suo