Open fnobis opened 1 year ago
Strange, can you make a MWE that reproduces this with a simple torch model?
That is indeed interesting. I was using line_profiler on PyTorch models a few weeks ago and didn't notice this issue, but I could easily have been doing something wrong.
Also, in addition to an MWE, can you give us your platform (Windows vs Mac vs Linux), Python version, and line_profiler version? This may be related to #210, which is fixed if you install from this git repo but hasn't been officially released yet. So, maybe running your code with the version of line_profiler in this repo could help.
You need to make sure to set CUDA_LAUNCH_BLOCKING=1 for accurate results; otherwise the CUDA kernels run asynchronously and all of their time accumulates at the wrong line, wherever the next CUDA sync happens.
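A minimal sketch of how one might apply that advice. The key point is that the variable must be set before torch initializes CUDA, so it has to go at the very top of the script (or in the shell before launching); `check_blocking_enabled` is a hypothetical helper added here only for illustration, not part of line_profiler or PyTorch.

```python
import os

# Set CUDA_LAUNCH_BLOCKING *before* importing torch, so that CUDA
# kernel launches become synchronous and line timings are attributed
# to the line that actually launched the kernel.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # only import the model code after setting the flag


def check_blocking_enabled() -> bool:
    """Hypothetical helper: True if synchronous launches were requested."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


print(check_blocking_enabled())
```

Equivalently, one can run `CUDA_LAUNCH_BLOCKING=1 kernprof -lv script.py` from the shell, which avoids touching the script at all.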
I run a multiline statement that calls into a PyTorch network, which takes about 50ms. When measuring this with line_profiler, the reported time is strangely low. When I put the call on a single line, the measurement looks correct. In the multiline case, the number of hits is reported as 2, even though the model and all other lines in the code run only once.
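A minimal sketch of the kind of MWE being asked for, without requiring PyTorch or line_profiler: it uses `sys.settrace` (the same per-line tracing machinery line_profiler builds on) to count line events inside a function containing a multiline call. `fake_model` is a hypothetical stand-in for the network; depending on the CPython version, the lines of a multiline call expression can receive more than one line event, which is consistent with the doubled hit count described above.

```python
import sys
from collections import Counter

hits = Counter()


def tracer(frame, event, arg):
    # Count every "line" event, keyed by line number, like a
    # stripped-down version of what a line profiler records.
    if event == "line":
        hits[frame.f_lineno] += 1
    return tracer


def fake_model(a, b, c):
    # Hypothetical stand-in for the PyTorch network call.
    return a + b + c


def run():
    out = fake_model(  # multiline call, mirroring the report above
        1,
        2,
        3,
    )
    return out


sys.settrace(tracer)
result = run()
sys.settrace(None)

print(result)
print(dict(hits))  # per-line event counts; exact values vary by Python version
```

The per-line counts printed at the end are version-dependent, so the interesting comparison is running the same script with the call collapsed onto one line and diffing the two hit maps.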
Incorrect time measured for the multiline call
Correct time and hit count for the single-line call