Closed GreyZzzzzzXh closed 3 years ago
The major bottleneck in our model is the CPU/GPU transfer of the 4 input frames, which takes a significant amount of time, and calling `torch.cuda.synchronize()` will also wait for all of those transfers to complete. So for all the baseline models (including ours), we only measure the forward pass time and ignore the CUDA transfer time, which would otherwise be reflected in the `torch.cuda.synchronize()` calls. You are right that this is not an accurate reflection of the total computation time. More details on our benchmarking process are provided in our paper.
Thanks for your reply.
I call the first `torch.cuda.synchronize()` after `images = [img_.cuda() for img_ in images]`, so it waits for all data to be transferred to the GPU; the transfer time will not be recorded, I think.
What I want to say is that CUDA calls are asynchronous, so the `time()` function alone cannot give the correct inference time.
https://discuss.pytorch.org/t/how-to-get-forward-time/25158
Feel free to point out if I am wrong :)
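To make the point concrete, here is a minimal sketch of the timing pattern being discussed: synchronize once before starting the clock (so pending work such as host-to-device copies is not counted) and once after the forward calls (so the asynchronously launched kernels are counted). The helper below is generic, with the synchronize function passed in as a parameter; `model` and `images` in the commented PyTorch usage are placeholders, not code from this repository.

```python
import time

def timed_forward(forward_fn, synchronize=lambda: None, repeats=10):
    """Average wall-clock time of forward_fn over `repeats` runs.

    synchronize() is called before starting the clock (to exclude
    pending work, e.g. CPU->GPU copies) and after the loop (to include
    asynchronously launched work, e.g. CUDA kernels).
    """
    synchronize()                       # wait for pending transfers/kernels
    start = time.perf_counter()
    for _ in range(repeats):
        forward_fn()
    synchronize()                       # wait for the kernels launched above
    return (time.perf_counter() - start) / repeats

# With PyTorch on a CUDA device this would be used roughly as
# (not executed here; `model` and `images` are placeholders):
#
#   import torch
#   images = [img_.cuda() for img_ in images]
#   t = timed_forward(lambda: model(images),
#                     synchronize=torch.cuda.synchronize)
```

Without the second `synchronize()`, the clock stops as soon as the kernels are *launched*, which is why the unmodified script can report implausibly small numbers like 0.004 s.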
You are right. Thanks for bringing this to our notice. We will correct the benchmarking script and update the inference times in the paper.
Hi, could you please share the pretrained model? The current links seem to be unavailable. Thanks a lot!
Hi, thanks for your interesting work! I tested the inference time on vimeo90K_septuplet using your script and got 0.004 s, which seems too fast. After I modified the code and tested again, I got 0.195 s. So I wonder how the time in your paper was measured?