Closed Changlin-Lee closed 1 year ago
What is the inference time when you run our provided benchmarks/speed_parameters.py with 1280x720 input size?
I think the code below should also be added to turn on cuDNN benchmark mode.
if torch.cuda.is_available():
    torch.backends.cudnn.enabled = True
    torch.backends.cudnn.benchmark = True
Hi, I added the cuDNN code you mentioned above and ran the speed_parameters.py from the GitHub repository. The result is still: "Time: 0.100s Parameters: 2.80M". When I tried a new V100 server (which has more CPU cores), the time changed to "Time: 0.048s".
I think the speed of IFRNet may be highly influenced by CPU performance.
Yes, the CPU will also affect the inference time. Also, when you test the running speed of IFRNet, you should make sure that no other processes are consuming CPU, GPU, or bandwidth resources.
If I set torch.set_num_threads(1), I get the same runtime test result on the V100.
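The measurement advice in this thread (warm up first, keep the machine otherwise idle, then time repeated runs) can be sketched as a small framework-agnostic helper. This is only an illustrative sketch, not the repository's speed_parameters.py: the name `benchmark` and its parameters `fn`, `sync`, `warmup`, and `iters` are assumptions made for this example.

```python
import time


def benchmark(fn, sync=None, warmup=10, iters=100):
    """Return the mean wall-clock seconds per call of fn().

    sync is an optional callable that blocks until asynchronous device
    work has finished; without it, GPU kernels that are still queued
    would not be counted.
    """
    if sync is None:
        sync = lambda: None  # CPU-only code needs no device sync

    # Warm-up calls are not timed: they absorb one-off costs such as
    # memory allocation and cuDNN autotuning.
    for _ in range(warmup):
        fn()

    sync()  # drain pending work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()  # wait for the last call to finish before stopping the clock
    return (time.perf_counter() - start) / iters
```

With PyTorch on a GPU, one would pass `sync=torch.cuda.synchronize` and wrap the forward pass in `fn`, inside `torch.no_grad()`. Calling `torch.set_num_threads(1)` beforehand, as suggested above, keeps the CPU-side conditions comparable across machines.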
Hi, I tried to evaluate the running time of IFRNet-S. The time measured on my V100 server is 0.131 s with a 1x3x720x1280 input, which is 10x longer than the figure reported in your paper. The testing code is shown below (modified from demo_2x.py):

import os
import numpy as np
import torch
from models.IFRNet_S import Model
import time

model = Model().cuda().eval()
model.load_state_dict(torch.load('./checkpoints/IFRNet_small/IFRNet_S_Vimeo90K.pth'))

with torch.no_grad():
    inp_size = [1, 3, 720, 1280]
    inps = [torch.Tensor(*inp_size).cuda() for _ in range(2)]
    embt = torch.tensor(1/2).view(1, 1, 1, 1).float().cuda()

    # warm up
The torch profiler output is shown below:
Name                                                     Self CPU %  Self CPU   CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148...  0.00%       0.000us    0.00%        0.000us    0.000us       9.754ms    41.53%       9.754ms     304.812us      32
aten::conv_transpose2d                                   0.00%       32.000us   0.02%        649.000us  162.250us     0.000us    0.00%        4.070ms     1.018ms        4
aten::cudnn_convolution_transpose                        0.01%       184.000us  0.02%        421.000us  105.250us     3.901ms    16.61%       3.901ms     975.250us      4
aten::copy_                                              0.01%       295.000us  0.47%        12.158ms   715.176us     3.300ms    14.05%       3.300ms     194.118us      17
aten::to                                                 0.00%       68.000us   0.46%        11.992ms   1.499ms       0.000us    0.00%        3.134ms     391.750us      8

Self CPU time total: 2.604s
Self CUDA time total: 23.485ms
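As a rough cross-check of these numbers, the sketch below recomputes the Self CUDA % column from the listed Self CUDA times and compares the two totals. The variable names are illustrative, and it assumes both totals cover the same profiled region, which the post does not state.

```python
# Totals copied from the profiler summary, converted to milliseconds.
self_cpu_total_ms = 2604.0    # "Self CPU time total: 2.604s"
self_cuda_total_ms = 23.485   # "Self CUDA time total: 23.485ms"

# Self CUDA time (ms) of the three entries with non-zero self CUDA time.
self_cuda_ms = {
    "volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148...": 9.754,
    "aten::cudnn_convolution_transpose": 3.901,
    "aten::copy_": 3.300,
}

# Share of the total CUDA time spent in each entry, in percent.
shares = {
    name: 100.0 * ms / self_cuda_total_ms
    for name, ms in self_cuda_ms.items()
}

# How many times larger the CPU-side total is than the GPU-side total.
cpu_to_cuda = self_cpu_total_ms / self_cuda_total_ms

for name, share in shares.items():
    print(f"{name}: {share:.2f}% of self CUDA time")
print(f"Self CPU total / Self CUDA total: {cpu_to_cuda:.1f}x")
```

The recomputed shares match the Self CUDA % column. The Self CPU total is roughly 110 times the Self CUDA total, which is consistent with the CPU-side influence discussed in this thread, though the CPU total may also include one-off setup work.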