Changlin-Lee commented 2 years ago

Hi, I am trying to measure the inference time of IFRNet-S. On my V100 server one inference takes 0.131 s with a 1x3x720x1280 input, which is about 10x longer than the figure reported in your paper. My test code (adapted from demo_2x.py) is shown below:

import os
import numpy as np
import torch
from models.IFRNet_S import Model
import time

model = Model().cuda().eval()
model.load_state_dict(torch.load('./checkpoints/IFRNet_small/IFRNet_S_Vimeo90K.pth'))

with torch.no_grad():
    inp_size = [1, 3, 720, 1280]
    # two dummy input frames (uninitialized values)
    inps = [torch.Tensor(*inp_size).cuda() for _ in range(2)]
    embt = torch.tensor(1/2).view(1, 1, 1, 1).float().cuda()

    # warm up
    for i in range(5):
        model.inference(inps[0], inps[1], embt)
    torch.cuda.synchronize()
    t1 = time.time()
    for i in range(10):
        model.inference(inps[0], inps[1], embt)
    torch.cuda.synchronize()
    t2 = time.time()
    print("inference time average:", (t2 - t1) / 10)
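The warm-up, synchronize, then time pattern above can be sketched as a small reusable helper. This is a minimal sketch, not code from the IFRNet repository: `benchmark`, `fn`, `sync`, `warmup`, and `iters` are hypothetical names, and for a GPU model one would presumably pass `torch.cuda.synchronize` as `sync`.

```python
import time


def benchmark(fn, sync=None, warmup=5, iters=10):
    """Return the average wall-clock seconds per call of fn().

    fn   : zero-argument callable to time (hypothetical name).
    sync : optional zero-argument callable that blocks until queued
           asynchronous work has finished (assumption: for CUDA this
           would be torch.cuda.synchronize).
    """
    if iters <= 0:
        raise ValueError("iters must be positive")

    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for pending work before stopping the clock
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a GPU.
    avg = benchmark(lambda: sum(range(10_000)))
    print(f"inference time average: {avg:.6f} s")
```

With the script above, this would be called inside `torch.no_grad()` as `benchmark(lambda: model.inference(inps[0], inps[1], embt), sync=torch.cuda.synchronize)`.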

The torch profiler output is shown below:

Name                                                        Self CPU %       Self CPU    CPU total %      CPU total   CPU time avg      Self CUDA    Self CUDA %     CUDA total  CUDA time avg     # of Calls
model_inference                                                  0.22%        5.682ms        100.00%         2.604s         2.604s        0.000us          0.00%       23.485ms       23.485ms              1
aten::convolution                                                0.01%      280.000us          0.20%        5.329ms      121.114us        0.000us          0.00%       16.184ms      367.818us             44
aten::_convolution                                               0.02%      610.000us          0.19%        5.049ms      114.750us        0.000us          0.00%       16.184ms      367.818us             44
aten::conv2d                                                     0.01%      267.000us          0.19%        4.979ms      124.475us        0.000us          0.00%       12.114ms      302.850us             40
aten::cudnn_convolution                                          0.07%        1.771ms          0.10%        2.689ms       67.225us       10.563ms         44.98%       10.563ms      264.075us             40
volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148...          0.00%        0.000us          0.00%        0.000us        0.000us        9.754ms         41.53%        9.754ms      304.812us             32
aten::conv_transpose2d                                           0.00%       32.000us          0.02%      649.000us      162.250us        0.000us          0.00%        4.070ms        1.018ms              4
aten::cudnn_convolution_transpose                                0.01%      184.000us          0.02%      421.000us      105.250us        3.901ms         16.61%        3.901ms      975.250us              4
aten::copy_                                                      0.01%      295.000us          0.47%       12.158ms      715.176us        3.300ms         14.05%        3.300ms      194.118us             17
aten::to                                                         0.00%       68.000us          0.46%       11.992ms        1.499ms        0.000us          0.00%        3.134ms      391.750us              8

Self CPU time total: 2.604s
Self CUDA time total: 23.485ms
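As a rough cross-check, the profiler's Self CUDA total can be compared with the wall-clock average reported at the top of this comment. The sketch below only redoes that arithmetic with the two numbers quoted here (23.485 ms and 0.131 s). The variable names are illustrative, and it assumes the profiled call is representative of the timed calls, which the thread does not confirm.

```python
# Numbers copied from this comment; variable names are illustrative only.
wall_time_s = 0.131      # average wall-clock time per inference
cuda_total_ms = 23.485   # "Self CUDA time total" from the profiler

cuda_total_s = cuda_total_ms / 1000.0
gpu_fraction = cuda_total_s / wall_time_s   # share of wall time spent in GPU kernels
non_gpu_s = wall_time_s - cuda_total_s      # time not accounted for by GPU kernels

print(f"GPU kernel time: {cuda_total_s * 1000:.3f} ms")
print(f"Share of wall time: {gpu_fraction:.1%}")
print(f"Time outside GPU kernels: {non_gpu_s * 1000:.1f} ms")
```

Under that assumption, GPU kernels account for roughly 18% of the measured time, so most of the gap would have to come from somewhere other than kernel execution.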

ltkong218 commented 1 year ago

What inference time do you get when you run the benchmarks/speed_parameters.py script we provide with a 1280x720 input?

I think the following code should also be added to turn on cuDNN benchmark mode.

if torch.cuda.is_available():
    torch.backends.cudnn.enabled = True
    torch.backends.cudnn.benchmark = True

Changlin-Lee commented 1 year ago

Hi, I added the cuDNN code you mentioned above and also ran the speed_parameters.py from the GitHub repository. The result is still "Time: 0.100s Parameters: 2.80M". When I tried a different V100 server (one with more CPU cores), the result changed to "Time: 0.048s".

I think the speed of IFRNet may be strongly influenced by CPU performance.

ltkong218 commented 1 year ago

Yes, the CPU also affects the inference time. In addition, when you test the running speed of IFRNet, you should make sure that no other processes are consuming CPU, GPU, or bandwidth resources.

gehaocool commented 1 year ago

If I set torch.set_num_threads(1), I get the same runtime test result on a V100.
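The settings suggested in this thread (cuDNN benchmark mode and a single CPU thread) could be applied together before timing. A minimal sketch, assuming PyTorch is installed; `configure_for_benchmark` is a hypothetical helper name and not part of IFRNet.

```python
import torch


def configure_for_benchmark(num_threads=1):
    """Apply the runtime settings discussed in this thread (hypothetical helper)."""
    # Limit intra-op CPU parallelism, as suggested above.
    torch.set_num_threads(num_threads)
    if torch.cuda.is_available():
        # Let cuDNN auto-tune its convolution algorithms for the input sizes it sees.
        torch.backends.cudnn.enabled = True
        torch.backends.cudnn.benchmark = True


configure_for_benchmark()
print("CPU threads:", torch.get_num_threads())
print("cuDNN benchmark:", torch.backends.cudnn.benchmark)
```

Whether a single thread helps will likely depend on the machine, so it is worth timing with and without it.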

ltkong218 / IFRNet

Large gap in my running time with data in the paper #18
