noahzn / Lite-Mono

[CVPR2023] Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
MIT License

How to get the correct inference speed of the model #38

Closed: lnSong closed this issue 1 year ago

lnSong commented 1 year ago

I am running the following code on Ubuntu with a TITAN V, and I get an inference speed of 13.2 ms, which differs a lot from the results in your paper. Is this due to the code or the hardware?

import numpy as np
import torch
from torch.backends import cudnn
import tqdm

cudnn.benchmark = True

device = 'cuda:6'
# encoder and decoder are the already-loaded Lite-Mono networks
encoder = encoder.to(device)
decoder = decoder.to(device)
repetitions = 300
dummy_input = torch.rand(1, 3, 192, 640).to(device)

print('warm up ...\n')
with torch.no_grad():
    for _ in range(100):
        _ = decoder(encoder(dummy_input))
torch.cuda.synchronize()

starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)

timings = np.zeros((repetitions, 1))

print('testing ...\n')
with torch.no_grad():
    for rep in tqdm.tqdm(range(repetitions)):
        starter.record()
        _ = decoder(encoder(dummy_input))
        ender.record()
        torch.cuda.synchronize()  # wait for the GPU to finish before reading the timer
        curr_time = starter.elapsed_time(ender)  # elapsed time in milliseconds
        timings[rep] = curr_time

avg = timings.sum() / repetitions
print('\navg={}\n'.format(avg))
noahzn commented 1 year ago

Hi, what batch size did you use? Is this the speed of Lite-Mono-8m? Did you evaluate the speed of Monodepth2 with the same code?

lnSong commented 1 year ago

Batch size = 1; in the code it is dummy_input = torch.rand(1, 3, 192, 640). This is the speed of Lite-Mono. The speed of Monodepth2 is 6 ms.

noahzn commented 1 year ago

If you check the speed-evaluation graphs in this repo, you will find that with batch size = 1, Monodepth2 runs at 7.1 ms and Lite-Mono at 9.1 ms.

What CUDA version are you using? Different CUDA versions may perform differently. Some additional points (a sketch applying them follows the list):

  1. Set the model to eval mode.
  2. Set cudnn.benchmark = False and observe whether it affects the speed.
  3. Try different batch sizes.
  4. Both the hardware and the software (the implementation of your timing code, library versions) can yield different speed results.
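For reference, here is a minimal sketch of points 1-3 applied to the timing script posted above. It assumes encoder and decoder are the already-loaded Lite-Mono networks, and the batch sizes tried are arbitrary examples:

import numpy as np
import torch
from torch.backends import cudnn

cudnn.benchmark = False  # point 2: disable cuDNN autotuning and compare

device = 'cuda:6'
encoder = encoder.to(device).eval()  # point 1: eval mode disables dropout and
decoder = decoder.to(device).eval()  # freezes batch-norm statistics

repetitions = 300
for batch_size in (1, 2, 4, 8):  # point 3: try different batch sizes
    dummy_input = torch.rand(batch_size, 3, 192, 640).to(device)
    starter = torch.cuda.Event(enable_timing=True)
    ender = torch.cuda.Event(enable_timing=True)
    timings = np.zeros(repetitions)

    with torch.no_grad():
        for _ in range(100):  # warm-up, excluded from timing
            _ = decoder(encoder(dummy_input))
        torch.cuda.synchronize()
        for rep in range(repetitions):
            starter.record()
            _ = decoder(encoder(dummy_input))
            ender.record()
            torch.cuda.synchronize()  # wait for the GPU before reading the timer
            timings[rep] = starter.elapsed_time(ender)  # milliseconds

    print('batch_size={}: avg={:.2f} ms per batch'.format(batch_size, timings.mean()))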
lnSong commented 1 year ago

I am using CUDA 11.4. Thank you very much for your answer; I will try what you suggested.

noahzn commented 1 year ago

I am now closing this issue. If you have more questions, please feel free to reopen this issue or create a new one.

lnSong commented 1 year ago

I tested the speed of Monodepth2, R-MSFM3, and Lite-Mono on an RTX 3090. Why was Lite-Mono the slowest? Monodepth2 is 4.12s, R-MSFM3 is 7.25s, and Lite-Mono is 9.44s.

noahzn commented 1 year ago

I need more information. Did you use the same code as posted in this issue? What batch size did you use? Have you tried the points I suggested, i.e., setting the model to eval mode, setting cudnn.benchmark = False to see whether it affects the speed, and trying different batch sizes?

lnSong commented 1 year ago

Thank you very much! When I increase the batch size, the result gets better.

noahzn commented 1 year ago

Yes, the batch size can affect the result. Also, your code for speed evaluation is different from mine. In my evaluation code there are lines that compute the inference time. Please see this. There, t2 - t1 is the time for one batch. You need to do a warm-up and then accumulate over all the batches.
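For illustration only, here is a rough sketch of that scheme, not the exact repo code (which the link above points to). The names model and loader are hypothetical stand-ins for the full encoder-decoder and a dataloader, both assumed to exist:

import time
import torch

@torch.no_grad()
def timed_inference(model, loader, device, warmup_batches=20):
    # model: full encoder-decoder (assumed already built); loader yields input batches
    model.to(device).eval()
    total_time, total_batches = 0.0, 0
    for i, batch in enumerate(loader):
        batch = batch.to(device)
        torch.cuda.synchronize()
        t1 = time.time()
        _ = model(batch)
        torch.cuda.synchronize()  # make sure the GPU has finished before stopping the clock
        t2 = time.time()
        if i >= warmup_batches:  # skip warm-up batches, then accumulate t2 - t1
            total_time += (t2 - t1)
            total_batches += 1
    return total_time / total_batches  # average seconds per batch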

If you have further questions feel free to contact me. Good luck!