Closed: xy-guo closed this issue 4 years ago
Hi, thanks for your interest in our work. I think there would be some differences when enabling synchronization, but since many previous works in this field reported their inference time without synchronization (e.g., PSMNet and GA-Net), we simply followed the same setting.
However, we do believe that inference time is implementation- and hardware-dependent, so we integrated several representative methods into the same framework and used the same inference setting and hardware for the efficiency comparison (e.g., Table 2 in our paper).
I modified the code as follows:
```python
import time
import torch

num_runs = 100
inference_time = 0.0

torch.cuda.synchronize()
time_start = time.perf_counter()
for i in range(num_runs):
    with torch.no_grad():
        # torch.cuda.synchronize()
        print(i, left.shape, right.shape)
        pred_disp = aanet(left, right)[-1]  # [B, H, W]
# Wait for the GPU to finish before stopping the timer.
torch.cuda.synchronize()
inference_time += time.perf_counter() - time_start
print(f"average: {inference_time / num_runs * 1000:.1f} ms")
```
The average inference time is still 123ms, so the difference is not caused by the overhead of the synchronization calls themselves. I suspect there may be errors in your reported timing results.
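For completeness, another way to avoid host-side timer placement issues is CUDA events, which are recorded on the GPU stream itself. Below is a minimal sketch, not from the AANet repo; `aanet`, `left`, and `right` are assumed to be defined as in the snippet above.

```python
import torch

# Hypothetical sketch: time one forward pass with CUDA events, which
# bracket the actual kernel work rather than the kernel launches.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

with torch.no_grad():
    start.record()
    pred_disp = aanet(left, right)[-1]  # [B, H, W]
    end.record()

torch.cuda.synchronize()  # block until both events have completed
print(f"elapsed: {start.elapsed_time(end):.1f} ms")  # elapsed_time() returns ms
```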
Please note that the reported time is measured without synchronization to be consistent with previous methods.
But I couldn't find any time-profiling code in the previously released codebases... As you know, the faster a model runs, the larger the measurement error becomes without synchronization.
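To make this point concrete, here is a self-contained sketch (not from any of the repositories mentioned) that times the same operation both ways:

```python
import time
import torch

x = torch.randn(4096, 4096, device="cuda")
_ = x @ x
torch.cuda.synchronize()  # warm-up so one-time initialization isn't timed

# Without a final synchronize, the timer stops once the kernel is *launched*,
# not when it finishes, so the measured time can be far too small.
t0 = time.perf_counter()
y = x @ x
t_async = time.perf_counter() - t0

torch.cuda.synchronize()
t0 = time.perf_counter()
y = x @ x
torch.cuda.synchronize()  # now the timer covers the actual GPU execution
t_sync = time.perf_counter() - t0

print(f"without sync: {t_async * 1000:.3f} ms, with sync: {t_sync * 1000:.3f} ms")
```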
FYI, the code for PSMNet, GA-Net, and HD3 is all publicly available.
Thank you for your great work! I just tried your code and added --count_time to measure the model's speed. However, I found that there is no torch.cuda.synchronize() call after the model runs. Since PyTorch executes CUDA operations asynchronously, I wonder whether this affects the final timing result.
I just ran aanet+ on an RTX 2060 SUPER card: the measured time is 68ms without synchronization, but 123ms after adding CUDA synchronization.
The code in inference.py is modified as follows.
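(The exact diff is not reproduced in this thread; the following is a minimal sketch of what such a change could look like. The variables `aanet`, `left`, `right`, and `inference_time` are assumed from the original inference.py timing loop, not copied from it.)

```python
import time
import torch

inference_time = 0.0  # accumulated over the evaluation loop

with torch.no_grad():
    time_start = time.perf_counter()
    pred_disp = aanet(left, right)[-1]  # [B, H, W]
    torch.cuda.synchronize()  # added: wait for the GPU before reading the timer
    inference_time += time.perf_counter() - time_start
```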