hellozhuo / pidinet

Code for the ICCV 2021 paper "Pixel Difference Networks for Efficient Edge Detection" (Oral).

Why is the FPS of the pidinet_converted model tested on my own GeForce RTX 2080Ti only 92.9? #13

Closed: liuzhidemaomao closed this issue 2 years ago

liuzhidemaomao commented 2 years ago

Hello~ The FPS of pidinet_tiny_converted and pidinet_small_converted is higher than your results when tested on my own GeForce RTX 2080Ti: 203.5 FPS and 162.9 FPS respectively. However, the FPS of pidinet_converted is only 92.9 FPS, which is lower than your reported 96 FPS. I also have another question: why does the speed test code not use torch.cuda.synchronize() to get a more reasonable result?

zhuoinoulu commented 2 years ago

> Hello~ The FPS of pidinet_tiny_converted and pidinet_small_converted is higher than your results when tested on my own GeForce RTX 2080Ti: 203.5 FPS and 162.9 FPS respectively. However, the FPS of pidinet_converted is only 92.9 FPS, which is lower than your reported 96 FPS. I also have another question: why does the speed test code not use torch.cuda.synchronize() to get a more reasonable result?

Hi, thanks for the question. Actually, we also get different FPS across runs, even on the same machine, but the results are similar. You may change "-j" to 2 or higher for better CPU-side data loading. And thanks for the suggestion: we updated our code with torch.cuda.synchronize(), see https://github.com/zhuoinoulu/pidinet/blob/c70859767b3b379cae062371501fe67ed5804a6a/throughput.py#L110

The wall time of processing the whole test set remains the same, but the way of recording it is more reasonable, as you mentioned.
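For context, the point about torch.cuda.synchronize() is that CUDA kernels launch asynchronously, so stopping a host-side timer without synchronizing can under-count the GPU work still in flight. Below is a minimal, hedged sketch of a throughput measurement that synchronizes before and after the timed loop; the model, input size, and iteration counts are illustrative placeholders, not the actual PiDiNet benchmark settings in throughput.py.

```python
import time

import torch
import torch.nn as nn


def measure_fps(model, input_size=(1, 3, 480, 320), n_iters=20, warmup=5):
    """Rough FPS estimate: warm up, synchronize, time n_iters forward passes."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    with torch.no_grad():
        for _ in range(warmup):
            model(x)  # warm-up passes exclude one-time init costs
        if device == "cuda":
            torch.cuda.synchronize()  # drain queued kernels before starting the clock
        start = time.time()
        for _ in range(n_iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()  # make sure all GPU work finished before stopping
        elapsed = time.time() - start
    return n_iters * input_size[0] / elapsed


if __name__ == "__main__":
    # A single conv layer stands in for the real edge-detection model here.
    fps = measure_fps(nn.Conv2d(3, 3, 3, padding=1))
    print(f"{fps:.1f} FPS")
```

Without the second synchronize, time.time() can return as soon as the kernels are queued, which inflates the measured FPS; with it, the timed interval covers the actual GPU execution.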

liuzhidemaomao commented 2 years ago


Thanks for your reply.