hszhao / SAN

Exploring Self-attention for Image Recognition, CVPR2020.
MIT License

About the running speed of GPU parallel aggregation and subtraction #10

Closed KnightOfTheMoonlight closed 4 years ago

KnightOfTheMoonlight commented 4 years ago

Hi @hszhao, thanks for this great work.

I have tested the aggregation and subtraction scripts in /lib/sa/functions/ with the following setup:

cuda/10.0.130
cupy-cuda100-7.7.0
python 3.7

I find it takes around 10 minutes to finish. Here is the log:

$ python subtraction_refpad.py
test case passed
567.34s

Is this roughly how long it took on your side? Given that the input blocks are really small ([2, 8, 5, 5]), it seems strange for the test to take 10 minutes to finish.

If anything else about my setup needs clarification, please let me know.
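
For reference, this is a minimal sketch of the kind of wall-clock timing I have in mind (the `time_gpu` helper and the plain elementwise subtraction below are just an illustration, not the repository's CUDA kernels; the explicit device synchronization is there so that asynchronously launched kernels are actually counted):

```python
import time
import cupy as cp

def time_gpu(fn, *args, warmup=3, iters=10):
    """Average wall-clock time of a GPU call, with device synchronization.

    Without the synchronize() calls, the timer would stop before the
    asynchronously launched kernels finish, under-reporting the cost.
    """
    for _ in range(warmup):              # warm-up: kernel compilation, caches
        fn(*args)
    cp.cuda.Device().synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    cp.cuda.Device().synchronize()       # wait for all queued kernels
    return (time.perf_counter() - start) / iters

# Hypothetical usage on tensors the same size as the [2, 8, 5, 5] test blocks.
x = cp.random.rand(2, 8, 5, 5, dtype=cp.float32)
y = cp.random.rand(2, 8, 5, 5, dtype=cp.float32)
print(f"{time_gpu(cp.subtract, x, y) * 1e6:.1f} us per call")
```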

KnightOfTheMoonlight commented 4 years ago

I just found that when I use a Tesla V100, the CuPy library runs much slower than it should. When I run these scripts on a Titan Xp or a Tesla P100, the running time is much more reasonable, only about 10s.
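
In case it helps anyone seeing a similar slowdown, here is a quick diagnostic sketch (standard CuPy version queries only, not code from this repository) to confirm which CUDA runtime, driver, and compute capability CuPy is actually seeing on the machine:

```python
import cupy as cp

# Versions CuPy reports for itself and for the CUDA runtime/driver it talks to.
print("cupy         ", cp.__version__)
print("cuda runtime ", cp.cuda.runtime.runtimeGetVersion())
print("cuda driver  ", cp.cuda.runtime.driverGetVersion())

# Compute capability of the device the scripts run on (e.g. '70' for a V100).
print("compute capability", cp.cuda.Device(0).compute_capability)
```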