hszhao / SAN

Exploring Self-attention for Image Recognition, CVPR2020.
MIT License

About the running speed of GPU parallel aggregation and subtraction #10

Closed KnightOfTheMoonlight closed 4 years ago

KnightOfTheMoonlight commented 4 years ago

Hi @hszhao, thanks for this great work.

I have tested the aggregation and subtraction scripts in /lib/sa/functions/ with the following setup:

cuda/10.0.130
cupy-cuda100-7.7.0
python 3.7

I find it takes around 10 minutes to finish. Here is the log:

$ python subtraction_refpad.py
test case passed
567.34s

Is this roughly how long it took on your side? Given that the input blocks are really small ([2, 8, 5, 5]), it seems strange for the test to take 10 minutes to finish.

If anything else about my setup needs clarification, please let me know.
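
For reference, this is a minimal sketch of the kind of wall-clock timing I have in mind (the `time_gpu` helper and the plain elementwise subtraction below are just an illustration, not the repository's CUDA kernels; the explicit device synchronization is there so that asynchronously launched kernels are actually counted):

```python
import time
import cupy as cp

def time_gpu(fn, *args, warmup=3, iters=10):
    """Average wall-clock time of a GPU call, with device synchronization.

    Without the synchronize() calls, the timer would stop before the
    asynchronously launched kernels finish, under-reporting the cost.
    """
    for _ in range(warmup):              # warm-up: kernel compilation, caches
        fn(*args)
    cp.cuda.Device().synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    cp.cuda.Device().synchronize()       # wait for all queued kernels
    return (time.perf_counter() - start) / iters

# Hypothetical usage on tensors the same size as the [2, 8, 5, 5] test blocks.
x = cp.random.rand(2, 8, 5, 5, dtype=cp.float32)
y = cp.random.rand(2, 8, 5, 5, dtype=cp.float32)
print(f"{time_gpu(cp.subtract, x, y) * 1e6:.1f} us per call")
```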

KnightOfTheMoonlight commented 4 years ago

I just found that when I use a Tesla V100, the CuPy library runs much slower than it should. When I run these scripts on a Titan Xp or a Tesla P100, the running time is much more reasonable, only about 10s.
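
In case it helps anyone seeing a similar slowdown, here is a quick diagnostic sketch (standard CuPy version queries only, not code from this repository) to confirm which CUDA runtime, driver, and compute capability CuPy is actually seeing on the machine:

```python
import cupy as cp

# Versions CuPy reports for itself and for the CUDA runtime/driver it talks to.
print("cupy         ", cp.__version__)
print("cuda runtime ", cp.cuda.runtime.runtimeGetVersion())
print("cuda driver  ", cp.cuda.runtime.driverGetVersion())

# Compute capability of the device the scripts run on (e.g. '70' for a V100).
print("compute capability", cp.cuda.Device(0).compute_capability)
```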