vene / sparse-structured-attention

Sparse and structured neural attention mechanisms
BSD 3-Clause "New" or "Revised" License

Hi, is there any faster gpu-version? #2

Open happygds opened 6 years ago

happygds commented 6 years ago

Hi, I found the code too slow when I ran it. Is there a faster GPU version?

vene commented 6 years ago

Hi,

We did use the GPU in our experiments. The SparseMAP layer itself cannot run on the GPU because it relies on external C++ code. What I recommend is: (i) run the first part of your model on the GPU; (ii) copy the potentials to the CPU; (iii) run SparseMAP on the CPU; (iv) copy the result back to the GPU and finish the rest of the model there. This worked well for us and, in the case of ESIM, with minimal slowdown.

Hope this helps!
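For readers who want to see steps (i) to (iv) spelled out, here is a minimal PyTorch sketch. It is not the code used in the experiments: `cpu_only_layer` is a hypothetical stand-in (a plain softmax) for a layer that can only run on the CPU, such as SparseMAP, and `HybridModel` with its layer sizes is made up for illustration. The device transfers are ordinary differentiable operations, so gradients should still flow back to the GPU part of the model.

```python
import torch
import torch.nn as nn


def cpu_only_layer(potentials: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a CPU-only layer (e.g. SparseMAP).

    A plain softmax is used here only so the sketch runs end to end.
    """
    assert potentials.device.type == "cpu", "this layer expects CPU tensors"
    return torch.softmax(potentials, dim=-1)


class HybridModel(nn.Module):
    """Illustrative model: device-agnostic encoder/decoder around a CPU-only layer."""

    def __init__(self, in_dim: int = 8, hidden: int = 16, out_dim: int = 4):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden)
        self.decoder = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (i) first part of the model runs on whatever device x lives on
        potentials = self.encoder(x)
        # (ii) copy the potentials to the CPU, (iii) run the CPU-only layer
        attn = cpu_only_layer(potentials.cpu())
        # (iv) copy the result back and finish on the original device
        attn = attn.to(x.device)
        return self.decoder(attn * potentials)


# Falls back to the CPU when no GPU is available, so the sketch always runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = HybridModel().to(device)
x = torch.randn(3, 8, device=device)
out = model(x)

# Gradients propagate through the CPU round trip back to the encoder.
out.sum().backward()
```

The cost of this pattern is the two device transfers per forward pass, so it tends to pay off when the potentials are small relative to the rest of the computation.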

PkuRainBow commented 6 years ago

@vene Really interesting. I am also wondering whether your sparse attention supports 4D tensor inputs, such as a mini-batch of images.

If the code cannot run on the GPU, it could become very slow for image processing.