Open StefanIsSmart opened 6 months ago
You use a for-loop over the attention heads (time × number of heads) and another for-loop for graph attention (time × number of graphs).
This will be very slow.
Is there another way to handle this?
Not sure yet! I believe the best way is to 1) rewrite it as classic multi-head attention, then 2) rewrite the result of 1) as PyG-like attention.
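For step 1), here is a rough sketch of what removing the per-head loop could look like. It is not this repository's code: the function names (`attention_loop`, `attention_batched`), the weight layout `(heads, d_model, d_head)`, and the use of NumPy are all assumptions made for illustration. The idea is to stack the per-head projection weights along a leading axis so that all heads are computed with one batched matrix product. The same reshaping should carry over to PyTorch tensors.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_loop(x, wq, wk, wv):
    # Baseline: one Python iteration per head (the slow pattern from the issue).
    # x: (n, d_model); wq, wk, wv: (heads, d_model, d_head)  [hypothetical layout]
    outs = []
    for h in range(wq.shape[0]):
        q, k, v = x @ wq[h], x @ wk[h], x @ wv[h]
        weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        outs.append(weights @ v)
    # Concatenate head outputs along the feature axis: (n, heads * d_head)
    return np.concatenate(outs, axis=-1)


def attention_batched(x, wq, wk, wv):
    # Same computation with the head axis handled as a batch dimension.
    q = np.einsum("nd,hdk->hnk", x, wq)  # (heads, n, d_head)
    k = np.einsum("nd,hdk->hnk", x, wk)
    v = np.einsum("nd,hdk->hnk", x, wv)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))  # (heads, n, n)
    out = weights @ v  # (heads, n, d_head)
    # Move heads next to features and flatten: (n, heads * d_head)
    return out.transpose(1, 0, 2).reshape(x.shape[0], -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 8))
    wq, wk, wv = (rng.normal(size=(4, 8, 2)) for _ in range(3))
    same = np.allclose(attention_loop(x, wq, wk, wv), attention_batched(x, wq, wk, wv))
    print("loop and batched outputs match:", same)
```

This only covers the loop over heads. The loop over graphs would need a separate change, for example stacking adjacency masks or moving to an edge-list formulation as PyG does, and that is not shown here.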