I have a question about "flash-attn". In your paper, you use Nvidia 32GB V100 GPUs to run the experiments, but flash-attn is not supported on the V100. How did you work around this? Since I only have a V100, this is very frustrating for me.
Sorry for the confusion. We tried flash attention at some point to reduce the computation overhead during testing, but we do not use it in the current version of the code base. You can simply comment out this line and remove MultiheadFlashAttention in this line.
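If you'd rather keep the flash-attn path for GPUs that do support it, one option is to guard the import instead of deleting it. A minimal sketch (the `flash_attn.modules.mha.MHA` import path and the `build_attention` helper are illustrative assumptions, not part of this repo):

```python
# Hypothetical sketch: make flash-attn optional so the code still runs
# on GPUs it does not support (e.g. V100, which predates Ampere).
try:
    from flash_attn.modules.mha import MHA as MultiheadFlashAttention
except ImportError:
    # flash-attn not installed or unsupported: fall back silently.
    MultiheadFlashAttention = None

def build_attention(use_flash: bool) -> str:
    """Return which attention implementation to use.

    Falls back to the standard attention path whenever flash-attn
    is unavailable, regardless of the requested setting.
    """
    if use_flash and MultiheadFlashAttention is not None:
        return "flash"
    return "standard"
```

With this guard in place, the rest of the code can ask for flash attention and transparently get the standard implementation on a V100.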
Thanks for your great work!!!!