I have a question about "flash-attn". In your paper, you use Nvidia 32GB V100 GPUs to run the experiments, but flash-attn is not supported on the V100. How did you work around this? Since I only have a V100, this is very frustrating for me.
Sorry for the confusion. We tried flash attention at some point to reduce the computation overhead during testing, but we do not use it in the current version of the code base. You can simply comment out this line and remove MultiheadFlashAttention in this line.
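If you'd rather keep the flash-attn path for GPUs that do support it, one option is to guard the import instead of deleting it. A minimal sketch (the `flash_attn.modules.mha.MHA` import path and the `build_attention` helper are illustrative assumptions, not part of this repo):

```python
# Hypothetical sketch: make flash-attn optional so the code still runs
# on GPUs it does not support (e.g. V100, which predates Ampere).
try:
    from flash_attn.modules.mha import MHA as MultiheadFlashAttention
except ImportError:
    # flash-attn not installed or unsupported: fall back silently.
    MultiheadFlashAttention = None

def build_attention(use_flash: bool) -> str:
    """Return which attention implementation to use.

    Falls back to the standard attention path whenever flash-attn
    is unavailable, regardless of the requested setting.
    """
    if use_flash and MultiheadFlashAttention is not None:
        return "flash"
    return "standard"
```

With this guard in place, the rest of the code can ask for flash attention and transparently get the standard implementation on a V100.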
Thanks for your great work!!!!