tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)
Apache License 2.0

Correctness parameters #1

Closed: cogumbreiro closed this issue 5 months ago

cogumbreiro commented 5 months ago

Hi Peter,

I just found your post on HN. Congratulations on the post!

I am one of the developers behind Faial, a tool that can analyze CUDA kernels and find data races.
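
For anyone unfamiliar with the terminology: a data race is two threads accessing the same memory location without synchronization, where at least one access is a write. Here is a minimal CUDA sketch of the kind of bug such a checker flags (the kernel and names are made up for illustration, not taken from flash.cu):

// racy.cu -- illustrative only, not from flash.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void racy_kernel(float* out) {
    __shared__ float acc;
    // Every thread in the block writes the same shared slot with no
    // synchronization: concurrent write-write accesses, i.e. a data race.
    acc = (float) threadIdx.x;
    out[threadIdx.x] = acc;   // the value read back is unpredictable
}

int main() {
    float* out;
    cudaMalloc(&out, 32 * sizeof(float));
    racy_kernel<<<1, 32>>>(out);
    cudaDeviceSynchronize();

    float host[32];
    cudaMemcpy(host, out, sizeof(host), cudaMemcpyDeviceToHost);
    printf("out[0] = %f (could be any thread's index)\n", host[0]);
    cudaFree(out);
    return 0;
}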

I ran our tool against the kernel flash.cu and found that it is data-race free as long as the following conditions are met:

Faial is a research project, so I am wondering if having access to these correctness conditions is valuable to you as a developer.

Please let me know if you'd like me to try out any combinations of parameters to see if the kernel is still data-race free.

cogumbreiro commented 5 months ago

Hi again,

I was able to show that the kernel is data-race free for all possible uses.

In my initial experiments, I missed this crucial piece of code:

const int Bc = 32; const int Br = 32;
// ...
const int Tc = ceil((float) N / Bc); const int Tr = ceil((float) N / Br);
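
With Bc and Br fixed constants, the value of N only changes how many tile iterations Tc and Tr the loops run, not the shared-memory indexing, which is why the race-freedom argument goes through for every N. Below is a simplified standalone sketch of that access pattern (the tile_sum_kernel name, the D dimension, and the toy row-sum are my own illustration, not the actual flash.cu kernel): each thread writes only its own row of a shared tile, and __syncthreads() separates those writes from the reads.

// tile_sketch.cu -- simplified illustration of the tiling pattern, not flash.cu itself
#include <cstdio>
#include <cuda_runtime.h>

#define Bc 32   // fixed tile size, mirroring the constant in the snippet above
#define D  64   // per-row feature dimension (illustrative value)

__global__ void tile_sum_kernel(const float* in, float* out, int N) {
    // One thread per tile row: thread tx owns row tx of the shared tile.
    __shared__ float tile[Bc * D];
    int tx = threadIdx.x;                 // 0 .. Bc-1
    int Tc = (N + Bc - 1) / Bc;           // ceil(N / Bc), like Tc in flash.cu

    for (int j = 0; j < Tc; j++) {
        int row = j * Bc + tx;            // global row this thread loads
        if (row < N) {
            for (int x = 0; x < D; x++)
                tile[tx * D + x] = in[row * D + x];  // disjoint writes per thread
        }
        __syncthreads();                  // all writes finish before any reads

        // Every thread now reads the whole tile; no race, thanks to the barrier.
        if (row < N) {
            float acc = 0.0f;
            for (int r = 0; r < Bc && j * Bc + r < N; r++)
                for (int x = 0; x < D; x++)
                    acc += tile[r * D + x];
            out[row] = acc;
        }
        __syncthreads();                  // reads finish before the next iteration's writes
    }
}

int main() {
    const int N = 100;                    // any N works; only Tc changes
    float *in, *out;
    cudaMallocManaged(&in, N * D * sizeof(float));
    cudaMallocManaged(&out, N * sizeof(float));
    for (int i = 0; i < N * D; i++) in[i] = 1.0f;

    tile_sum_kernel<<<1, Bc>>>(in, out, N);  // blockDim.x == Bc
    cudaDeviceSynchronize();

    printf("out[0] = %.0f (expect %d)\n", out[0], Bc * D);
    cudaFree(in); cudaFree(out);
    return 0;
}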

I'll close the issue.