tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)
Apache License 2.0
548 stars · 48 forks

Feature request: implement FlashAttention-2 and Flash-Decoding #6

Open · wisdom-miao opened 2 months ago
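For context on what the request entails: the main algorithmic change in FlashAttention-2 over the original is that each output tile is rescaled with the running row max as K/V tiles stream through, and the division by the softmax normalizer happens only once at the end, rather than on every tile. A minimal NumPy reference sketch of that online-softmax loop is below; all names (`flash_attention2_reference`, `tile`, etc.) are illustrative and not from this repo's CUDA code.

```python
import numpy as np

def flash_attention2_reference(Q, K, V, tile=16):
    """Tiled attention with online softmax in the FlashAttention-2 style:
    rescale the accumulator by exp(m_old - m_new) per tile, normalize once
    at the end. Illustrative sketch, not the repo's kernel."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((n, d))          # unnormalized output accumulator
    m = np.full(n, -np.inf)       # running row max of the scores
    l = np.zeros(n)               # running softmax denominator
    for j in range(0, K.shape[0], tile):
        Kj, Vj = K[j:j + tile], V[j:j + tile]
        S = (Q @ Kj.T) * scale                  # scores for this K tile
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)               # rescale factor for old stats
        P = np.exp(S - m_new[:, None])          # tile-local exp(scores - max)
        l = alpha * l + P.sum(axis=1)
        O = alpha[:, None] * O + P @ Vj         # normalization deferred
        m = m_new
    return O / l[:, None]                       # single final division
```

The repo's existing kernel follows the FlashAttention-1 pattern instead, so implementing this request would mean restructuring the accumulator update along these lines (plus, for Flash-Decoding, splitting the K/V loop across blocks and reducing the partial `(O, m, l)` triples).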