shaltielshmid / TorchSharp.FlashAttention

C# Bindings for Flash Attention: Fast and memory-efficient exact attention

Provide a strong-named version? #1

Open LittleLittleCloud opened 1 week ago

LittleLittleCloud commented 1 week ago

Hey, I'd like to use this package to add flash attention support to the mlnet GenAI library, but mlnet is strong-named, so it emits a warning when one of its dependencies is not strong-named.

shaltielshmid commented 1 week ago

Done! Uploaded to NuGet; version 0.2.2 should be indexed in the next few hours. Please confirm that it works!
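
Once it's indexed, one quick way to confirm the strong name from C# using only the BCL (the DLL path below is an assumption; point it at the assembly restored from the 0.2.2 package):

```csharp
// A strong-named assembly carries a non-empty public key token;
// an unsigned assembly returns an empty token array.
using System;
using System.Reflection;

class StrongNameCheck
{
    static void Main()
    {
        // Adjust the path to wherever NuGet restored the 0.2.2 assembly.
        var asm = Assembly.LoadFrom("TorchSharp.FlashAttention.dll");
        var token = asm.GetName().GetPublicKeyToken();
        Console.WriteLine(token is { Length: > 0 }
            ? $"strong-named, public key token: {Convert.ToHexString(token)}"
            : "NOT strong-named");
    }
}
```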

LittleLittleCloud commented 1 week ago

@shaltielshmid Thanks! Will get back to this thread once I confirm that the new package is working

BTW, another question: how much improvement should I expect from flash attention versus vanilla attention? My observation is that the two implementations perform very similarly during inference when the batch size and sequence length are small.
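
For reference, a minimal sketch of the kind of comparison I have in mind, in TorchSharp. The vanilla path is standard scaled dot-product attention; the flash-attention entry point from this package isn't shown since I'd take its name from the README (an assumption on my part):

```csharp
// Rough timing sketch: vanilla scaled dot-product attention in TorchSharp.
// Swap the timed block for this package's flash-attention call to compare.
using System;
using System.Diagnostics;
using TorchSharp;
using static TorchSharp.torch;

class AttentionBench
{
    // Vanilla attention: materializes the full (seqLen x seqLen) score matrix.
    static Tensor Vanilla(Tensor q, Tensor k, Tensor v)
    {
        var scale = 1.0 / Math.Sqrt(q.shape[^1]);
        var scores = matmul(q, k.transpose(-2, -1)) * scale;
        return matmul(nn.functional.softmax(scores, -1), v);
    }

    static void Main()
    {
        var device = cuda.is_available() ? CUDA : CPU;
        // (batch, heads, seqLen, headDim) -- grow N and B to see the gap widen.
        long B = 8, H = 16, N = 2048, D = 64;
        var q = randn(new[] { B, H, N, D }, device: device);
        var k = randn(new[] { B, H, N, D }, device: device);
        var v = randn(new[] { B, H, N, D }, device: device);

        using var _ = no_grad();
        Vanilla(q, k, v).Dispose();                     // warm-up pass
        if (device.type == DeviceType.CUDA) cuda.synchronize();

        var sw = Stopwatch.StartNew();
        Vanilla(q, k, v).Dispose();
        if (device.type == DeviceType.CUDA) cuda.synchronize();
        Console.WriteLine($"vanilla: {sw.ElapsedMilliseconds} ms");
        // Repeat the timed block with the flash-attention call and compare.
    }
}
```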

shaltielshmid commented 1 week ago

If I understood correctly, the key to Flash Attention is optimizing the data transfer between VRAM and SRAM. Therefore, with smaller models/batch size/sequence lengths the impact will be less noticeable, but as you increase the required compute, the difference becomes a lot more noticeable.