LittleLittleCloud opened 1 week ago
Done. Uploaded to NuGet; version 0.2.2 should be indexed in the next few hours. Please confirm that it works!
@shaltielshmid Thanks! Will get back to this thread once I confirm that the new package is working
BTW, another question: how much improvement should I expect from flash attention versus vanilla attention? In my observation the two implementations have very close inference performance when the batch size and sequence length are small.
If I understood correctly, the key to Flash Attention is optimizing the data transfer between VRAM and SRAM. Therefore, with smaller models/batch size/sequence lengths the impact will be less noticeable, but as you increase the required compute, the difference becomes a lot more noticeable.
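To make the point above concrete: the core trick is to process the keys/values in tiles with an "online" softmax, so the full n×n score matrix is never materialized in slow memory. Below is a minimal pure-Python sketch of that tiling trick (an illustration of the idea only, not the package's actual kernel; the function names and tile size are made up for the example):

```python
import math

def naive_attention(Q, K, V):
    # Standard attention: build the full score row, softmax, then weight V.
    d = len(Q[0])
    out = []
    for qi in Q:
        scores = [sum(qi[k] * Kj[k] for k in range(d)) / math.sqrt(d) for Kj in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(e / z * Vj[k] for e, Vj in zip(exps, V)) for k in range(d)])
    return out

def tiled_attention(Q, K, V, tile=2):
    # Flash-attention-style: stream over key/value tiles with an online softmax,
    # keeping only a running max, running normalizer, and running output per query.
    d = len(Q[0])
    out = []
    for qi in Q:
        m = float('-inf')   # running max of scores seen so far
        z = 0.0             # running softmax normalizer
        acc = [0.0] * d     # running (unnormalized) weighted sum of V rows
        for start in range(0, len(K), tile):
            Kt, Vt = K[start:start + tile], V[start:start + tile]
            scores = [sum(qi[k] * Kj[k] for k in range(d)) / math.sqrt(d) for Kj in Kt]
            m_new = max(m, max(scores))
            # Rescale previous partial results to the new running max.
            scale = math.exp(m - m_new) if m != float('-inf') else 0.0
            z *= scale
            acc = [a * scale for a in acc]
            for s, Vj in zip(scores, Vt):
                w = math.exp(s - m_new)
                z += w
                acc = [a + w * v for a, v in zip(acc, Vj)]
            m = m_new
        out.append([a / z for a in acc])
    return out
```

Both functions produce the same result; the tiled version just never holds more than one tile of scores at a time, which is where the memory-traffic savings come from — and why the payoff grows with sequence length.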
Hey, I'd like to use this package to add support for flash attention in the mlnet GenAI library, but mlnet is strong-named, so the compiler emits a warning when a referenced dependency is not strong-named.
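For reference, assuming the warning in question is CS8002 ("Referenced assembly does not have a strong name"), one interim workaround on the consuming side is to suppress it in the project file until the package ships strong-named (the project name here is hypothetical):

```xml
<!-- In the consuming project's .csproj: suppress CS8002 as a stopgap -->
<PropertyGroup>
  <NoWarn>$(NoWarn);CS8002</NoWarn>
</PropertyGroup>
```

Suppression only hides the diagnostic; the proper fix is still for the dependency to be signed with a strong-name key.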