pytorch-labs / attention-gym

Helpful tools and examples for working with flex-attention

Thank you for the awesome work! I saw from the blog that paged attention can also be implemented with flex attention. #17

Closed: kme2698 closed this issue 2 months ago
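
For reference, here is a minimal sketch of the idea mentioned in the blog post: keep the KV cache in fixed-size physical pages and let a FlexAttention `mask_mod` translate physical KV indices back to logical positions through a page table before applying the usual causal rule. This is not the repository's implementation; `PAGE_SIZE`, the page-table tensors, and the random data below are illustrative assumptions.

```python
# Sketch: paged causal attention via FlexAttention's mask_mod (assumptions noted inline).
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, SEQ_LEN, HEAD_DIM = 1, 8, 1024, 64
PAGE_SIZE = 128                      # tokens per physical page (illustrative choice)
NUM_PAGES = SEQ_LEN // PAGE_SIZE

# Page table: logical page -> physical page. A random permutation stands in for
# whatever a real cache allocator would produce.
logical_to_physical = torch.randperm(NUM_PAGES)
physical_to_logical = torch.empty_like(logical_to_physical)
physical_to_logical[logical_to_physical] = torch.arange(NUM_PAGES)
physical_to_logical = physical_to_logical.to("cuda")

# Q is in logical order; K/V are stored page-by-page in physical order (the "paged" cache).
q = torch.randn(B, H, SEQ_LEN, HEAD_DIM, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, SEQ_LEN, HEAD_DIM, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, SEQ_LEN, HEAD_DIM, device="cuda", dtype=torch.float16)

def paged_causal_mask(b, h, q_idx, kv_idx):
    # Recover the logical position of this physical KV slot via the page table,
    # then apply an ordinary causal mask in logical coordinates.
    logical_kv = physical_to_logical[kv_idx // PAGE_SIZE] * PAGE_SIZE + kv_idx % PAGE_SIZE
    return q_idx >= logical_kv

block_mask = create_block_mask(paged_causal_mask, B, H, SEQ_LEN, SEQ_LEN, device="cuda")
out = flex_attention(q, k, v, block_mask=block_mask)
```

The point of the sketch is only that the physical-to-logical translation lives entirely inside the `mask_mod`, so the same `flex_attention` call works whether or not the KV cache is paged.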