microsoft / vattention

Dynamic Memory Management for Serving LLMs without PagedAttention
MIT License
219 stars, 14 forks

Add microbenchmark to profile kernel latency with different page sizes #3

Closed apanwariisc closed 3 months ago