microsoft / vattention

Dynamic Memory Management for Serving LLMs without PagedAttention
MIT License
248 stars 16 forks

Add microbenchmark to profile kernel latency with different page sizes #3

Closed apanwariisc closed 4 months ago