pentium3 / sys_reading

system paper reading notes

Efficient Memory Management for Large Language Model Serving with PagedAttention #291

Open · pentium3 opened this issue 1 year ago

pentium3 commented 1 year ago

https://arxiv.org/pdf/2309.06180.pdf

pentium3 commented 1 year ago

https://github.com/vllm-project/vllm