pentium3 / sys_reading

system paper reading notes

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache #316

Open: pentium3 opened this issue 10 months ago

pentium3 commented 10 months ago

https://arxiv.org/pdf/2401.02669.pdf