pentium3 / sys_reading

system paper reading notes

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving #366

Open: pentium3 opened 6 months ago

pentium3 commented 6 months ago

https://arxiv.org/pdf/2401.09670v1.pdf