pentium3 / sys_reading

system paper reading notes

LLM in a flash: Efficient Large Language Model Inference with Limited Memory #314


pentium3 commented 9 months ago

https://arxiv.org/pdf/2312.11514.pdf