pentium3 / sys_reading

system paper reading notes

LLM in a flash: Efficient Large Language Model Inference with Limited Memory #314

Status: Open — pentium3 opened 11 months ago

pentium3 commented 11 months ago

https://arxiv.org/pdf/2312.11514.pdf