pentium3/sys_reading
system paper reading notes
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
#314 (Open), opened by pentium3 11 months ago
pentium3 commented 11 months ago:
https://arxiv.org/pdf/2312.11514.pdf