pentium3 / sys_reading

system paper reading notes

High-throughput Generative Inference of Large Language Models with a Single GPU #251

Open: pentium3 opened this issue 1 year ago

pentium3 commented 1 year ago

https://arxiv.org/pdf/2303.06865.pdf