pentium3 / sys_reading

system paper reading notes

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU #344

[Open] pentium3 opened this issue 8 months ago

pentium3 commented 8 months ago

https://arxiv.org/pdf/2303.06865.pdf

pentium3 commented 8 months ago

https://proceedings.mlr.press/v202/sheng23a.html