pentium3 / sys_reading

system paper reading notes

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU #344

Open: pentium3 opened this issue 4 months ago

pentium3 commented 4 months ago

https://arxiv.org/pdf/2303.06865.pdf

pentium3 commented 3 months ago

https://proceedings.mlr.press/v202/sheng23a.html