pentium3/sys_reading: system paper reading notes
High-throughput Generative Inference of Large Language Models with a Single GPU #251
Open. pentium3 opened this issue 1 year ago.
pentium3 commented 1 year ago:
https://arxiv.org/pdf/2303.06865.pdf