pentium3 / sys_reading

system paper reading notes

HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices #358

Open: pentium3 opened this issue 8 months ago