siyan-zhao / prepacking

Source code for our paper "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models".
https://arxiv.org/abs/2404.09529