[Open] cornzz opened this issue 1 month ago
Thank you for raising the questions. Here is a point-by-point response:
Thank you very much! 🙂
@pzs19 @iofu728 sorry, a follow-up question: which LLM was used for compression in the end-to-end latency benchmark of the original LLMLingua paper? Under "Implementation Details" it says
In our experiments, we utilize either Alpaca-7B or GPT2-Alpaca as the small pre-trained language model M_s for compression.
However, as far as I can see, it is not specified which of those two models was used for the end-to-end latency benchmark. It is also not specified which compressor was used for the other benchmarks (GSM8K etc.), so that would be another question.
Describe the issue
@pzs19
I would like to reproduce and extend the end-to-end latency benchmark results of the LLMLingua-2 paper and was therefore wondering if you could provide more details on your experiment setup. Specifically:
- What was `max_token` set to, and did you enforce the generation of a minimum number of tokens?

Thanks a lot!
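For context, this is the kind of harness I have in mind for the reproduction. It is only a minimal sketch: the `compress` and `generate` callables are hypothetical stand-ins for the LLMLingua-2 compressor and the target LLM's generation call, not the setup actually used in the paper.

```python
import time

def measure_end2end_latency(compress, generate, prompt, runs=3):
    """Time prompt compression and target-LLM generation separately.

    `compress` and `generate` are hypothetical callables standing in
    for the LLMLingua-2 compressor and the target LLM, respectively.
    """
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        compressed = compress(prompt)
        t1 = time.perf_counter()
        generate(compressed)
        t2 = time.perf_counter()
        timings.append((t1 - t0, t2 - t1))
    # Average over runs to smooth out timing jitter.
    n = len(timings)
    return {
        "compress_s": sum(c for c, _ in timings) / n,
        "generate_s": sum(g for _, g in timings) / n,
    }

# Stand-in callables so the harness runs without a real model.
stats = measure_end2end_latency(
    compress=lambda p: p[: len(p) // 2],  # fake 2x compression
    generate=lambda p: "answer",          # fake generation
    prompt="some long prompt " * 100,
)
```

The reason `max_token` (and any minimum-token enforcement) matters for such a harness is that the generation time depends directly on how many tokens the target LLM produces, so latency numbers are only comparable if the generation length is controlled.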