Open qspang opened 10 months ago
Can I ask you what models the following three images correspond to?
1) codellama 2) codellama-instruct 3) llama-2-chat
By the way, may I ask whether AVG COMPRESS RATIO reflects Lookahead's acceleration effect?
AVG COMPRESS RATIO is the #generated tokens / #decoding steps with lookahead decoding. It is an upper bound on the speedup.
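For concreteness, here is a minimal sketch of that ratio computed from per-step statistics (the function and variable names are illustrative, not from the repo's eval scripts):

```python
# Minimal sketch: AVG COMPRESS RATIO = #generated tokens / #decoding steps.
# `tokens_per_step` is a hypothetical list holding the number of tokens
# accepted at each lookahead decoding step (1 when nothing extra is verified,
# more when speculated tokens are accepted).
def avg_compress_ratio(tokens_per_step):
    total_tokens = sum(tokens_per_step)  # #generated tokens
    total_steps = len(tokens_per_step)   # #decoding steps
    return total_tokens / total_steps

# Example: 10 steps producing 23 tokens -> ratio 2.3,
# i.e. at most ~2.3x speedup over one-token-per-step decoding.
print(avg_compress_ratio([3, 2, 1, 4, 2, 3, 1, 2, 3, 2]))  # 2.3
```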
I want to know the actual acceleration. Which metric should I look at?
I guess you need to re-run the experiments without lade, like running with USE_LADE=0. Then you can compare the average throughputs shown in the figure above with the throughputs without lade.
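For example, a rough sketch of such a comparison, assuming your evaluation script honors the USE_LADE environment variable as described above (the script name below is just a placeholder for whatever you normally run):

```python
import os
import subprocess

# Run the same benchmark twice: once with lookahead decoding enabled
# (USE_LADE=1) and once disabled (USE_LADE=0), then compare the reported
# throughputs. "eval_mtbench.py" is a placeholder script name.
for use_lade in ("1", "0"):
    env = dict(os.environ, USE_LADE=use_lade)
    print(f"--- USE_LADE={use_lade} ---")
    subprocess.run(["python", "eval_mtbench.py"], env=env, check=True)
```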
But there are two average throughputs: average throughput 1 and average throughput 2. Which one should be used for comparison, or should both be used and then averaged?
I think both are reasonable. Throughput1 is sum(throughput for each question)/#questions. Throughput2 is #generated tokens/#decoding steps for the whole dataset.
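For example, a small sketch of the two averages under those definitions (the record structure and field names are made up for illustration, and per-question throughput is taken as generated tokens divided by generation time, as clarified below):

```python
# Hypothetical per-question records from an evaluation run.
results = [
    {"tokens": 230, "seconds": 4.0, "steps": 100},
    {"tokens": 180, "seconds": 3.0, "steps": 80},
]

# Throughput1: mean of per-question throughput (generated tokens / time).
throughput1 = sum(r["tokens"] / r["seconds"] for r in results) / len(results)

# Throughput2: dataset-level ratio, #generated tokens / #decoding steps
# over the whole dataset, as described in the reply above.
throughput2 = sum(r["tokens"] for r in results) / sum(r["steps"] for r in results)

print(f"throughput1 = {throughput1:.2f} tokens/s")
print(f"throughput2 = {throughput2:.2f} tokens/step")
```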
Thank you for your patient reply!!! :)
How do you measure "throughput for each question"?
generated tokens/#time
Hello, can you provide a script to evaluate codellama on Human-Eval, like what you do on MT-bench?
I will upload them in the next week or two.
OK, thanks for your amazing work!