hemingkx / Spec-Bench

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
https://sites.google.com/view/spec-bench
Apache License 2.0

How can the comparison be considered fair? #5

Closed by reflectionie 6 months ago

reflectionie commented 6 months ago

Thanks for your work. I would like to ask: in what sense are the comparison results reported by Spec-Bench fair? Several methods have hyper-parameters that affect their speed. For example, REST lets you control the size of the datastore it maintains, and Lookahead requires choosing the N-gram length and the pool size. Given these tunable settings, how should the results provided by Spec-Bench be interpreted as a fair comparison? I'm not quite sure, so I would greatly appreciate any further explanation.

hemingkx commented 6 months ago

Thank you for your inquiry! In our initial efforts, we focused on benchmarking the speed of various open-source Speculative Decoding methods on the same GPU hardware and in the same testing environment. We did not search for the optimal parameters of each method; instead, we used the default settings recommended in their respective repositories.

The Spec-Bench platform is designed to avoid the speedup variance introduced by differing GPU hardware and software environments (PyTorch and CUDA versions, etc.). Regarding the specific hyper-parameters you mentioned, we believe the best approach is to compare the methods with each one running at its optimal hyper-parameters. However, the optimal hyper-parameters may vary across devices (as noted for Lookahead). We encourage users to explore and determine the most suitable parameters for their specific setup (the default parameters work well in most scenarios).
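For readers who want to try this kind of exploration on their own device, here is a minimal sketch of a hyper-parameter sweep. It is not part of Spec-Bench: the names `tokens_per_second`, `sweep`, `make_generate`, and the grid keys are hypothetical, and it assumes each method can be wrapped as a function that takes a prompt and returns the number of newly generated tokens. Speedup is taken here as the throughput of a configuration divided by the throughput of a baseline decoder measured in the same environment.

```python
import itertools
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a decoder takes a prompt and returns the
# number of new tokens it generated for that prompt.
GenerateFn = Callable[[str], int]


def tokens_per_second(
    generate: GenerateFn,
    prompts: Sequence[str],
    timer: Callable[[], float] = time.perf_counter,
) -> float:
    """Average decoding throughput of `generate` over `prompts`."""
    total_tokens = 0
    start = timer()
    for prompt in prompts:
        total_tokens += generate(prompt)
    elapsed = timer() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_tokens / elapsed


def sweep(
    make_generate: Callable[[Dict[str, int]], GenerateFn],
    grid: Dict[str, List[int]],
    baseline: GenerateFn,
    prompts: Sequence[str],
    timer: Callable[[], float] = time.perf_counter,
) -> List[Tuple[Dict[str, int], float]]:
    """Try every combination in `grid`; return (config, speedup) pairs,
    best first. The baseline is timed once in the same environment."""
    base_tps = tokens_per_second(baseline, prompts, timer)
    keys = sorted(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        tps = tokens_per_second(make_generate(config), prompts, timer)
        results.append((config, tps / base_tps))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

In practice, `make_generate` would build a decoder from a configuration such as `{"ngram_size": 5, "pool_size": 7}` (illustrative keys and values, not the defaults of any method), and `baseline` would wrap plain autoregressive decoding with the same model. A real measurement would also need warm-up runs and GPU synchronization before reading the timer, which this sketch leaves out.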