Is an autoregressive speed of only 40.24 tokens/s on an A100 too slow?

hemingkx / Spec-Bench

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

https://sites.google.com/view/spec-bench

Apache License 2.0


Is an autoregressive speed of only 40.24 tokens/s on an A100 too slow? #4

Closed Zwc2003 closed 6 months ago

Zwc2003 commented 6 months ago

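We ran tests on a 4090 and an L40, and the autoregressive tokens/s exceeded 50 on both, so the A100 figure does not seem quite reasonable. Could a difference in CUDA versions be the cause?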

hemingkx commented 6 months ago

Thanks for your inquiry! The CUDA and PyTorch versions significantly influence decoding speed.

We recommend focusing on '#mean accepted tokens' as a more reliable metric for comparing Speculative Decoding results across devices and environments. Tokens/s and Speedup are reference metrics, intended for comparing different methods on the same device and in the same test environment.
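
As an illustration only (this is not taken from the Spec-Bench code), the three metrics mentioned above could be computed roughly as follows. The sketch assumes that '#mean accepted tokens' means the number of newly generated tokens divided by the number of decoding steps, and that Speedup is the ratio of a method's tokens/s to the autoregressive baseline's tokens/s measured on the same machine. The `RunStats` class, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical summary of one benchmark run (not a Spec-Bench type)."""
    new_tokens: int       # total tokens generated
    decoding_steps: int   # total forward passes of the target model
    wall_time_s: float    # total wall-clock decoding time in seconds


def mean_accepted_tokens(run: RunStats) -> float:
    # Assumed definition: tokens produced per decoding step.
    # Depends on the method and the data, not on how fast the hardware is.
    return run.new_tokens / run.decoding_steps


def tokens_per_second(run: RunStats) -> float:
    # Depends on the GPU, CUDA version, PyTorch version, and so on.
    return run.new_tokens / run.wall_time_s


def speedup(method: RunStats, baseline: RunStats) -> float:
    # Only meaningful when both runs come from the same device and environment.
    return tokens_per_second(method) / tokens_per_second(baseline)


# Made-up numbers for illustration, not measured results.
baseline = RunStats(new_tokens=1000, decoding_steps=1000, wall_time_s=25.0)
speculative = RunStats(new_tokens=1000, decoding_steps=400, wall_time_s=10.0)

print(mean_accepted_tokens(baseline))     # autoregressive: one token per step
print(mean_accepted_tokens(speculative))
print(tokens_per_second(baseline))
print(speedup(speculative, baseline))
```

Under these assumptions, only the wall-clock time changes when the same run moves to a different GPU or CUDA version, so tokens/s shifts while the mean accepted tokens value stays the same.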

hemingkx commented 6 months ago

This issue was closed because it has been inactive for 7 days. If there are any other questions, please open a new issue or send me an email.