decoding tokens are not equal for different methods

zfjsail commented 1 month ago

Hi, thank you so much for your awesome work!

I noticed that when I run equal.py to compare the tokens produced by speculative decoding methods (pld/eagle/hydra) with those from vanilla decoding, the output is always "Not Equal"! Yet the decoded tokens of the different methods are almost identical. What causes these slight differences?
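An "Equal"/"Not Equal" verdict does not show where two outputs part ways. Below is a minimal sketch, assuming the outputs of two methods are available as plain lists of token IDs, that reports the first mismatching position. `first_divergence` and `divergence_report` are hypothetical helpers, not part of Spec-Bench or equal.py, and the token IDs are made up for illustration.

```python
from typing import Optional, Sequence


def first_divergence(reference: Sequence[int], candidate: Sequence[int]) -> Optional[int]:
    """Return the index of the first position where the two token sequences differ.

    Returns None if the sequences are identical. If one sequence is a strict
    prefix of the other, the length of the shorter sequence is returned.
    """
    for i, (ref_tok, cand_tok) in enumerate(zip(reference, candidate)):
        if ref_tok != cand_tok:
            return i
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate))
    return None


def divergence_report(reference: Sequence[int], candidate: Sequence[int]) -> str:
    """Summarize where the candidate stops matching the reference (hypothetical helper)."""
    idx = first_divergence(reference, candidate)
    if idx is None:
        return "Equal"
    return f"Not Equal: first mismatch at token {idx} of {len(reference)}"


if __name__ == "__main__":
    # Illustrative token IDs, not real model output.
    ar_tokens = [101, 7592, 2088, 2003, 1037, 3231, 102]
    spec_tokens = [101, 7592, 2088, 2003, 1037, 2742, 102]
    print(divergence_report(ar_tokens, spec_tokens))
    # prints "Not Equal: first mismatch at token 5 of 7"
```

Running this over every prompt would show whether mismatches cluster late in long generations.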

hemingkx commented 1 month ago

Thank you for your kind words and your question!

We have observed similar discrepancies between several speculative decoding methods and vanilla autoregressive (AR) decoding in our own experiments. Specifically, with float32 precision and greedy decoding, only SpS and PLD produce outputs exactly identical to AR decoding. The other methods show minor differences, mostly towards the end of long sequences.

We believe these slight variations may come from small numerical errors that accumulate over the decoding steps and become noticeable in longer sequences.
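The numerical-error hypothesis can be illustrated with a toy example. This is a sketch under stated assumptions, not a diagnosis of any specific method: floating-point addition is not associative, so summing the same values in a different order (which, we assume, can happen when a kernel processes one token at a time versus several draft tokens at once) may change the last bit of a logit. Under greedy decoding, that is enough to flip the argmax when two candidate tokens are nearly tied, and every later token is then conditioned on a different prefix. The example uses Python's float64; float32 has fewer mantissa bits and larger rounding errors. `greedy_pick` and the logit values are hypothetical.

```python
from typing import List


def greedy_pick(logits: List[float]) -> int:
    """Greedy decoding step: index of the largest logit (first one wins ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# The same three contributions summed in two different orders
# (the two orders stand in for two hypothetical computation paths).
parts = [0.1, 0.2, 0.3]
left_to_right = (parts[0] + parts[1]) + parts[2]   # 0.6000000000000001
right_to_left = parts[0] + (parts[1] + parts[2])   # 0.6

# Token 0 has a fixed logit; token 1's logit depends on the summation order.
logits_a = [0.6, left_to_right]
logits_b = [0.6, right_to_left]

print(left_to_right == right_to_left)                # False
print(greedy_pick(logits_a), greedy_pick(logits_b))  # 1 0
```

A one-ulp difference changes the chosen token only at a near-tie, which would explain why outputs agree for long stretches and then diverge.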

We plan to investigate this issue further in the coming days. If you have any insights or ideas, please feel free to share them!

hemingkx commented 1 month ago

BTW, we did not modify the code of the individual algorithms, which means that the original implementations also have this issue. You may also want to reach out to the authors of the corresponding methods😊.

hemingkx / Spec-Bench, issue #6