Open tdoublep opened 1 week ago
Can you help me understand why this should be prioritized? Is it in the interest of deterministic benchmarks?
@cadedaniel Exactly that. With speculative decoding enabled, the seed doesn't just affect the actual output that you get, but can also affect the total number of decoding steps required. Thus, in order to get reproducible benchmarks for temperature>0, we need to be able to fix the seed.
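To illustrate why the seed can change the number of decoding steps, here is a toy simulation. It is only a sketch and not vLLM's implementation: the function name `count_decoding_steps`, the fixed per-token acceptance probability `accept_prob`, and the draft length `draft_len` are all assumptions made for illustration. Each step proposes `draft_len` draft tokens, accepts them one by one until the first rejection, and then emits one further token from the target model.

```python
import random


def count_decoding_steps(seed, num_tokens=64, draft_len=4, accept_prob=0.7):
    """Toy model of speculative decoding (hypothetical, for illustration only).

    Returns how many decoding steps are needed to emit at least
    `num_tokens` tokens when acceptance is driven by a seeded RNG.
    """
    rng = random.Random(seed)  # per-request generator, fixed by the seed
    generated = 0
    steps = 0
    while generated < num_tokens:
        accepted = 0
        # Accept draft tokens in order until the first rejection.
        for _ in range(draft_len):
            if rng.random() < accept_prob:
                accepted += 1
            else:
                break
        # Every step also yields one token from the target model
        # (the corrected token on rejection, or a bonus token otherwise).
        generated += accepted + 1
        steps += 1
    return steps


if __name__ == "__main__":
    for seed in range(5):
        print(f"seed={seed} steps={count_decoding_steps(seed)}")
```

In this sketch the same seed always gives the same step count, while different seeds may give different counts, which is why an unseeded run cannot be benchmarked reproducibly.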
Your current environment
🐛 Describe the bug
Speculative decoding currently does not respect the per-request seed. This bug was discovered by @jvlunteren while evaluating the performance of speculative decoding under different kinds of workloads.
It can be reproduced by starting a server with:
then sending a couple of requests with:
The output will be different each time.