Open qeternity opened 1 day ago
Thanks for contributing the test case. This is a known problem: https://sgl-project.github.io/references/faq.html#the-results-are-not-deterministic-even-with-a-temperature-of-0. If you are interested, please help us add a padded batching mode.
Hi @merrymercy - this is not a determinism bug. You can generate the same text with top_k=1 or with a regex, at much higher concurrency, and it will pass every time. This is an issue that is specific to select.
I added a regular gen test at much greater concurrency to illustrate the above. As you can see, the test still fails only on the select invocation. The test is configured so that the two invocations should be net equivalent, even with the different behavior of select (at least I think this is correct). Further, this applies to all choices sampling methods.
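For anyone following along, here is a rough sketch of what "net equivalent" means above: a select over a fixed list of choices can be approximated by a regular gen call constrained with a regex that matches exactly one of those choices. The helper name `choices_to_regex` and the example choices are hypothetical and are not taken from the test in this PR. The resulting pattern would be what you pass as the regex constraint to gen.

```python
import re


def choices_to_regex(choices):
    """Build a regex that matches exactly one of the given choices.

    Hypothetical helper for illustration only; not part of sglang.
    """
    # Put longer choices first so a short choice is never preferred over
    # a longer one that shares its prefix.
    ordered = sorted(choices, key=len, reverse=True)
    # Escape each choice so characters such as '.' or '(' match literally.
    return "(" + "|".join(re.escape(c) for c in ordered) + ")"


# Example choices (made up for this sketch).
choices = ["yes", "no", "unsure"]
pattern = choices_to_regex(choices)
```

With greedy decoding, a gen call constrained this way should land on the same answer as select in most cases, which is why a failure that only shows up on the select path is suspicious.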
I see. I think the real cause is also some non-determinism in the input logprobs, because select depends on them. Can you use regex / normal decoding for your current use cases? We will probably not fix this issue if it is not a regression. We will revisit this later with a more fundamental solution.
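To make the point about input logprobs concrete, here is a toy model of choice selection. It assumes each choice is scored by the mean logprob of its tokens and the highest score wins; that scoring rule and all the numbers below are assumptions for illustration, not sglang's actual implementation or measured values.

```python
def pick_choice(choice_logprobs):
    """Return the choice with the highest length-normalized logprob.

    choice_logprobs maps each choice to a list of per-token logprobs.
    Toy model only; the scoring rule is an assumption.
    """
    scores = {c: sum(lps) / len(lps) for c, lps in choice_logprobs.items()}
    return max(scores, key=scores.get)


# Made-up per-token logprobs for two nearly tied choices.
baseline = {"true": [-0.70, -0.10], "false": [-0.71, -0.10]}

# Same request, but the first-token logprobs differ by 0.01,
# standing in for batching or kernel noise.
perturbed = {"true": [-0.71, -0.10], "false": [-0.70, -0.10]}

first = pick_choice(baseline)
second = pick_choice(perturbed)
```

In this sketch `first` is "true" and `second` is "false": when two choices are nearly tied, a tiny shift in the input logprobs is enough to flip the argmax, even though token-by-token decoding of the same text may stay stable.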
Yes, we can. But this line of investigation actually started because we were seeing very flaky JSON generation. And unfortunately, this easily triggers at the level of traffic we serve in prod.
I fully appreciate batching and kernel non-determinism but this feels like there is a deeper issue.
This is further to some discussion in the Slack. Select under moderate concurrency is very unstable.
We discovered this investigating some other issues that we've experienced in recent versions of sglang.
I'm not sure where in the test suite this test is best suited, so happy to move it.