In the original code, the reported TTFT (time to first token) was inaccurate. Because of speculative scheduling, the timer stopped as soon as the prefill was scheduled, before the generation result was actually returned. This change makes the timer wait until the prefill result is returned.
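
The difference between the two measurement points can be illustrated with a minimal sketch. It is not the code from this PR: `fake_prefill` and `measure_ttft` are hypothetical names, and a thread pool stands in for the speculative scheduler, assuming only that scheduling returns before the prefill result is ready.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_prefill(delay_s):
    # Hypothetical stand-in for the prefill step; sleeps to simulate work.
    time.sleep(delay_s)
    return "first-token"


def measure_ttft(delay_s=0.05):
    with ThreadPoolExecutor(max_workers=1) as pool:
        start = time.perf_counter()
        # Scheduling returns immediately, before the prefill has run.
        future = pool.submit(fake_prefill, delay_s)
        # Old behavior (assumed): stop the timer right after scheduling.
        ttft_at_schedule = time.perf_counter() - start
        # New behavior (assumed): wait for the prefill result first.
        token = future.result()
        ttft_at_result = time.perf_counter() - start
    return ttft_at_schedule, ttft_at_result, token
```

In this sketch the first reading covers only the scheduling overhead, while the second also includes the simulated prefill time, which is the quantity TTFT is meant to report.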
Before submitting
[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).