huggingface / tgi-gaudi

Large Language Model Text Generation Inference on Habana Gaudi
http://hf.co/docs/text-generation-inference
Apache License 2.0

Make prefill time of static benchmark correct #214

Closed schoi-habana closed 3 months ago

schoi-habana commented 3 months ago

In the original code, the reported TTFT (time to first token) was inaccurate. Because of speculative scheduling, the timer stopped as soon as the prefill had been scheduled, before the prefill result was actually returned. This change makes the timer wait until the prefill result is returned.
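To illustrate the difference, here is a minimal sketch. It is not the code from this PR: `fake_prefill`, `measure_prefill`, and `PREFILL_SECONDS` are hypothetical names, and a thread pool stands in for the speculative scheduler. It only shows why stopping the timer at scheduling time under-reports the prefill latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real prefill cost (assumption, not a measured value).
PREFILL_SECONDS = 0.2


def fake_prefill() -> str:
    """Pretend to run a prefill step and return the first token."""
    time.sleep(PREFILL_SECONDS)
    return "first-token"


def measure_prefill(wait_for_result: bool) -> float:
    """Return the elapsed time around an asynchronously scheduled prefill.

    wait_for_result=False mimics the old behaviour: the timer stops once the
    work is scheduled. wait_for_result=True mimics the fix: the timer stops
    only after the prefill result is available.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        start = time.perf_counter()
        future = pool.submit(fake_prefill)  # returns immediately, work runs in background
        if wait_for_result:
            future.result()  # block until the prefill has actually finished
        elapsed = time.perf_counter() - start
    return elapsed


scheduled_only = measure_prefill(wait_for_result=False)
with_result = measure_prefill(wait_for_result=True)
print(f"timer stopped at scheduling: {scheduled_only:.4f}s")
print(f"timer stopped at result:     {with_result:.4f}s")
```

In this sketch the first measurement captures only the cost of submitting the work, while the second includes the full simulated prefill, which is the quantity a TTFT figure is meant to reflect.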

Before submitting