Summary:
Reverts the previously landed summary metrics logic in src/guidellm/core/report.py and src/guidellm/core/result.py because of failing tests. Additionally, test cases are expanded to fully cover these changes and to stabilize the nightly and main CI pipelines.
Details:
Replaced direct token statistics (prompt_token, output_token) with distribution-based calculations (prompt_token_distribution, output_token_distribution).
Modified percentile handling for request latency, time-to-first-token (TTFT), and inter-token latency (ITL) to improve performance summary accuracy.
Removed computed_field annotations for several properties in src/guidellm/core/result.py.
Updated tests in tests/unit/core/test_report.py from @pytest.mark.regression to @pytest.mark.sanity to better align with the testing standards.
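For reviewers, the idea of summarizing a distribution rather than a single token statistic can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in this PR: the helper name summarize_distribution, the choice of percentiles, and the sample values are all hypothetical.

```python
from statistics import mean, quantiles


def summarize_distribution(values, percentiles=(50, 90, 99)):
    """Hypothetical helper: return the mean and selected percentiles.

    `percentiles` must contain integers in the range 1..99.
    """
    data = sorted(float(v) for v in values)
    if not data:
        raise ValueError("cannot summarize an empty distribution")
    if len(data) == 1:
        # A single sample is its own mean and every percentile.
        return {"mean": data[0], **{f"p{p}": data[0] for p in percentiles}}

    # quantiles(n=100) returns 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(data, n=100, method="inclusive")
    summary = {"mean": mean(data)}
    for p in percentiles:
        summary[f"p{p}"] = cuts[p - 1]
    return summary


# Illustrative values only, standing in for something like a prompt token distribution.
sample_prompt_tokens = [128, 256, 256, 512]
print(summarize_distribution(sample_prompt_tokens))
```

The same helper could be applied to request latency, TTFT, or ITL samples, since each is just a list of numbers; how the actual report computes its percentiles may differ.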
Test Plan:
Unit tests have been added/updated to verify:
Correctness of the refactored token statistics and distribution calculations.
Accurate summary report generation for benchmarks.
Full compatibility with existing functionality.
Verified passing CI/CD pipeline, ensuring no regressions.