nod-ai / SHARK-TestSuite

Temporary home of a test suite we are evaluating
Apache License 2.0

Missing error handling in benchmark_sdxl_rocm.py #286

Open ScottTodd opened 1 month ago

ScottTodd commented 1 month ago

On https://github.com/iree-org/iree/pull/17847, compilation failed and the benchmark job using iree_tests/benchmarks/sdxl/benchmark_sdxl_rocm.py at https://github.com/nod-ai/SHARK-TestSuite/commit/3603a453b3777fac9af4506a3dc0b3d87587fd47 did not handle that gracefully:

https://github.com/iree-org/iree/actions/runs/9874435266/job/27269408927#step:16:46

INFO     root:benchmark_sdxl_rocm.py:31 Command failed with error: b''
INFO     root:benchmark_sdxl_rocm.py:161 Running SDXL ROCm benchmark failed. Exiting
INFO     root:benchmark_sdxl_rocm.py:179 E2E Benchmark Time: None ms (golden time 320.0 ms)

...

>       check.less_equal(benchmark_e2e_mean_time, goldentime_rocm_e2e, "SDXL e2e benchmark time should not regress")
E       TypeError: '<=' not supported between instances of 'NoneType' and 'float'

SHARK-TestSuite/iree_tests/benchmarks/sdxl/benchmark_sdxl_rocm.py:298: TypeError
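
The TypeError above is what Python raises for any ordering comparison between None and a float, so the underlying failure (no benchmark time was produced) is hidden behind it. A minimal sketch of a guard that reports the real problem first; `require_benchmark_time` is a hypothetical helper for illustration, not something in benchmark_sdxl_rocm.py:

```python
from typing import Optional


def require_benchmark_time(name: str, mean_time_ms: Optional[float]) -> float:
    """Return the time unchanged, or fail with a descriptive message.

    Hypothetical helper for illustration; not part of benchmark_sdxl_rocm.py.
    """
    if mean_time_ms is None:
        raise AssertionError(
            f"{name}: no benchmark time was produced "
            "(the compile or benchmark command likely failed)"
        )
    return mean_time_ms


# Comparing None with a float reproduces the kind of error seen in the log.
try:
    None <= 320.0
except TypeError as e:
    print(e)
```

Calling such a guard before the `check.less_equal` comparison would make the test fail with a message about the missing result rather than about operand types.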

ScottTodd commented 1 month ago

Traced this a bit.

We only log stderr on failure here, but we still return stdout: https://github.com/nod-ai/SHARK-TestSuite/blob/3603a453b3777fac9af4506a3dc0b3d87587fd47/iree_tests/benchmarks/sdxl/benchmark_sdxl_rocm.py#L23-L32

We get that stdout output here and pass it to job_summary_process: https://github.com/nod-ai/SHARK-TestSuite/blob/3603a453b3777fac9af4506a3dc0b3d87587fd47/iree_tests/benchmarks/sdxl/benchmark_sdxl_rocm.py#L174-L176

The stdout output is then ignored in job_summary_process if the ret value is 1: https://github.com/nod-ai/SHARK-TestSuite/blob/3603a453b3777fac9af4506a3dc0b3d87587fd47/iree_tests/benchmarks/sdxl/benchmark_sdxl_rocm.py#L159-L167
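
One way to avoid the chain described above is to have the command helper raise on a non-zero exit code instead of returning stdout regardless. This is only a sketch under assumptions: the name `run_command`, its signature, and the choice to raise `subprocess.CalledProcessError` are illustrative and may not match the script or the fix that eventually landed.

```python
import logging
import subprocess
import sys
from typing import Sequence


def run_command(command: Sequence[str]) -> str:
    """Run a command and return its stdout, raising if it exits non-zero.

    Hypothetical replacement for the script's helper; the name and
    signature are assumptions made for this sketch.
    """
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode != 0:
        # Log both streams: the log in this issue shows an empty error string,
        # so stderr alone may not explain the failure.
        logging.error(
            "Command %s failed with exit code %d\nstdout:\n%s\nstderr:\n%s",
            list(command), proc.returncode, proc.stdout, proc.stderr,
        )
        raise subprocess.CalledProcessError(
            proc.returncode, command, output=proc.stdout, stderr=proc.stderr
        )
    return proc.stdout


# Usage: a successful command returns its captured stdout.
output = run_command([sys.executable, "-c", "print('ok')"])
```

With a helper like this, a failed compile or benchmark command stops the test at the point of failure, so later steps never receive a None time.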

ScottTodd commented 1 month ago

The new pytest coverage that runs before the benchmark script also helps here.

ScottTodd commented 1 month ago

Landed a fix in IREE. Can copy it to this repo as well or just call this fixed.