Three times since Saturday we've seen the ursa-i9-9960x benchmark job be canceled after the 6-hour timeout:
The Buildkite logs all end with:
INFO:buildkite.benchmark.run:start child process: conbench tpch --iterations=3 --all=true --drop-caches=true --run-id=$RUN_ID --run-name="$RUN_NAME" --run-reason="$RUN_REASON"
# Received cancellation signal, interrupting
Terminated
🚨 Error: The command was interrupted by a signal
2024-01-08 13:32:01 DEBUG Terminating bootstrap after cancellation with terminated
As far as I can tell, the job runs fine before this point.
Here's a recent good build and its Conbench run, vs. a bad build and its Conbench run. The log timestamps show that both builds took 2 hours to reach the "start child process: conbench tpch" line. Both Conbench runs have results for TPC-H queries 1-20, but the bad build is missing results for queries 21 and 22. (The same is true of the other bad builds.) So my guess is that something starts to hang in query 21. It's hard to be sure, since we don't get logs and this doesn't happen every time.
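One way to get more evidence the next time this happens might be to run the child process under its own deadline, shorter than the 6-hour Buildkite timeout, so that we keep the last lines of output instead of losing them when the job is canceled. Below is a minimal sketch of that idea in Python. The helper name `run_with_deadline` and its parameters are hypothetical and not part of the existing benchmark runner, and whether `conbench tpch` prints anything useful per query is an assumption I haven't verified.

```python
import subprocess
import sys
import threading
import time
from collections import deque


def run_with_deadline(cmd, deadline_s, tail_lines=20):
    """Run cmd, echo its output with elapsed-time stamps, stop it at the deadline.

    Hypothetical helper, not part of the existing benchmark runner.
    Returns (returncode, timed_out, tail), where tail is the last output lines.
    """
    start = time.monotonic()
    tail = deque(maxlen=tail_lines)
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )

    def pump():
        # Stamp each line with seconds since start so a stall is visible
        # as a gap between the last line and the deadline.
        for line in proc.stdout:
            stamped = f"[{time.monotonic() - start:8.1f}s] {line.rstrip()}"
            tail.append(stamped)
            print(stamped, flush=True)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    try:
        proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        timed_out = True
        proc.terminate()  # ask politely first
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()  # then force it
            proc.wait()

    reader.join(timeout=5)
    return proc.returncode, timed_out, list(tail)
```

For example, the runner could call `run_with_deadline(["conbench", "tpch", "--iterations=3", "--all=true", "--drop-caches=true"], deadline_s=3 * 3600)` and log `tail` when `timed_out` is true. The 3-hour value is only an illustration, picked because the builds take about 2 hours to reach this step.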
For posterity, here's the code for the TPC-H benchmark.