voltrondata-labs / arrow-benchmarks-ci

Benchmarks CI for Apache Arrow project
MIT License
0 stars 5 forks source link

R TPCH-15 benchmark failure, mismatch around phone, supplier, ... #135

Open jgehrcke opened 1 year ago

jgehrcke commented 1 year ago

This build failed: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-m5-4xlarge-us-east-2/builds/2383#0188ab58-1ae3-4d9e-9d84-1340f0e5ebee

[230611-17:24:35.418] [29564] [buildkite.benchmark.run] INFO: {'type': 'BenchmarkGroupExecution', 'id': '6c26e1900c46464ca4bec974efe0839b', 'lang': 'R', 'name': 'tpch', 'options': '--iterations=3 --all=true --drop-caches=true', 'flags': {'language': 'R'}, 'benchmarkable_id': 'fe48403651e51e241802fef05984efc37f10a4fc', 'run_id': '2b9b3d7fe9434d22a3233146961de2c7', 'run_name': 'commit: fe48403651e51e241802fef05984efc37f10a4fc', 'machine': 'ec2-m5-4xlarge-us-east-2', 'process_pid': 29564, 'command': 'conbench tpch --iterations=3 --all=true --drop-caches=true --run-id=$RUN_ID --run-name="$RUN_NAME" --run-reason="$RUN_REASON"', 'started_at': '2023-06-11 17:24:35.418056', 'finished_at': None, 'total_run_time': None, 'failed': False, 'return_code': None, 'stderr': None, 'total_machine_virtual_memory': 66711855104}
[230611-17:24:35.498] [29564] [buildkite.benchmark.run] INFO: start child process: conbench tpch --iterations=3 --all=true --drop-caches=true --run-id=$RUN_ID --run-name="$RUN_NAME" --run-reason="$RUN_REASON"
[230611-19:50:58.709] [29564] [buildkite.benchmark.run] INFO: child process exited with code 0.
stderr:

...

[230611-19:00:37.098] [17687] [root] ERROR: {"timestamp": "2023-06-11T19:00:37.098139+00:00", "tags": {"cpu_count": null, "engine": "arrow", "memory_map": false, "query_id": "TPCH-15", "scale_factor": 10, "format": "parquet", "language": "R", "name": "tpch"}, "info": {"arrow_version": "13.0.0-SNAPSHOT", "arrow_compiler_id": "GNU", "arrow_compiler_version": "11.4.0", "benchmark_language_version": "R version 4.2.3 (2023-03-15)", "arrow_version_r": "12.0.0.9000"}, "context": {"arrow_compiler_flags": "-fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /var/lib/buildkite-agent/.conda/envs/arrow-commit/include -fdiagnostics-color=always", "benchmark_language": "R"}, "error": "Garbage collection 19 = 13+2+4 (level 0) ...  82.6 Mbytes of cons cells used (56%) 18.8 Mbytes of vectors used (29%)Error in eval(bm$after_each, envir = ctx) : The answer does not match\nCalls: run_bm -> run_iteration -> eval -> eval\nIn addition: Warning message:\nIn eval(bm$after_each, envir = ctx) : \n`attr(old, 'row.names')`:  \n`attr(new, 'row.names')`: 1\n\n`old$s_suppkey`:      \n`new$s_suppkey`: 69998\n\n`old$s_name`:                     \n`new$s_name`: \"Supplier#000069998\"\n\n`old$s_address`:                                           \n`new$s_address`: \"117W54YtKASNfzkZRS8P32856BMj0HKf895vdukw\"\n\n`old$s_phone`:                  \n`new$s_phone`: \"16-386-278-9829\"\n\n`old$total_revenue`:             \n`new$total_revenue`: 2194132.8166\n\nExecution halted", "command": "library(arrowbench); run_one(tpc_h, cpu_count=NULL, format='parquet', scale_factor=10, engine='arrow', memory_map=FALSE, query_id=15)"}
Traceback (most recent call last):
  File "/var/lib/buildkite-agent/builds/aws-ec2-m5-4xlarge-us-east-2-i-0d24cff8def4885c6-1/apache-arrow/arrow-bci-benchmark-on-ec2-m5-4xlarge-us-east-2/benchmarks/benchmarks/_benchmark.py", line 285, in r_benchmark
    result, output = self._get_benchmark_result(command)
  File "/var/lib/buildkite-agent/builds/aws-ec2-m5-4xlarge-us-east-2-i-0d24cff8def4885c6-1/apache-arrow/arrow-bci-benchmark-on-ec2-m5-4xlarge-us-east-2/benchmarks/benchmarks/_benchmark.py", line 320, in _get_benchmark_result
    raise Exception(error)
Exception: Garbage collection 19 = 13+2+4 (level 0) ...  82.6 Mbytes of cons cells used (56%) 18.8 Mbytes of vectors used (29%)Error in eval(bm$after_each, envir = ctx) : The answer does not match
Calls: run_bm -> run_iteration -> eval -> eval
In addition: Warning message:
In eval(bm$after_each, envir = ctx) :
`attr(old, 'row.names')`:
`attr(new, 'row.names')`: 1

`old$s_suppkey`:
`new$s_suppkey`: 69998

`old$s_name`:
`new$s_name`: "Supplier#000069998"

`old$s_address`:
`new$s_address`: "117W54YtKASNfzkZRS8P32856BMj0HKf895vdukw"

`old$s_phone`:
`new$s_phone`: "16-386-278-9829"

`old$total_revenue`:
`new$total_revenue`: 2194132.8166

Execution halted
jonkeane commented 1 year ago

This is a different symptom from #131 and (with more details) https://github.com/apache/arrow/issues/35917

The query that is failing here (15), was not one of the ones that had an answer that needed to be changed (see the arrow ticket for more details).

Further, this symptom is showing that the result that was returned in this case (the one marked as old in the output) had zero rows compared to the expected 1 row in the answer (the one marked as new in the output). This is quite strange and indicative of some other problem in the processing (though it is possible this is data-generation related, that a single query fails in this way makes me suspect something else happened in the processing).

Another oddity is that this happened once this weekend and we've seen clean builds (including with this query!) before and after.

Since this seems to be passing now, I'm inclined to wait and see if this pops up again on this particular machine before digging too much further. If it does we should definitely investigate what's up with this machine — and we'll have more evidence then too (is it the same query + scalefactor? or is it a different one? etc)