Closed austin3dickey closed 11 months ago
The only machine that runs this is ursa-i9-9960x. Here is a build link.
231022-13:22:15.225 INFO: Initializing adapter
231022-13:22:15.255 INFO: source nyctaxi_multi_parquet_s3: download, if required
231022-13:22:15.263 INFO: constructed Dataset object for source in 0.0066 s
231022-13:22:15.263 INFO: case ('1pc', 'parquet'): create directory
231022-13:22:15.263 INFO: directory created, path: /dev/shm/bench-cd80377a/1pc-parquet-eed70f79-5ee5-4156-b932-79bdf3b754d0
231022-13:22:15.263 INFO: read 561000 rows of dataset nyctaxi_multi_parquet_s3 into memory
231022-13:22:15.528 INFO: read source dataset into memory in 0.2645 s
231022-13:22:20.250 INFO: try to perform login
231022-13:22:20.250 INFO: try: POST to https://conbench.ursa.dev/api/login/
231022-13:22:20.536 INFO: POST request to https://conbench.ursa.dev/api/login/: took 0.2858 s, response status code: 204
231022-13:22:20.536 INFO: ConbenchClient: initialized
231022-13:22:20.536 INFO: try: POST to https://conbench.ursa.dev/api/benchmark-results/
231022-13:22:20.607 INFO: POST request to https://conbench.ursa.dev/api/benchmark-results/: took 0.0712 s, response status code: 201
231022-13:22:20.671 INFO: stdout of ['du', '-sh', '/dev/shm/bench-cd80377a/1pc-parquet-eed70f79-5ee5-4156-b932-79bdf3b754d0']: 20M
231022-13:22:20.672 INFO: removing directory: /dev/shm/bench-cd80377a/1pc-parquet-eed70f79-5ee5-4156-b932-79bdf3b754d0
231022-13:22:20.674 INFO: case ('1pc', 'arrow'): create directory
231022-13:22:20.674 INFO: directory created, path: /dev/shm/bench-cd80377a/1pc-arrow-b8bced17-2cee-4bca-83d4-534d23e2f468
231022-13:22:20.674 INFO: read 561000 rows of dataset nyctaxi_multi_parquet_s3 into memory
231022-13:22:20.814 INFO: read source dataset into memory in 0.1400 s
Fatal Python error: Segmentation fault
Interestingly, sometimes one or two cases succeed before the segfault, and sometimes none of them do.
Here's a breakdown of the number of successful dataset-serialize results per run this month:
run_timestamp | run_id | num_results
----------------------------+----------------------------------+-------------
2023-10-01 20:47:05.27222 | 1653cbab792a4905950da7a357c27aab | 24
2023-10-01 23:41:55.081723 | 1e2c0d5208784aa9a98e28fd387a8c67 | 24
2023-10-02 02:37:15.753681 | 08be4f7cab094940b1c8c31ca9e902c4 | 24
2023-10-02 05:34:42.301889 | 497dc6271cc541abbd2adec51695259d | 24
2023-10-02 15:48:42.227189 | 1077a66e57a74edfbea848d528262f86 | 24
2023-10-03 08:28:15.839898 | 6ca857816880414aa3ab96e2daf0860d | 24
2023-10-03 14:49:27.261457 | c99fb3bbd61d429d9af8510254f6a8e1 | 24
2023-10-03 20:20:28.55793 | a1e3d4d07c28450e88e3eba64f420707 | 24
2023-10-03 23:11:30.643553 | d6530cb0f1cf41b8b1874677ef3ab37a | 24
2023-10-04 11:00:46.131435 | 3564dbe69233453f8970fd6127ca222b | 24
2023-10-04 16:00:31.168945 | 7deb05ad67484f16bd92545e79d91f90 | 24
2023-10-05 08:23:45.134169 | aa5c53940d2942bbad57dad7eafc7e7f | 24
2023-10-05 11:21:23.874677 | 980a34c6cdb7424189e0de6ca2924057 | 24
2023-10-05 14:15:26.585407 | 7131e14847454a49a5bbd2cb428e3e67 | 24
2023-10-05 17:28:01.860131 | 9fe656d750ae4a5c8d8236eed69d7f2b | 24
2023-10-05 20:19:41.701115 | 8a68c0b0b8ce41db8631e8e236af9235 | 24
2023-10-05 23:26:28.99192 | f4d6b6343f7a436ba3894f871a524b0c | 24
2023-10-06 02:22:16.671595 | 8016d5c3fc3e413ca0e8ab7f7bb52e1c | 24
2023-10-06 05:28:59.872214 | 141651da421049f3b1669d7a3f5a4d88 | 24
2023-10-06 08:24:59.470588 | 69103244469d4b01bbe493f39194a3fb | 24
2023-10-06 11:20:45.434535 | c462c49764b5442fa3ac1af671f88b08 | 24
2023-10-06 14:29:45.198447 | a2af089b4fa1421cbed1d59b18694a87 | 24
2023-10-06 17:24:26.104814 | 200947668e5c41a79243aaf31ab42380 | 24
2023-10-06 20:07:02.596901 | 5627438887054b109842daf9287b4332 | 0
2023-10-06 22:50:22.449426 | 0ecb0e4f83d0406593087e08c0b61fb3 | 1
2023-10-07 02:57:22.516069 | f6ccd1c4b5fb41c396811b76eef3902d | 0
2023-10-07 04:32:45.674441 | 36e00707836945e188083a5ebc84f3f6 | 1
2023-10-07 23:33:45.35763 | cb02ebcfc6d54f649db4dbae0f86c442 | 2
2023-10-08 22:09:50.399974 | 58f58379c6c545a0b8d4a82e7d894626 | 1
2023-10-10 01:12:00.395274 | e79336051c534dd0be81d1280d462968 | 1
2023-10-10 04:07:00.964479 | 6bb312c6084840ef97269ae527a759d1 | 0
2023-10-10 06:09:36.389056 | 309cc226fc3d4322baaf69faff63c956 | 1
2023-10-10 08:55:24.822424 | a91a6a2060c545f8b13ad0fe2e082475 | 0
2023-10-10 11:04:34.424189 | b0b0939418b14986a79b32547489c6a1 | 0
2023-10-10 13:31:12.50375 | f16770c6bc95419ba5011d10ea3a974d | 0
2023-10-10 16:07:28.651402 | 870f5c2b5b7c427fbc8a07c8708c2434 | 1
2023-10-10 18:33:16.425559 | 7fe710688b4d47f6a3932bfab9c599f7 | 1
2023-10-10 23:12:45.442182 | adf798f0306b44ca902b24e95182072b | 0
2023-10-11 01:06:54.330537 | e053efe4fccb48d8923ca49f22aa3928 | 0
2023-10-11 02:51:32.430454 | c985261e64e441fbb2d8da74ee0a271e | 1
2023-10-11 05:23:57.397817 | 3cedb97c4d0e4ca48c392562d7822c71 | 0
2023-10-11 07:53:29.72254 | 517c2fb5d61645c0937a4a80e4c02ab1 | 1
2023-10-11 10:22:12.897347 | 0a40322ac5ad4cc69b1e2da555cc9884 | 1
2023-10-11 13:45:02.667613 | 05f2a40424984d62808b6600a7be60e8 | 0
2023-10-11 15:23:16.871397 | 2dcd6824d73c4e238bbc36a77578eafc | 2
2023-10-11 17:51:40.61575 | 94b8fa73017b4526bb66c179a67c97f5 | 1
2023-10-11 20:21:02.843915 | 4a441cef677240cfbc593b858d5e6fc5 | 1
2023-10-11 22:52:02.991529 | da5bc2af4a9e4e849115de5f872e1a43 | 2
2023-10-12 01:23:48.596752 | 4912265bd177431e9ccab6a865979bc2 | 0
2023-10-12 04:56:52.251901 | db4cf82bc3e1491ba1500984e727eb02 | 0
2023-10-12 06:17:58.084406 | 2fd42faf48404478affd58f9bcfbb26e | 1
2023-10-12 08:52:12.650021 | bb99d7aba6ce4bd79c1b9bbf0ace732f | 2
2023-10-12 11:23:30.072888 | 8538c42b61e24b8f897eecba8ab51084 | 2
2023-10-12 14:35:46.545025 | 461fd9038239478199c4ce554783b107 | 0
2023-10-12 17:12:39.307677 | e09f15cd85d8455f80562dbb91f37389 | 0
2023-10-12 18:51:16.012983 | 2f6706c923c144ae89eddf11447734bc | 0
2023-10-12 21:24:05.360257 | 4b879813b1fd466cac3bd9a42b5f0eff | 0
2023-10-12 23:52:24.865696 | 810555452bb7453eb3637d74cdac4f05 | 1
2023-10-13 02:24:02.374777 | 6b2f3f65be194fa8aac8c854d4491958 | 0
2023-10-13 04:53:12.668694 | e9c7b12f180944fd9339d1acc89f27dc | 1
2023-10-13 07:19:54.560573 | 85c7fdd868d04e6aa61c898cf5a9f3a5 | 1
2023-10-13 10:33:45.433656 | d8e1f73c854e4d2fa7dcc67f9b28dc53 | 0
2023-10-13 13:22:57.123444 | e6c83373db21434f940879904d46fbfc | 0
2023-10-13 15:01:25.231645 | ab8fb74aed7741aba3d2ea6633c060d2 | 1
2023-10-13 18:30:03.608629 | 0122c1ead02d439f97e113243f87d337 | 0
2023-10-13 21:46:14.986946 | 7d324417a2f14070bb6a05ce184ae250 | 0
2023-10-14 00:40:17.52854 | 5f5c93257e234f21a40a6d0f875a9cc9 | 0
2023-10-14 02:10:36.812122 | 8b67a831fb3947c599bc638ba6b2269a | 2
2023-10-16 10:04:47.249041 | 5635a683e0b94470bcf0b3ce35ec1f9b | 1
2023-10-16 12:52:22.633604 | 6b2c2f77b61a4a39a6b33f834edd6b5d | 1
2023-10-16 17:22:58.089709 | f4fc993a105944aeabe04a56d6fc3a9e | 0
2023-10-16 18:09:30.601348 | 75f834400473428a87ff70013ccc684f | 0
2023-10-17 01:42:14.372681 | c5d83122840d4afc827dac61c8f7df22 | 1
2023-10-17 05:01:10.351914 | f3a18df396fa485ba6cf49231fae70fd | 0
2023-10-17 08:44:35.353271 | fa505501b1284dcd9ce7a347decbff54 | 0
2023-10-17 16:31:06.679952 | 56a088bf01764184bf34d59c108ee4e0 | 1
2023-10-17 20:40:37.481827 | fc2b2170fda04f568c7e5c9a47e7de95 | 0
2023-10-18 09:35:43.28765 | b57227ca04d64bcaa63b3a311b6f6743 | 0
2023-10-18 11:12:48.319844 | 7f6271767bf741bf91ea38ae207b7cea | 0
2023-10-18 14:57:54.460282 | 0ff935e33bc04e8e9d833f99187b8a72 | 0
2023-10-18 16:10:27.399072 | d379f727daef47d0b14f7647e3ef089b | 1
2023-10-19 02:44:17.844343 | 03302b3b966049a6ace506ccdb307395 | 1
2023-10-19 11:32:19.094086 | 8660f5699ed84c58a5b316432271d9a0 | 0
2023-10-19 13:53:01.875819 | 04f8720de29146db9a50344477afe4cc | 1
2023-10-19 17:35:08.017099 | 6b5cc6e3f4ac4b45bca66184ee8487ca | 1
2023-10-19 20:03:41.026884 | aeca00a62f38400baa34aad1782ffd5d | 3
2023-10-19 22:33:13.40542 | ea9f57e2e9444fc8ab74ca163a2ef4d6 | 1
2023-10-20 09:51:30.247996 | b97ffbffbe28497d9cda75b124bfb704 | 1
2023-10-22 18:22:19.403434 | 779a94ec29b649c29c5e4d1be968d6e0 | 1
2023-10-23 15:02:29.371518 | 9acfb5dd28cc48bcb5bcf57d6ba0cdf7 | 0
The first run without 24 results was this one: https://conbench.ursa.dev/runs/5627438887054b109842daf9287b4332/
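A minimal sketch of how that first incomplete run can be picked out of the breakdown, assuming the rows are available as pipe-separated text in chronological order, as in the table above. The helper names (`parse_rows`, `first_incomplete_run`) and the constant are hypothetical, and only a few rows from the table are reproduced here.

```python
EXPECTED_RESULTS = 24  # a full dataset-serialize run, per the table above

# A few rows copied from the table: run_timestamp | run_id | num_results
ROWS = """\
2023-10-06 14:29:45.198447 | a2af089b4fa1421cbed1d59b18694a87 | 24
2023-10-06 17:24:26.104814 | 200947668e5c41a79243aaf31ab42380 | 24
2023-10-06 20:07:02.596901 | 5627438887054b109842daf9287b4332 | 0
2023-10-06 22:50:22.449426 | 0ecb0e4f83d0406593087e08c0b61fb3 | 1
"""


def parse_rows(text):
    """Turn pipe-separated lines into (timestamp, run_id, num_results) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        timestamp, run_id, num_results = (field.strip() for field in line.split("|"))
        rows.append((timestamp, run_id, int(num_results)))
    return rows


def first_incomplete_run(rows, expected=EXPECTED_RESULTS):
    """Return the run_id of the first row with fewer results than expected.

    Assumes rows are already in chronological order; returns None if every
    run is complete.
    """
    for _timestamp, run_id, num_results in rows:
        if num_results < expected:
            return run_id
    return None


print(first_incomplete_run(parse_rows(ROWS)))
```

With the rows above this selects run 5627438887054b109842daf9287b4332, the same run linked above.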
On commit https://github.com/apache/arrow/commit/d7017dd0dc567969c79d14aefc3d5a638e66270a, which has the message GH-36765: [Python][Dataset] Change default of pre_buffer to True for reading Parquet files (#37854). Interesting!
@jorisvandenbossche It looks like the dataset-serialize benchmark started segfaulting after https://github.com/apache/arrow/pull/37854 was merged. Do you think we'll need to make changes to how the benchmark is run, or is there something that needs to be fixed on the Arrow side?
We could probably just set pre_buffer=False. Or someone could research whether there's a way to consistently avoid the segfault (which I'm assuming is memory-related? not quite sure) even with pre_buffer=True.
I think it depends on what the Arrow community wants to actually be measuring here. For instance, it may not make sense to compare the benchmark timings measured with and without pre_buffer.
> We could probably just set pre_buffer=False
We could do that short-term to get the benchmark working again. If I am reading it correctly, the benchmark is actually about writing, so the segfault happens during setup and changing this won't affect what is being measured (although it is a bit strange that it still logs the timing info after reading).
But the change that started this (the pre_buffer default change) should not cause a segfault. If that is happening, that's a critical bug, and something we should still try to reproduce outside of the benchmarks.
We did have some crashes on the main Arrow CI as well after merging that PR, but those were fixed with https://github.com/apache/arrow/pull/38073
Okay, I opened https://github.com/apache/arrow/issues/38438. I'll try to see if using pre_buffer=False fixes the problem.
I was able to avoid the segfault locally by setting pre_buffer=False in https://github.com/voltrondata-labs/benchmarks/pull/152. Once I merge that, this issue can be closed.
Like you said though, https://github.com/apache/arrow/issues/38438 seems like a critical bug.
That band-aid worked: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/3689#018b64a3-2deb-4824-96ee-3b13c7c67261/6-24113
PASSED Python dataset-serialize 0:27:17.999837
Thanks for opening the issue! Will try to further look into that tomorrow.
I've tried to fix it here: https://github.com/apache/arrow/pull/38466
I'm not sure this really fixes the bug, but you can give it a try.
More details to come.