szarnyasg closed 1 year ago
It still crashes, but now the process is killed with an out-of-memory (OOM) error:
17:58 [INFO ] Execute graph loader with command-line: [/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/./bin/sh/load-graph.sh --graph-name graph500-29 --input-vertex-path /data/gx/graphs/graph500-29.v --input-edge-path /data/gx/graphs/graph500-29.e --output-path ./intermediate/graph500-29 --directed false --weighted false]
/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/./bin/sh/load-graph.sh: line 66: 151613 Killed bin/sh/relabel.py --use-disk --graph-name ${GRAPH_NAME} --input-vertex ${INPUT_VERTEX_PATH} --input-edge ${INPUT_EDGE_PATH} --output-path ${OUTPUT_PATH} --weighted ${WEIGHTED} --directed ${DIRECTED}
18:51 [ERROR] Failed to load graph "graph500-29:graph500-29".
science.atlarge.graphalytics.execution.PlatformExecutionException: Failed to load a GraphBLAS dataset.
at science.atlarge.graphalytics.graphblas.GraphblasPlatform.loadGraph(GraphblasPlatform.java:54) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
at science.atlarge.graphalytics.execution.BenchmarkExecutor.loadGraph(BenchmarkExecutor.java:375) [graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
at science.atlarge.graphalytics.execution.BenchmarkExecutor.execute(BenchmarkExecutor.java:132) [graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
at science.atlarge.graphalytics.BenchmarkSuite.main(BenchmarkSuite.java:105) [graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
Caused by: org.apache.commons.exec.ExecuteException: Process exited with an error: 137 (Exit value: 137)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
at science.atlarge.graphalytics.graphblas.GraphblasLoader.load(GraphblasLoader.java:64) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
at science.atlarge.graphalytics.graphblas.GraphblasPlatform.loadGraph(GraphblasPlatform.java:49) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
... 3 more
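For readers decoding the trace above: by shell convention, an exit value above 128 usually means the process was terminated by signal number (value - 128). That makes 137 consistent with SIGKILL (signal 9), which is also what the Linux OOM killer sends. The log alone does not prove the OOM killer was involved. A minimal sketch of the decoding, where the helper name describe_exit_status is hypothetical:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: the status follows the common shell convention where
    values above 128 mean "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


print(describe_exit_status(137))
```

On Linux, checking the kernel log (for example with dmesg) is the usual way to confirm whether the OOM killer was responsible.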
It also crashed on the datagen-sf10k-fb graph:
09:11 [INFO ] ----------------- Loading graph "datagen-sf10k-fb:datagen-sf10k-fb" -----------------
09:11 [INFO ] Loading graph datagen-sf10k-fb
09:11 [INFO ] Execute graph loader with command-line: [/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/./bin/sh/load-graph.sh --graph-name datagen-sf10k-fb --input-vertex-path /data/gx/graphs/datagen-sf10k-fb.v --input-edge-path /data/gx/graphs/cache/datagen-sf10k-fb.e --output-path ./intermediate/datagen-sf10k-fb --directed false --weighted false]
Loading...
Relabelling...
Serializing vertex mapping...
Traceback (most recent call last):
  File "/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/bin/sh/relabel.py", line 104, in <module>
    main()
  File "/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/bin/sh/relabel.py", line 98, in main
    relabel(con, \
  File "/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/bin/sh/relabel.py", line 51, in relabel
    con.execute(f"""
duckdb.OutOfMemoryException: Out of Memory Error: Failed to allocate block of 7843840 bytes
09:20 [ERROR] Failed to load graph "datagen-sf10k-fb:datagen-sf10k-fb".
science.atlarge.graphalytics.execution.PlatformExecutionException: Failed to load a GraphBLAS dataset.
The relabelling script crashes on the ORDER BY clause. This is potentially avoidable; see the specification of the Matrix Market format:
Note that there is no implied order for the matrix elements. This allows one to write simple print routines which traverse the sparse matrix in whatever natural order given by the particular storage scheme.
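To illustrate the point in the quote, here is a minimal sketch showing that a coordinate-format Matrix Market file describes the same matrix regardless of entry order. The helpers write_matrix_market and read_matrix_market and the tiny edge list are hypothetical and written only for this example; the actual output format of relabel.py is not shown in this thread.

```python
import io
import random


def write_matrix_market(entries, nrows, ncols, out):
    """Write (row, col) pairs as a coordinate 'pattern' matrix, in the order given."""
    out.write("%%MatrixMarket matrix coordinate pattern general\n")
    out.write(f"{nrows} {ncols} {len(entries)}\n")
    for row, col in entries:
        out.write(f"{row} {col}\n")


def read_matrix_market(text):
    """Parse the simple files produced above into (nrows, ncols, set of entries)."""
    lines = [line for line in text.splitlines() if line and not line.startswith("%")]
    nrows, ncols, nnz = map(int, lines[0].split())
    entries = {tuple(map(int, line.split())) for line in lines[1:]}
    assert len(entries) == nnz
    return nrows, ncols, entries


# Hypothetical toy graph with 1-based vertex IDs.
edges = [(1, 2), (2, 3), (3, 1), (1, 3)]
shuffled = edges[:]
random.Random(0).shuffle(shuffled)

sorted_buf, shuffled_buf = io.StringIO(), io.StringIO()
write_matrix_market(sorted(edges), 3, 3, sorted_buf)
write_matrix_market(shuffled, 3, 3, shuffled_buf)

# Both files parse to the same matrix, so sorting before writing is optional.
same = read_matrix_market(sorted_buf.getvalue()) == read_matrix_market(shuffled_buf.getvalue())
print(same)
```

Since a reader has to accept entries in any order, the sort only adds memory pressure on the writer side.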
Removing the ORDER BY helped. The script then ran on a machine with 4TB of memory, provided no other "heavy" processes were running. Curiously, something like a concurrently running multi-threaded zstd run could still trip up the relabelling script: it failed again with Out of Memory Error: Failed to allocate block of 262144 bytes.
On a machine with 3.75TB RAM 🥲 (EC2 x1e.32xlarge).