ldbc / ldbc_graphalytics_platforms_graphblas

LDBC Graphalytics implementation using SuiteSparse:GraphBLAS and LAGraph
Apache License 2.0

Relabelling script crashes during serialization for graph500-29+ and datagen-sf10k-fb #26

Closed: szarnyasg closed this issue 1 year ago

szarnyasg commented 1 year ago

This happens on a machine with 3.75 TB of RAM 🥲 (EC2 x1e.32xlarge):

17:27 [INFO ] ----------------- Loading graph ""graph500-29:graph500-29"" -----------------
17:27 [INFO ] Loading graph graph500-29
17:27 [INFO ] Execute graph loader with command-line: [/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/./bin/sh/load-graph.sh --graph-name graph500-29 --input-vertex-path /data/gx/graphs/graph500-29.v --input-edge-path /data/gx/graphs/graph500-29.e --output-path ./intermediate/graph500-29 --directed false --weighted false]
Loading...
Relabelling...
Serializing vertex mapping...
Traceback (most recent call last):
  File "/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/bin/sh/relabel.py", line 104, in <module>
    main()
  File "/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/bin/sh/relabel.py", line 98, in main
    relabel(con, \
  File "/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/bin/sh/relabel.py", line 51, in relabel
    con.execute(f"""
duckdb.OutOfMemoryException: Out of Memory Error: Failed to allocate block of 262144 bytes
17:34 [ERROR] Failed to load graph ""graph500-29:graph500-29"".
 science.atlarge.graphalytics.execution.PlatformExecutionException: Failed to load a GraphBLAS dataset.
        at science.atlarge.graphalytics.graphblas.GraphblasPlatform.loadGraph(GraphblasPlatform.java:54) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at science.atlarge.graphalytics.execution.BenchmarkExecutor.loadGraph(BenchmarkExecutor.java:374) [graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at science.atlarge.graphalytics.execution.BenchmarkExecutor.execute(BenchmarkExecutor.java:132) [graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at science.atlarge.graphalytics.BenchmarkSuite.main(BenchmarkSuite.java:105) [graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
Caused by: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
        at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at science.atlarge.graphalytics.graphblas.GraphblasLoader.load(GraphblasLoader.java:64) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at science.atlarge.graphalytics.graphblas.GraphblasPlatform.loadGraph(GraphblasPlatform.java:49) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        ... 3 more
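The failure above is DuckDB hitting its own memory cap. One direction to explore is to configure the connection explicitly before running the relabelling queries. DuckDB has `memory_limit`, `threads`, `temp_directory` and `preserve_insertion_order` settings. The sketch below is a minimal illustration under these assumptions: the helper name `configure_duckdb` and all argument values are hypothetical, and it is not what relabel.py does. Whether spilling to `temp_directory` would have avoided this specific crash depends on the operator and the DuckDB version, and has not been tested here.

```python
from __future__ import annotations


def configure_duckdb(con, memory_limit: str, threads: int, temp_directory: str) -> list[str]:
    """Apply memory-related DuckDB settings to an open connection (hypothetical helper).

    `con` is anything with an `execute(sql)` method, e.g. the object
    returned by duckdb.connect(). Returns the statements it ran.
    """
    statements = [
        # Cap on DuckDB's buffer manager; exceeding it raises
        # duckdb.OutOfMemoryException.
        f"SET memory_limit = '{memory_limit}'",
        # Fewer worker threads can lower peak memory, at the cost of speed.
        f"SET threads = {int(threads)}",
        # Directory where operators that support it may spill to disk.
        f"SET temp_directory = '{temp_directory}'",
        # Allow results of queries without ORDER BY to be reordered,
        # which DuckDB documents as a way to reduce memory use.
        "SET preserve_insertion_order = false",
    ]
    for statement in statements:
        con.execute(statement)
    return statements
```

For example, call `configure_duckdb(duckdb.connect(), "3000GB", 32, "/data/gx/duckdb-tmp")` right after opening the connection. The limit, thread count and path are placeholders. Passing the connection in, rather than creating it inside the helper, keeps the sketch independent of how the script opens its database.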
szarnyasg commented 1 year ago

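It still crashes, but now the process is killed by the operating system (an OOM kill, exit value 137) instead of failing with a DuckDB out-of-memory error: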

17:58 [INFO ] Execute graph loader with command-line: [/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/./bin/sh/load-graph.sh --graph-name graph500-29 --input-vertex-path /data/gx/graphs/graph500-29.v --input-edge-path /data/gx/graphs/graph500-29.e --output-path ./intermediate/graph500-29 --directed false --weighted false]
/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/./bin/sh/load-graph.sh: line 66: 151613 Killed                  bin/sh/relabel.py --use-disk --graph-name ${GRAPH_NAME} --input-vertex ${INPUT_VERTEX_PATH} --input-edge ${INPUT_EDGE_PATH} --output-path ${OUTPUT_PATH} --weighted ${WEIGHTED} --directed ${DIRECTED}
18:51 [ERROR] Failed to load graph ""graph500-29:graph500-29"".
 science.atlarge.graphalytics.execution.PlatformExecutionException: Failed to load a GraphBLAS dataset.
        at science.atlarge.graphalytics.graphblas.GraphblasPlatform.loadGraph(GraphblasPlatform.java:54) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at science.atlarge.graphalytics.execution.BenchmarkExecutor.loadGraph(BenchmarkExecutor.java:375) [graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at science.atlarge.graphalytics.execution.BenchmarkExecutor.execute(BenchmarkExecutor.java:132) [graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at science.atlarge.graphalytics.BenchmarkSuite.main(BenchmarkSuite.java:105) [graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
Caused by: org.apache.commons.exec.ExecuteException: Process exited with an error: 137 (Exit value: 137)
        at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at science.atlarge.graphalytics.graphblas.GraphblasLoader.load(GraphblasLoader.java:64) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        at science.atlarge.graphalytics.graphblas.GraphblasPlatform.loadGraph(GraphblasPlatform.java:49) ~[graphalytics-platforms-graphblas-0.1-SNAPSHOT-default.jar:?]
        ... 3 more
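The "Killed" message from bash and the exit value 137 in the trace above both point to a SIGKILL. POSIX shells report a child terminated by signal N as 128 + N, and 137 = 128 + 9. On Linux the kernel OOM killer terminates processes with SIGKILL, which fits the OOM reading. Checking `dmesg` for OOM-killer messages would confirm it. Below is a small sketch for decoding such statuses. The function name is illustrative and not part of Graphalytics.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a process exit status (illustrative helper).

    Shells report a child killed by signal N as 128 + N (so 137 -> signal 9),
    while Python's subprocess module reports the same event as -N.
    """
    if status < 0:
        sig = -status          # subprocess convention
    elif status > 128:
        sig = status - 128     # shell convention
    else:
        return f"exited with status {status}"
    try:
        name = signal.Signals(sig).name
    except ValueError:         # signal number unknown on this platform
        name = f"unknown signal {sig}"
    return f"killed by {name} (signal {sig})"
```

With this, `describe_exit_status(137)` and `describe_exit_status(-9)` both describe a SIGKILL, while the exit value 1 from the first trace is reported as an ordinary error exit.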
szarnyasg commented 1 year ago

Also crashed on the datagen-sf10k-fb graph:

09:11 [INFO ] ----------------- Loading graph ""datagen-sf10k-fb:datagen-sf10k-fb"" -----------------
09:11 [INFO ] Loading graph datagen-sf10k-fb
09:11 [INFO ] Execute graph loader with command-line: [/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/./bin/sh/load-graph.sh --graph-name datagen-sf10k-fb --input-vertex-path /data/gx/graphs/datagen-sf10k-fb.v --input-edge-path /data/gx/graphs/cache/datagen-sf10k-fb.e --output-path ./intermediate/datagen-sf10k-fb --directed false --weighted false]
Loading...
Relabelling...
Serializing vertex mapping...
Traceback (most recent call last):
  File "/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/bin/sh/relabel.py", line 104, in <module>
    main()
  File "/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/bin/sh/relabel.py", line 98, in main
    relabel(con, \
  File "/data/gx/ldbc_graphalytics_platforms_graphblas/graphalytics-1.6.0-SNAPSHOT-graphblas-0.1-SNAPSHOT/bin/sh/relabel.py", line 51, in relabel
    con.execute(f"""
duckdb.OutOfMemoryException: Out of Memory Error: Failed to allocate block of 7843840 bytes
09:20 [ERROR] Failed to load graph ""datagen-sf10k-fb:datagen-sf10k-fb"".
 science.atlarge.graphalytics.execution.PlatformExecutionException: Failed to load a GraphBLAS dataset.
szarnyasg commented 1 year ago

The relabelling script crashes on the ORDER BY clause. The sort is potentially avoidable, because the specification of the Matrix Market format does not require the entries to be ordered:

Note that there is no implied order for the matrix elements. This allows one to write simple print routines which traverse the sparse matrix in whatever natural order given by the particular storage scheme.
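To illustrate the quoted rule, here is a minimal sketch. It assumes an unweighted graph is written as a `coordinate pattern general` matrix with 1-based, already relabelled vertex IDs. The header actually used by this project may differ; for example, undirected graphs could be stored as `symmetric`. The function names are hypothetical. Two files written from the same edges in different orders parse to the same matrix, so no global sort is needed before serialization.

```python
import io


def write_mm_pattern(edges, n_vertices, out):
    """Write (row, col) pairs as a Matrix Market 'coordinate pattern general' file.

    Entries are emitted in whatever order `edges` yields them; no sorting.
    """
    edges = list(edges)  # the size line needs the entry count up front
    out.write("%%MatrixMarket matrix coordinate pattern general\n")
    out.write(f"{n_vertices} {n_vertices} {len(edges)}\n")
    for row, col in edges:
        out.write(f"{row} {col}\n")


def read_mm_pattern(text):
    """Parse a 'coordinate pattern' file into (rows, cols, set of entries)."""
    # Skip the %%MatrixMarket header, % comments and blank lines.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, _nnz = map(int, lines[0].split())
    entries = {tuple(map(int, ln.split())) for ln in lines[1:]}
    return n_rows, n_cols, entries
```

In a real pipeline the entry count would come from a cheap `COUNT(*)` rather than from materializing the edge list. The point here is only that the reader does not care about entry order.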

szarnyasg commented 1 year ago

Removing the ORDER BY helped: the script now runs on a machine with 4 TB of memory, provided that no other memory-heavy processes are running. Curiously, even a concurrent multi-threaded zstd run could trip up the relabelling script, which then failed again with "Out of Memory Error: Failed to allocate block of 262144 bytes".
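A likely reason is that DuckDB derives its default memory limit from total physical RAM (80% according to its documentation). That default does not account for memory other processes are already using. A possible mitigation, sketched below under that assumption, is to size `memory_limit` from `MemAvailable` in `/proc/meminfo` instead. The helper names and the 0.8 fraction are hypothetical. This only helps when the competing load is already running at startup: a zstd job launched later can still push the machine over.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}; values are in KiB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def duckdb_memory_limit(meminfo_text: str, fraction: float = 0.8) -> str:
    """Suggest a DuckDB memory_limit string based on currently available memory.

    Uses MemAvailable rather than MemTotal, so memory held by other
    processes at startup is left alone. Rounds down to whole GB (10**9 bytes).
    """
    available_bytes = parse_meminfo(meminfo_text)["MemAvailable"] * 1024
    limit_gb = int(available_bytes * fraction) // 10**9
    return f"{limit_gb}GB"
```

On Linux, the input would be `open("/proc/meminfo").read()`. The returned string could then be passed to `SET memory_limit = '...'` before the relabelling queries run.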