neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0

Huge Louvain Preparation: more Huge Arrays #859

Closed: knutwalker closed this 5 years ago

knutwalker commented 5 years ago

In addition to the existing HugeLongArray as a replacement for long[]:

jexp commented 5 years ago

Can you add a comment or note somewhere about your perf test of the Neo4j Batch-Importer arrays?

knutwalker commented 5 years ago

@jexp I addressed most of your notes; I just missed the static methods from the Cursor/init methods.

Regarding benchmarks, I ran this PR on my laptop and got the following results:

    mvn -DskipTests -PBenchmark -pl benchmark package
    java -jar benchmark/target/benchmark.jar \
      -jvmArgs "-server -Xmx4g -Xms4g -XX:+UseG1GC" \
      -psize=10000000 -psparseness=0.0 -pdistribution=uniform \
      -gc true -rf json -rff long-arrays.json \
      org.neo4j.graphalgo.utils.LongArrayBenchmark

Raw results: long-arrays.json.zip

[Chart: long-arrays benchmark results]

The tested arrays are:

- primitive: plain Java long[]
- huge_paged: org.neo4j.graphalgo.core.utils.paged.HugeLongArray with the paging configuration (nested long[][])
- huge_paged_cursor: huge_paged, accessed through a HugeCursor instead of get/set
- huge_single: org.neo4j.graphalgo.core.utils.paged.HugeLongArray with the single-page configuration (a wrapper around one long[])
- huge_single_cursor: huge_single, accessed through a HugeCursor instead of get/set
- sparse: org.neo4j.graphalgo.core.utils.paged.SparseLongArray, similar to huge_paged, but pages can be null and read/write access includes the corresponding null-page checks
- offHeap: org.neo4j.unsafe.impl.batchimport.cache.OffHeapLongArray
- chunked: org.neo4j.unsafe.impl.batchimport.cache.DynamicLongArray
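To make the difference between the paged get/set access and the cursor access a bit more concrete, here is a minimal, hypothetical sketch of a paged long array. It assumes power-of-two pages, so a long index splits into a page number (high bits) and an in-page offset (low bits) over a nested long[][]; the cursor hands out one backing page at a time so the inner loop runs over a plain long[]. The class name `PagedLongArraySketch`, the `PageCursor` type, and the page size of 2^14 longs are assumptions for illustration only; this is not the actual HugeLongArray or HugeCursor code.

```java
/**
 * Hypothetical sketch of a paged long array; NOT the actual HugeLongArray.
 * Assumes power-of-two pages so index -> (page, offset) is a shift and a mask.
 */
public final class PagedLongArraySketch {
    // Assumed page size: 2^14 longs per page (illustrative choice).
    static final int PAGE_SHIFT = 14;
    static final int PAGE_SIZE = 1 << PAGE_SHIFT;
    static final long PAGE_MASK = PAGE_SIZE - 1;

    private final long size;
    private final long[][] pages;

    PagedLongArraySketch(long size) {
        this.size = size;
        // Round up so a partial last page is still allocated.
        int numPages = (int) ((size + PAGE_MASK) >>> PAGE_SHIFT);
        this.pages = new long[numPages][];
        for (int i = 0; i < numPages; i++) {
            long remaining = size - ((long) i << PAGE_SHIFT);
            pages[i] = new long[(int) Math.min(PAGE_SIZE, remaining)];
        }
    }

    long size() {
        return size;
    }

    // Random access: every call resolves the page and the offset.
    long get(long index) {
        return pages[(int) (index >>> PAGE_SHIFT)][(int) (index & PAGE_MASK)];
    }

    void set(long index, long value) {
        pages[(int) (index >>> PAGE_SHIFT)][(int) (index & PAGE_MASK)] = value;
    }

    /** Hypothetical cursor: exposes one backing page per call to next(). */
    static final class PageCursor {
        private final long[][] pages;
        private int pageIndex = -1;
        long[] array; // current backing page
        long base;    // global index of array[0]

        PageCursor(long[][] pages) {
            this.pages = pages;
        }

        boolean next() {
            if (++pageIndex >= pages.length) {
                return false;
            }
            array = pages[pageIndex];
            base = (long) pageIndex << PAGE_SHIFT;
            return true;
        }
    }

    PageCursor cursor() {
        return new PageCursor(pages);
    }

    public static void main(String[] args) {
        // Four pages: three full ones plus a partial page of 5 elements.
        long n = 3L * PAGE_SIZE + 5;
        PagedLongArraySketch arr = new PagedLongArraySketch(n);
        for (long i = 0; i < n; i++) {
            arr.set(i, i * 2);
        }

        // get/set style: one page lookup per element.
        long sumGet = 0;
        for (long i = 0; i < n; i++) {
            sumGet += arr.get(i);
        }

        // Cursor style: the inner loop runs over a plain long[].
        long sumCursor = 0;
        long mismatches = 0;
        PageCursor cursor = arr.cursor();
        while (cursor.next()) {
            long[] page = cursor.array;
            for (int j = 0; j < page.length; j++) {
                sumCursor += page[j];
                // cursor.base + j is the global index of page[j]
                if (page[j] != (cursor.base + j) * 2) {
                    mismatches++;
                }
            }
        }
        System.out.println("sum via get/set: " + sumGet);
        System.out.println("sum via cursor:  " + sumCursor);
        System.out.println("mismatches:      " + mismatches);
    }
}
```

A single-page configuration corresponds to the case where the whole array fits in one page, so both access styles touch one long[]. A sparse variant along the lines of the list above would allocate pages lazily and would have to check for a null page on every read and write, which is the extra cost such an array pays.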

Some possible observations:

Results may differ on other machines with different CPUs, of course.