Reference implementations of the LDBC Social Network Benchmark's Interactive workload (paper, specification on GitHub pages, specification on arXiv).
To get started with the LDBC SNB benchmarks, check out our introductory presentation: The LDBC Social Network Benchmark (PDF).
:warning: Please keep in mind the following when using this repository.
The goal of the implementations in this repository is to serve as reference implementations which other implementations can be cross-validated against. Therefore, when formulating the queries, our primary objective was readability, not absolute performance.
The default workload contains updates which are persisted in the database. Therefore, the database needs to be reloaded or restored from a backup before each run. Use the provided `scripts/backup-database.sh` and `scripts/restore-database.sh` scripts to achieve this.
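A typical run therefore brackets the benchmark with these scripts, along the lines of the following sketch (what the scripts do internally is implementation-specific):

```bash
# snapshot the freshly loaded database, run the benchmark (which persists
# updates), then roll back so the next run starts from the same state
scripts/backup-database.sh
driver/benchmark.sh
scripts/restore-database.sh
```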
We expect most systems-under-test to use multi-threaded execution for their benchmark runs. To allow running the updates on multiple threads, the update stream files need to be partitioned accordingly by the generator. We have pre-generated update streams for frequent partition numbers (the powers of two 1, 2, 4, ..., 1024, as well as 24, 48, 96, ..., 768) for scale factors up to 1000.
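For example, with the Datagen's `ldbc.snb.datagen.serializer.numUpdatePartitions` setting (described below) set to 4, the updates are split into four forum and four person stream files. The file names below are illustrative, following the `updateStream_*_{forum,person}.csv` pattern used later in this document:

```
updateStream_0_0_forum.csv    updateStream_0_0_person.csv
updateStream_1_0_forum.csv    updateStream_1_0_person.csv
updateStream_2_0_forum.csv    updateStream_2_0_person.csv
updateStream_3_0_forum.csv    updateStream_3_0_person.csv
```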
We provide three reference implementations:
Additional implementations:
For detailed instructions, consult the READMEs of the projects.
To build a subset of the projects, use Maven profiles. For example, to build only the reference implementations, run:

```bash
mvn clean package -DskipTests -Pcypher,postgres
```
This project uses Java 11.
To build the project, run:

```bash
scripts/build.sh
```
The benchmark framework relies on the following inputs produced by the SNB Datagen:
* Initial data set: `social_network/{static,dynamic}`
* Update streams: `social_network/updateStream_*.csv`
* Substitution parameters: `substitution_parameters/`
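For orientation, the layout of these inputs is roughly the following (illustrative; the exact file names depend on the Datagen configuration and the number of update partitions):

```
social_network/
├── static/                      # static part of the initial data set
├── dynamic/                     # dynamic part of the initial data set
├── updateStream_0_0_forum.csv   # forum update stream(s)
└── updateStream_0_0_person.csv  # person update stream(s)
substitution_parameters/         # substitution parameters for the read queries
```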
For each implementation, it is possible to perform the run in one of the SNB driver's three modes: create validation parameters, validate, and benchmark. In all three modes, execution should only be started after the initial data set has been loaded into the system under test.

1. Create validation parameters with the `driver/create-validation-parameters.sh` script.
    * Inputs:
        * The query substitution parameters are taken from the directory set in the `ldbc.snb.interactive.parameters_dir` configuration property.
        * The update streams are the `updateStream_0_0_{forum,person}.csv` files, read from the location set in the `ldbc.snb.interactive.updates_dir` configuration property.
        * In this mode, the query frequencies are set to a uniform `1` value to ensure the best average test coverage.
    * Output: the results are stored in the validation parameters file (e.g. `validation_params.csv`) set in the `create_validation_parameters` configuration property.
2. Validate against an existing reference output (called "validation parameters") with the `driver/validate.sh` script.
    * Input: the query substitution parameters are taken from the validation parameters file (e.g. `validation_params.csv`) set in the `validate_database` configuration property.
    * Output: if the validation fails, the mismatching results are saved to the `validation_params-failed-expected.json` and `validation_params-failed-actual.json` files.
3. Run the benchmark with the `driver/benchmark.sh` script (see the configuration sketch after this list).
    * The query substitution parameters are taken from the directory set in the `ldbc.snb.interactive.parameters_dir` configuration property.
    * The update streams are the `updateStream_*_{forum,person}.csv` files, read from the location set in the `ldbc.snb.interactive.updates_dir` configuration property. To run the updates on *n* threads, the framework requires *n* `updateStream_*_forum.csv` and *n* `updateStream_*_person.csv` files. Set `ldbc.snb.datagen.serializer.numUpdatePartitions` to *n* in the data generator to produce these.
    * The goal of the benchmark is to achieve the best (lowest) `time_compression_ratio` value while ensuring that the 95% on-time requirement is kept (i.e. 95% of the queries can be started within 1 second of their scheduled time). If your benchmark run returns "failed schedule audit", increase this value (which lowers the time compression rate) until the run passes.
    * Set the `thread_count` property to the size of the thread pool for read operations.
    * For audited runs, ensure that the `warmup` and `operation_count` properties are set so that the warmup and benchmark phases last for 30+ minutes and 2+ hours, respectively.
    * The detailed results of the benchmark are printed to the console and saved in the `results/` directory.

For more details on validating and benchmarking, visit the driver's documentation.
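As a concrete illustration, the relevant part of a `driver/benchmark.properties` file for a benchmark run could look like the sketch below. The property names are the ones described above; the values are placeholders to be tuned for your data set and machine:

```properties
# locations of the Datagen outputs (placeholder paths)
ldbc.snb.interactive.parameters_dir=substitution_parameters/
ldbc.snb.interactive.updates_dir=social_network/

# thread pool size for read operations
thread_count=8

# lowest value that still passes the 95% on-time requirement
time_compression_ratio=0.02

# operation counts sized so that warmup and benchmark phases
# last 30+ minutes and 2+ hours for audited runs
warmup=100000
operation_count=10000000
```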
To create a new implementation, it is recommended to start from one of the existing ones: the Neo4j implementation for graph database management systems, or the PostgreSQL implementation for RDBMSs.
The implementation process looks roughly as follows:
To generate the benchmark data sets, use the Hadoop-based LDBC SNB Datagen.
The key configurations are the following:
* `ldbc.snb.datagen.generator.scaleFactor`: set this to `snb.interactive.${SCALE_FACTOR}`, where `${SCALE_FACTOR}` is the desired scale factor
* `ldbc.snb.datagen.serializer.numUpdatePartitions`: set this to the number of write threads used in the benchmark runs
* serializers: set these to the required format, e.g. the ones starting with `CsvMergeForeign` or `CsvComposite`:
    * `ldbc.snb.datagen.serializer.dynamicActivitySerializer`
    * `ldbc.snb.datagen.serializer.dynamicPersonSerializer`
    * `ldbc.snb.datagen.serializer.staticSerializer`
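For example, a Datagen configuration for scale factor 1 with four update partitions and the `CsvMergeForeign` serializers might look like the sketch below. The fully qualified serializer class names are assumptions based on the Hadoop Datagen's package layout; verify them against the Datagen documentation for your version:

```properties
ldbc.snb.datagen.generator.scaleFactor:snb.interactive.1
ldbc.snb.datagen.serializer.numUpdatePartitions:4
# CsvMergeForeign serializers -- class names may differ across Datagen versions
ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvMergeForeignDynamicActivitySerializer
ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvMergeForeignDynamicPersonSerializer
ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvMergeForeignStaticSerializer
```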
Producing large-scale data sets requires non-trivial amounts of memory and computing resources (e.g. SF100 requires 24 GB of memory and takes about 4 hours to generate on a single machine). To mitigate this, we have pre-generated the data sets using 9 different serializers and the update streams using 17 different partition numbers.
The data sets are available at the SURF/CWI data repository. We also provide direct links and a download script (which stages the data sets from tape storage if they are not immediately available).
We provide validation parameters for SF0.1 to SF10. These were produced using the Neo4j reference implementation.
Small test data sets are placed in the `cypher/test-data/` directory for Neo4j and in the `postgres/test-data/` directory for the SQL systems.
To generate a data set with the same characteristics, see the documentation on generating the test data set.
Implementations of the Interactive workload can be audited by a certified LDBC auditor. The Auditing Policies chapter of the specification describes the auditing process and the required artifacts. If you are considering commissioning an LDBC SNB audit, please study the auditing process document and the audit questionnaire.
To prepare for an audited run:

* Fill out the `driver/benchmark.properties` file as described in the Driver modes section.
* Load the data set with `scripts/load-in-one-step.sh`.
* Create a backup with `scripts/backup-database.sh`.
* Run the `driver/determine-best-tcr.sh` script.
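Put together, the preparation steps above amount to a sequence like the following sketch (what each script does internally is implementation-specific):

```bash
scripts/load-in-one-step.sh    # load the initial data set into the system under test
scripts/backup-database.sh     # create a backup of the freshly loaded state
driver/determine-best-tcr.sh   # find the lowest time_compression_ratio that passes
driver/benchmark.sh            # perform the benchmark run
```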
We have a few recommendations for creating audited implementations. (These are not requirements; implementations are allowed to deviate from them.)
For cloud-based runs, use an instance type with sufficient memory for the chosen scale factor (e.g. AWS EC2 `r5d.12xlarge`). Both bare-metal and regular instances can be used for audited runs.