Labelled Subgraph Query Benchmark (LSQB)

:page_facing_up: LSQB: A Large-Scale Subgraph Query Benchmark, GRADES-NDA'21 paper (presentation)

Overview

A benchmark for subgraph matching queries on graphs with type information (vertex and edge types). Note that LSQB is a lightweight microbenchmark aimed at system developers and is not an official LDBC benchmark. Its primary goal is to test the query optimizer (join ordering, choosing between binary and n-ary joins) and the execution engine (join performance, support for worst-case optimal joins) of graph databases. Features found in more mature database systems and query languages, such as date/string operations, query composition, and complex aggregates/filters, are out of scope for this benchmark.

The benchmark consists of the following 9 queries:

Inspirations and references:

Getting started

Install dependencies

  1. Install Docker on your machine.

  2. (Optional) Change the location of Docker's data directory (instructions); a sketch is shown after this list.

  3. Install the dependencies:

    scripts/install-dependencies.sh
    # optional convenience packages
    scripts/install-convenience-packages.sh
  4. (Optional) Add the Umbra binaries as described in the umb/README.md file.

  5. Test the setup using scripts/benchmark.sh, e.g. by running all systems on the smallest (example) data set; a sketch is shown after this list. This verifies that all dependencies are installed and also downloads the required Docker images.
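For step 2, a minimal sketch of relocating Docker's data directory on a systemd-based Linux host; the target path is a placeholder, and "data-root" is Docker's documented daemon.json key:

sudo mkdir -p /etc/docker
# point Docker at a directory with enough space for the benchmark images and data
echo '{ "data-root": "/mnt/large-disk/docker" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

For step 5, a smoke test might look as follows; this assumes benchmark.sh picks up the SF environment variable like the per-system scripts do (check the script itself for its exact interface):

export SF=example
scripts/benchmark.sh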

Creating the input data

Data sets should be provided in two formats: one with projected foreign keys (projected-fk) and one with merged foreign keys (merged-fk); the download scripts below correspond to these two formats.

An example data set is provided under the substitution SF=example.
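To select it, set the SF environment variable, mirroring the export pattern used for the larger scale factors below:

export SF=example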

Pre-generated data sets are available in the SURF/CWI data repository.

To download the data sets, set the MAX_SF environment variable to the largest scale factor you want to use (at least 1) and run the download scripts.

For example:

export MAX_SF=3
scripts/download-projected-fk-data-sets.sh
scripts/download-merged-fk-data-sets.sh

For more information, see the download instructions and links.

Generating the data sets from scratch

See data generation.

Implementations

The following implementations are provided. The :whale: symbol denotes that the implementation uses Docker.

Stable implementations:

Running the benchmark

The benchmark run consists of two key steps: loading the data and running the queries on the database.

Some systems need to be online before loading, while others need to be offline. To handle these differences in a unified way, loading is split into three scripts: pre-load.sh, load.sh, and post-load.sh.

The init-and-load.sh script calls these three scripts in order; a sketch of the equivalent manual invocation follows. To run the benchmark and clean up after execution, use the following three scripts: init-and-load.sh, run.sh, and stop.sh.
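As a sketch, the three loading phases can be thought of as follows (assuming the scripts live next to init-and-load.sh in each system's directory; the comments describe typical, not guaranteed, responsibilities):

cd neo                # any system directory works the same way
export SF=0.1
./pre-load.sh         # e.g. take the database offline if it must not run during loading
./load.sh             # load the data set selected by $SF
./post-load.sh        # e.g. bring the database back online and finalize the load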

Example usage that loads scale factor 0.3 into Neo4j:

cd neo
export SF=0.3
./init-and-load.sh && ./run.sh && ./stop.sh

Example usage that runs multiple scale factors on DuckDB. Note that the SF environment variable needs to be exported; a single export before the loop suffices, as later assignments to SF retain the export attribute.

cd ddb
export SF
for SF in 0.1 0.3 1; do
   ./init-and-load.sh && ./run.sh && ./stop.sh
done

Validation of results

Use the scripts/validate.sh script. For example:

scripts/validate.sh --system DuckDB-1.0.0 --variant "10 threads" --scale_factor example
scripts/validate.sh --system Neo4j-5.20.0 --scale_factor 0.1
scripts/validate.sh --system PostgreSQL --scale_factor example
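Validating one system across several scale factors follows the same pattern; the flags are exactly those shown above, and this assumes validation data exists for each listed scale factor:

for SF in example 0.1 0.3; do
  scripts/validate.sh --system Neo4j-5.20.0 --scale_factor "$SF"
done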

Philosophy