pmacko86 / pig-bench

A benchmark suite for performance introspection of graph databases

PIG: Performance Introspection of Graph Databases

The explosion of graph data in social and biological networks, recommendation systems, provenance databases, etc. makes graph storage and processing of paramount importance.

PIG is a new graph benchmarking framework, which provides both a methodology for evaluating graph database performance and a mechanism to carry out such evaluations. It takes a hierarchical approach to benchmarking. The suite has three layers of benchmarks:

This framework allows for comparisons between systems as well as single system introspection. Such introspection allows one to evaluate the degree to which systems exploit their knowledge of graph access patterns. The suite also comes with a web interface that makes it easy to run benchmarks and to visualize and analyze the collected data.

Quick-Start

To run PIG, you will need:

After installing all the prerequisites and checking out the source code of PIG, cd into the graphdb-bench directory and type:

mvn install

You can then start the web interface using:

./runWebInterfaceServer.sh

This will start a server on port 8080. Alternatively, you can run the benchmark tools directly from the command line using:

./runBenchmarkSuite.sh

Use the --help option to list the available commands, or +help to see advanced options, including options for configuring the JVM.

Configuration

To change PIG's configuration, edit the following file:

graphdb-bench/src/main/resources/com/tinkerpop/bench/bench.properties

You can also override many options using command-line arguments and/or the web interface.
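To check which settings are currently active before overriding them, you can print every non-comment, non-blank line of the file. This is a minimal sketch that assumes only that bench.properties follows the standard Java .properties format, where lines starting with # or ! are comments. The function name show_settings is hypothetical and is not part of PIG.

```shell
# Hypothetical helper (not part of PIG): print the active settings of a
# Java .properties file, skipping comment lines (# or !) and blank lines.
# The default path is the one given in this README, relative to the repository root.
show_settings() {
  props="${1:-graphdb-bench/src/main/resources/com/tinkerpop/bench/bench.properties}"
  if [ ! -f "$props" ]; then
    echo "not found: $props" >&2
    return 1
  fi
  # Drop lines that are empty/whitespace-only or whose first non-blank char is # or !
  grep -Ev '^[[:space:]]*([#!]|$)' "$props"
}
```

Run show_settings from the repository root, or pass another path as the first argument. Values continued across lines with a trailing backslash are printed line by line rather than joined, so treat the output as a quick overview rather than a full parse.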

Datasets

You can generate your own datasets using fgftool, which is distributed as part of Blueprints Extensions (one of the prerequisites of PIG). You can also download datasets with up to 1 million nodes from:

https://drive.google.com/folderview?id=0B3jkRHQ7nKvnbDhsWHBySVV6VVk&usp=sharing

Place the datasets in the directory specified in the configuration file. The default is data/datasets in the project directory.
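A minimal shell sketch of this step follows. It assumes the default data/datasets location, resolved relative to the directory you run it from, so adjust the destination if your project directory or configured path differs. The helper install_datasets is hypothetical (not a PIG script), and the .fgf file extension in the usage example is an assumption about the downloaded files.

```shell
# Hypothetical helper (not part of PIG): copy dataset files into the dataset directory.
# Usage: install_datasets DEST FILE...
install_datasets() {
  dest="$1"; shift
  # Create the destination (e.g. data/datasets) if it does not exist yet
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    # Warn about and skip arguments that are not regular files
    [ -f "$f" ] || { echo "skipping missing file: $f" >&2; continue; }
    cp "$f" "$dest"/ || return 1
  done
  return 0
}
```

For example, install_datasets data/datasets ~/Downloads/*.fgf copies the downloaded files into the default location.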

Publications