ODIN is a benchmark for data extraction solutions that produce structured data. It is designed to evaluate the backend of such solutions (especially the acquisition phase) by simulating the ingestion, storage and retrieval of streams of RDF data. To this end, ODIN emulates the load faced by a triple store while an extraction solution for enterprise data (e.g., industry sensors) inserts triples, based on models derived from real data. The key performance indicators of the evaluation are completeness and efficiency.
Guidelines on how to upload a benchmark can be found here: https://github.com/hobbit-project/platform/wiki/Benchmark-your-system
If you want to run ODIN using the platform, please follow the guidelines found here: https://github.com/hobbit-project/platform/wiki/Experiments
The current docker files can be found here: https://github.com/hobbit-project/odin/tree/master/docker
(Note that the benchmark must be built before the Docker images can be created.) ODIN consists of four basic components: the OdinBenchmarkController, the OdinDataGenerator, the OdinTaskGenerator and the OdinEvaluationModule.
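As a rough, hedged sketch of that build step, assuming a standard Maven setup (the exact goals or options may differ in your environment), building the benchmark jar referenced by the Dockerfiles below could look like this:

# Clone the repository and build the benchmark jar; the Dockerfiles below
# expect it at target/odin-1.0.0-SNAPSHOT.jar.
git clone https://github.com/hobbit-project/odin.git
cd odin
mvn clean package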
If a user wants to create Docker images for the OdinBenchmarkController, the OdinEvaluationModule and the OdinTaskGenerator, he/she must use the following Dockerfile:
FROM java
ADD target/odin-1.0.0-SNAPSHOT.jar /odin/odin.jar
WORKDIR /odin
CMD java -cp odin.jar org.hobbit.core.run.ComponentStarter org.hobbit.odin.odintaskgenerator.X
where X is the name of the corresponding ODIN component (with the package adjusted accordingly, e.g. org.hobbit.odin.odintaskgenerator.OdinTaskGenerator for the Task Generator). This Dockerfile uses the official java image as its base, copies the benchmark jar into the container as /odin/odin.jar, sets /odin as the working directory and starts the selected component via the HOBBIT ComponentStarter.
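As an illustrative sketch only, building one of these images from the repository root could look like the following; the Dockerfile name and image tag are placeholders, so adapt them to the files actually provided in the docker/ directory linked above:

# Build the Benchmark Controller image (the file name
# odinbenchmarkcontroller.docker and the tag are illustrative placeholders).
docker build -f docker/odinbenchmarkcontroller.docker -t odin-benchmarkcontroller .
# Repeat with the corresponding Dockerfiles for the OdinEvaluationModule
# and the OdinTaskGenerator.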
If the user wants to create a Docker image for the OdinDataGenerator, he/she must use the following Dockerfile:
FROM maven:3.3.9-jdk-8
ADD target/odin-1.0.0-SNAPSHOT.jar /odin/odin.jar
ADD scripts/download.sh /odin/download.sh
WORKDIR /odin
CMD java -cp odin.jar org.hobbit.core.run.ComponentStarter org.hobbit.odin.odindatagenerator.OdinDataGenerator
which is the same as the previous example apart from the base image (maven:3.3.9-jdk-8 instead of java) and the line ADD scripts/download.sh /odin/download.sh. This line adds the script download.sh (included in the repository) to the container's working directory /odin/, so that the user can run ODIN using the TWIG mimicking algorithm.
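Again as a hedged sketch, with a placeholder Dockerfile name and image tag, the Data Generator image could be built in the same way:

# Build the Data Generator image (file name and tag are illustrative placeholders).
docker build -f docker/odindatagenerator.docker -t odin-datagenerator .

The resulting images then need to be made available to the HOBBIT platform as described in the benchmark-upload guidelines linked above.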
When configuring an experiment, the following parameters can be set. Duration of the benchmark: The user must determine the duration of the benchmark by assigning a value in milliseconds to this field. The default value is set to 600,000 ms (10 minutes). Note that the duration of each experiment is at most 40 minutes.
Name of mimicking algorithm output folder: The relative path of the folder into which the mimicking algorithm writes the generated dataset. The default value is set to output_data/.
Number of insert queries per stream: This value determines the number of INSERT SPARQL queries after which a SELECT query is performed. The default value is set to 100.
Population of generated data: This value determines the number of events generated by a mimicking algorithm for one Data Generator. Note that this value might not be equal to the number of generated triples. The default value is set to 1000.
Number of data generators - agents: The number of Data Generators for this experiment. The default value is 2.
Name of mimicking algorithm: The name of the mimicking algorithm to be invoked to generate data. There are two available values: TRANSPORT_DATA (https://github.com/PoDiGG/podigg), which invokes the mimicking algorithm developed by imec for public transport data, and TWIG (https://github.com/AKSW/TWIG), which invokes the mimicking algorithm for Twitter messages. The default value is TRANSPORT_DATA.
Seed for mimicking algorithm: The seed value for a mimicking algorithm. The default value is 100.
Number of task generators - agents: The number of Task Generators for this experiment. The default value is 1.