semagrow / kobe

Framework for benchmarking SPARQL query federators
https://semagrow.github.io/kobe/
Apache License 2.0
benchmarking big-data database-benchmarking distributed federated semagrow sparql

KOBE: Cloud-Native Open Benchmark Engine for SPARQL Query Processors

KOBE is a benchmarking system that leverages Docker and Kubernetes to reproduce experiments of federated query processing over a collection of data sources.

Overview

In the SPARQL query processing community, as well as in the wider databases community, benchmark reproducibility is based on releasing datasets and query workloads. However, this paradigm breaks down for federated query processors, as these systems do not manage the data they serve to their clients but provide a data-integration abstraction over the actual query processors that are in direct contact with the data.

The KOBE benchmarking engine aims to provide a generic platform for benchmarking and experimentation that is reproducible across different environments. It was designed with the following objectives in mind:

  1. to allow for benchmark and experiment specifications to be reproduced in different environments and be able to produce comparable and reliable results;
  2. to ease the deployment of complex benchmarking experiments by automating the tedious tasks of initialization and execution.

Installation

Prerequisites

Note: The following instructions were tested on Debian 12. Minor adjustments may be necessary for installation on other Linux distributions or operating systems.

Get Kubernetes

curl -LO "https://dl.k8s.io/release/v1.20.7/bin/linux/amd64/kubectl"
curl -LO "https://dl.k8s.io/release/v1.20.7/bin/linux/amd64/kubectl.sha256"
echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

Use Minikube [Optional]

If you are not using an existing Kubernetes cluster, you can quickly set up a local environment for testing and development using Minikube:

Download and install Minikube on your system:

curl -LO https://storage.googleapis.com/minikube/releases/v1.21.0/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

Prepare Docker installation and setup:

sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker

Start Minikube with Docker driver:

minikube start --driver=docker
kubectl cluster-info

Enable NFS Support

sudo apt-get install nfs-common

Install the Kubernetes operator

KOBE requires its Kubernetes operator to be installed in the Kubernetes cluster. To quickly install the KOBE operator, you can use the kobectl script found in the bin directory:

export PATH=`pwd`/bin:$PATH
kobectl install operator .

If you are using Kubernetes version 1.15 or below, you should instead use

kobectl install operator-v1beta1 .

Alternatively, you could run the following commands:

kubectl apply -f operator/deploy/crds
kubectl apply -f operator/deploy/service_account.yaml
kubectl apply -f operator/deploy/clusterrole.yaml
kubectl apply -f operator/deploy/clusterrole_binding.yaml
kubectl apply -f operator/deploy/operator.yaml

For Kubernetes version 1.15 or below, replace

kubectl apply -f operator/deploy/crds

with

kubectl apply -f operator/deploy/crds-v1beta1

You will get a confirmation message that each resource has successfully been created. This will set the operator running in your Kubernetes cluster and needs to be done only once.
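
The version-dependent choice between the two CRD directories could be scripted along the following lines. This is a minimal sketch, not part of KOBE: the function name kobe_crd_dir is hypothetical, it assumes a Kubernetes major version of 1, and it takes the minor version as an argument so that it can be tried without a cluster.

```shell
#!/bin/sh
# Print the CRD directory that matches a Kubernetes 1.x minor version.
# kobe_crd_dir is a hypothetical helper, not something shipped with KOBE.
# The minor version may carry a "+" suffix on some clusters (e.g. "15+").
kobe_crd_dir() {
    # Keep only the digits, so "15+" becomes "15".
    minor=$(printf '%s' "$1" | tr -cd '0-9')
    if [ -z "$minor" ]; then
        echo "kobe_crd_dir: cannot parse minor version '$1'" >&2
        return 1
    fi
    if [ "$minor" -le 15 ]; then
        echo "operator/deploy/crds-v1beta1"
    else
        echo "operator/deploy/crds"
    fi
}

# Possible use on a live cluster (not run here; using jq is an assumption):
# minor=$(kubectl version -o json | jq -r '.serverVersion.minor')
# kubectl apply -f "$(kobe_crd_dir "$minor")"
```

Taking the version as an argument keeps the decision separate from the cluster query, so the rule "1.15 or below uses crds-v1beta1" can be checked on its own.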

Install the Networking subsystem

KOBE uses Istio to support network delays between the different deployments. To install Istio first define the version:

export ISTIO_VERSION=1.11.3

then deploy Istio:

kobectl install istio .

Alternatively, you can consult the official installation guide or you can type the following commands.

curl -L https://istio.io/downloadIstio | sh -
./istio-*/bin/istioctl manifest apply --set profile=default

Install Helm

KOBE uses Helm to simplify the management of dependencies within Kubernetes environments. To install Helm on your system, run:

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

Install the Evaluation Metrics Extraction subsystem

To enable the evaluation metrics extraction subsystem, run

kobectl install efk .

or alternatively the following

helm repo add elastic https://helm.elastic.co
helm repo add kiwigrid https://kiwigrid.github.io
helm install elasticsearch elastic/elasticsearch --set persistence.enabled=false --set replicas=1 --version 7.6.2
helm install kibana elastic/kibana --set service.type=NodePort --version 7.6.2
helm install fluentd kiwigrid/fluentd-elasticsearch -f operator/deploy/efk-config/fluentd-values.yaml --version 8.0.1
kubectl apply -f operator/deploy/efk-config/kobe-kibana-configuration.yaml

These commands result in the simplest setup: a single-node Elasticsearch that does not persist data across pod recreation, a Fluentd DaemonSet, and a Kibana node that exposes a NodePort.

After all pods are in the Running state, the Kibana dashboards can be accessed at

http://<NODE-IP>:<NODEPORT>/app/kibana#/dashboard/

where <NODE-IP> is the IP of any of the Kubernetes worker nodes and <NODEPORT> is the result of kubectl get -o jsonpath="{.spec.ports[0].nodePort}" services kibana-kibana.
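
As an illustration, the dashboard address can be assembled from the two values with a small helper. This is only a sketch: the function name kibana_dashboard_url is hypothetical, and the jsonpath used in the comments to look up a node address is an assumption that may not return an address reachable from your machine. On Minikube, the minikube ip command prints the node IP.

```shell
#!/bin/sh
# Build the Kibana dashboard URL from a node IP and a NodePort.
# kibana_dashboard_url is a hypothetical helper, not part of KOBE.
kibana_dashboard_url() {
    node_ip=$1
    node_port=$2
    printf 'http://%s:%s/app/kibana#/dashboard/\n' "$node_ip" "$node_port"
}

# Possible use on a live cluster (not run here). The node lookup below is an
# assumption: it takes the first InternalIP of the first node.
# node_ip=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
# node_port=$(kubectl get -o jsonpath='{.spec.ports[0].nodePort}' services kibana-kibana)
# kibana_dashboard_url "$node_ip" "$node_port"
```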

The setup can be customized by changing the configuration parameters of each helm chart. Please check the corresponding documentation of each chart for more info.

Recommended Versions

To ensure compatibility, we recommend using the following versions of the dependencies:

These versions have been tested and verified to work together.

Example

The typical workflow of defining a KOBE experiment is the following.

  1. Create one DatasetTemplate for each dataset server you want to use in your benchmark.
  2. Define your Benchmark, which should contain a list of datasets and a list of queries.
  3. Create one FederatorTemplate for the federator engine you want to use in your experiment.
  4. Define an Experiment over your previously defined benchmark.

Several examples of the above specifications can be found in the examples directory.

In the following, we show the steps for deploying an experiment on a simple benchmark that comprises three queries over a Semagrow federation of two Virtuoso endpoints.

You can use the kobectl script found in the bin directory for controlling your experiments:

export PATH=`pwd`/bin:$PATH
kobectl help

First, apply the templates for Virtuoso and Semagrow:

kobectl apply examples/dataset-virtuoso/virtuosotemplate.yaml
kobectl apply examples/federator-semagrow/semagrowtemplate.yaml

Then, apply the benchmark.

kobectl apply examples/benchmark-toybench/toybench-simple.yaml

Before running the experiment, you should verify that the datasets are loaded. Use the following command:

kobectl show benchmark toybench-simple

When the datasets are loaded, you should get the following output:

NAME  STATUS
toy1  Running
toy2  Running
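
If you prefer to wait for the datasets in a script rather than re-run the command by hand, a check like the one below can be used. It is a minimal sketch that assumes the output of kobectl show benchmark is exactly a two-column NAME/STATUS table like the one above; the function name all_datasets_running and the 10-second polling interval are illustrative choices, not part of KOBE.

```shell
#!/bin/sh
# Succeed only if every data row of a NAME/STATUS table reads "Running".
# Reads the table from standard input and treats the first line as a header.
# An empty table (header only, or no input) counts as "not ready".
all_datasets_running() {
    awk 'NR > 1 && NF > 0 { rows++; if ($2 != "Running") bad++ }
         END { if (rows > 0 && bad == 0) exit 0; exit 1 }'
}

# Possible polling loop on a live cluster (not run here):
# until kobectl show benchmark toybench-simple | all_datasets_running; do
#     sleep 10
# done
```

Treating an empty table as not ready avoids starting the experiment before any dataset has been listed.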

Proceed now with the execution of the experiment:

kobectl apply examples/experiment-toybench/toyexp-simple.yaml

As before, you can review the state of the experiment with the following command:

kobectl show experiment toyexp-simple

You can now view the evaluation metrics in the Kibana dashboards.

To remove all of the above, issue the following commands:

kobectl delete experiment toyexp-simple
kobectl delete benchmark toybench-simple
kobectl delete federatortemplate semagrowtemplate
kobectl delete datasettemplate virtuosotemplate

For more advanced control options for KOBE, use kubectl.

Removal

To remove KOBE from your cluster, run the following command:

kobectl purge .

To remove the KOBE operator manually, run

kubectl delete -f operator/deploy/operator.yaml
kubectl delete -f operator/deploy/role.yaml
kubectl delete -f operator/deploy/clusterrole_binding.yaml
kubectl delete -f operator/deploy/clusterrole.yaml
kubectl delete -f operator/deploy/service_account.yaml
kubectl delete -f operator/deploy/crds

To remove Istio manually, run

./istio-*/bin/istioctl manifest generate --set profile=default | kubectl delete -f -
kubectl delete namespace istio-system

To remove the evaluation metrics extraction subsystem manually, run

helm uninstall elasticsearch
helm uninstall kibana
helm uninstall fluentd
helm repo remove elastic
helm repo remove kiwigrid
kubectl delete jobs.batch kobe-kibana-configuration
kubectl delete configmaps kobe-kibana-config

and then in each Kubernetes node

rm -rf /var/log/fluentd-buffers/
rm /var/log/containers.log.pos