
PeersDB (continued at PeersDB/peersdb)

A distributed p2p database, based on orbitdb, intended for the sharing of datasets for model training

This project is archived because it is tied to a thesis; ongoing development and improvements have therefore moved to a new repository.

The active development of this project has been continued at PeersDB/peersdb.

Please visit the new repository for the latest updates, contributions, and discussions. Thank you for your support and interest in this project!

Table of Contents

• Get Started
• Flags
• Contribution
• Debugging
• Architecture
• APIs
• Evaluation

Get Started

go run main.go
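
For example, to start a node with the interactive shell and development-level logging enabled (both flags are documented below):

go run main.go -shell -devlogs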

Flags

You may use the following flags to configure your peersdb instance.

| Flag | Description | Default |
|---|---|---|
| -shell | enables the shell interface | false |
| -http | enables the http interface | false |
| -ipfs-port | sets the ipfs port | 4001 |
| -http-port | sets the http port | 8080 |
| -experimental | enables kubo experimental features | true |
| -repo | configures the repo/directory name for the ipfs node | peersdb |
| -devlogs | enables development-level logging | false |
| -root | makes this node a root node, meaning it creates its own datastore | false |
| -download-dir | configures where downloaded files etc. are stored | ~/Downloads/ |
| -full-replica | enables full data replication through ipfs pinning | false |
| -bootstrap | sets a bootstrap peer to connect to on startup | "" |
| -benchmark | enables benchmarking on this node | false |
| -region | if set, the node's region is added to the benchmark data | "" |

There is also a persistent config file, but you probably don't want to change anything in there.
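
For instance, a root node exposing the HTTP API on a non-default port could be started like this:

go run main.go -http -http-port 8001 -root=true -devlogs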

Contribution

Branch names should follow the pattern <type>/<name>, where words in "name" are separated by '-' and type is one of the following (extend if needed):

| Type | When to use |
|---|---|
| feat | any new features |
| maintenance | any work on docs, git workflows, tests etc. |
| refactor | when refactoring existing parts of the application |
| fix | bug fixes |
| test | testing environments/throwaway branches |

More fine-grained distinctions happen in commit messages, which should be structured as follows:

<type>(<scope>): <subject>

type must be one of the types from the branch-naming table above.

scope is the part of the software affected, usually best identified by the package name.

subject gives a short idea of what was done/what the intent of the commit is.
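
A hypothetical commit message following this structure (the scope "ipfs" is just an example package name):

fix(ipfs): handle missing download directory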

As for the commit body, there is no mandatory structure as of now.

Issues and pull requests will not have any set guidelines for now.

As a rule of thumb, rebase onto the target branch before merging.

Debugging

Since I had a rough start with it myself, here is some help on using delve for debugging.

From the root of this repo, use the following command to start debugging (change the flags as needed):

dlv debug peersdb -- -http -benchmark -http-port 8001 -ipfs-port 4001 -repo peersdb1 -root=true -devlogs


Another difficulty was setting breakpoints in dependencies. Here is an example for setting a breakpoint in basestore's handleEventWrite method:

(dlv) b berty.tech/go-orbit-db/stores/basestore.(*BaseStore).handleEventWrite

The response to the command above will also include your local paths for those dependencies, so you can then set breakpoints by line:

(dlv) b /Users/<username>/go/pkg/mod/berty.tech/go-orbit-db@v1.21.0/stores/basestore/base_store.go:853

Architecture

Store Replication

The contributions store holds file ipfs paths. It needs to be replicated so that peers know which data is available.

A new peersdb instance may be a root instance. The root instance creates a "transactions" orbitdb EventLog store. A new non-root instance will, once it is connected to the root, replicate said store. From then on the nodes replicate via events. When a node restarts, it tries to load the datastore from disk.
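
A minimal sketch of that flow, assuming the berty.tech/go-orbit-db API that PeersDB builds on (the store name "transactions" comes from the text above; the function itself and its option handling are illustrative, not PeersDB's actual code):

package sketch

import (
    "context"

    orbitdb "berty.tech/go-orbit-db"
    coreiface "github.com/ipfs/interface-go-ipfs-core"
)

// openTransactions creates the "transactions" EventLog on a root node, or
// opens it by its orbitdb address on a non-root node, which then replicates
// the store from connected peers.
func openTransactions(ctx context.Context, api coreiface.CoreAPI, isRoot bool, rootAddr string) (orbitdb.EventLogStore, error) {
    odb, err := orbitdb.NewOrbitDB(ctx, api, &orbitdb.NewOrbitDBOptions{})
    if err != nil {
        return nil, err
    }
    if isRoot {
        // A root node creates its own datastore (cf. the -root flag).
        create := true
        return odb.Log(ctx, "transactions", &orbitdb.CreateDBOptions{Create: &create})
    }
    // A non-root node opens the root's store by address and replicates it.
    return odb.Log(ctx, rootAddr, nil)
}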

IPFS Replication

IPFS replication is achieved through IPFS pinning and needs to be enabled via the -full-replica flag. Pinning is triggered whenever the orbitdb contributions store receives data, i.e. when the replicated event fires.
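
A hedged sketch of that trigger, assuming go-orbit-db's channel-based events API (store.Subscribe, which newer releases replace with an event bus) and the ipfs CoreAPI pinning interface; treating each log entry's value as a plain ipfs path is my assumption, the real contribution block format may differ:

package sketch

import (
    "context"

    orbitdb "berty.tech/go-orbit-db"
    "berty.tech/go-orbit-db/stores"
    coreiface "github.com/ipfs/interface-go-ipfs-core"
    ipfspath "github.com/ipfs/interface-go-ipfs-core/path"
)

// pinOnReplicated pins all known contributions whenever the store reports
// that it has replicated new data.
func pinOnReplicated(ctx context.Context, api coreiface.CoreAPI, store orbitdb.EventLogStore) error {
    for evt := range store.Subscribe(ctx) {
        switch evt.(type) {
        case *stores.EventReplicated, stores.EventReplicated:
            // replicated event: proceed to pinning below
        default:
            continue
        }
        ops, err := store.List(ctx, nil)
        if err != nil {
            return err
        }
        for _, op := range ops {
            // ASSUMPTION: each operation value holds a path like "/ipfs/<cid>".
            if err := api.Pin().Add(ctx, ipfspath.New(string(op.GetValue()))); err != nil {
                return err
            }
        }
    }
    return nil
}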

Validation

Files need to be validated. The approach is as follows:

• when peers add new contribution blocks, they try validating the data

• each peer keeps its own validation records

• when a peer wants to know whether a file is valid (implemented in isValid, which uses accValidations), it does the following (a sketch follows this list):

  1. retrieve all info (signed by the sender) via pubsub
    • announce the request via the topic "validation", with message data: its own ID + the file's CID
    • receive votes via the topic: its own ID + the file's CID
      • peers listen for such requests in awaitValidationReq
  2. count the votes itself
    • if more than half of the connected peers voted valid, the file is considered valid; otherwise the node validates the data itself
  3. persist the result
    • in the per-peer validation records mentioned above
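
A rough sketch of steps 1 and 2, assuming the ipfs CoreAPI pubsub and swarm interfaces; the vote encoding ("valid") and the timeout are my assumptions, not PeersDB's actual wire format:

package sketch

import (
    "context"
    "time"

    coreiface "github.com/ipfs/interface-go-ipfs-core"
)

// isValidSketch announces a validation request and counts incoming votes.
func isValidSketch(ctx context.Context, api coreiface.CoreAPI, selfID, fileCID string) (bool, error) {
    // Votes arrive on the per-request topic: our ID + the file's CID.
    sub, err := api.PubSub().Subscribe(ctx, selfID+fileCID)
    if err != nil {
        return false, err
    }
    defer sub.Close()

    // Announce the request on the shared "validation" topic.
    if err := api.PubSub().Publish(ctx, "validation", []byte(selfID+fileCID)); err != nil {
        return false, err
    }

    peers, err := api.Swarm().Peers(ctx)
    if err != nil {
        return false, err
    }

    recvCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
    defer cancel()

    votes := 0
    for votes*2 <= len(peers) { // need more than half of the connected peers
        msg, err := sub.Next(recvCtx)
        if err != nil {
            // Timed out without a majority: the node validates locally.
            return false, nil
        }
        if string(msg.Data()) == "valid" { // ASSUMPTION: vote encoding
            votes++
        }
    }
    return true, nil
}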

APIs

Shell

Shell commands look like this: <command> <arg1> <arg2> ...

get

Description: Download ipfs content by its ipfs path. The destination can be configured via the -download-dir flag. Ipfs paths can be retrieved via the query command.

Args:

| Description | Example |
|---|---|
| The path of some ipfs content | /ipfs/QmRQSrmFNEWx7qKF5jrdLJ4oS8dZzYpTKDoAKoDzL3zXr7 |

Returns: A status string.

connect

Description: Manually connect to other peers.

Args:

| Description | Example |
|---|---|
| The multiaddr of another ipfs node | /ip4/127.0.0.1/tcp/4001/p2p/QmRQSrmFNEWx7qKF5jrdLJ4oS8dZzYpTKDoAKoDzL3zXr7 |

Returns: A status string.

post

Description: Adds a file to the local ipfs node and stores the contribution block in the eventlog.

Args:

| Description | Example |
|---|---|
| The file path | ./main.go |

Returns: A status string.

query

Description: Queries the eventlog for all entries.

Args: none

Returns: A results list.
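
Put together, a hypothetical shell session (status output omitted; the file name is just an example) could look like this:

post ./main.go
query
get /ipfs/QmRQSrmFNEWx7qKF5jrdLJ4oS8dZzYpTKDoAKoDzL3zXr7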

HTTP

POST /peersdb/command

Execute a command.

Body:

{
  method: {
    argcnt: int
    cmd: string
  }
  args: [
    string
  ]
}

cmd identifies the same commands as described under Shell, and they receive the same arguments. The only exception is the "POST" command, where a base64-encoded file has to be provided under the "file" key instead.
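
As an illustration, a minimal Go client running the query command (assuming a node started with -http on the default port 8080; the JSON mirrors the schema above, with argcnt 0 since query takes no arguments):

package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Run the "query" command via the HTTP API.
    body := []byte(`{"method": {"argcnt": 0, "cmd": "query"}, "args": []}`)
    resp, err := http.Post("http://localhost:8080/peersdb/command", "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    out, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(out))
}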

Evaluation

The eval folder contains everything needed for some predefined scenarios on a configurable cluster of nodes. In short: we use Helm charts to deploy Docker containers on Kubernetes, and scripts of HTTP requests to define certain workflows. This lets us easily evaluate how well peersdb handles tasks like replication. Our runs were largely executed on GCP, but if you want to experiment with them locally we'd advise using kind.

Note: the dockerfile as well as the helm chart are built specifically for evaluation; for anything else, feel free to use them as a starting point.

If you want to build your own docker image, you can do it like this:

docker buildx build --platform linux/amd64,linux/arm64 --push -t <your docker repo>:latest -f eval/dockerfile .

This way the image is built for both amd64 and arm64 architectures and pushed directly to the configured image repository. If you don't care about multi-architecture builds, you may simply use:

docker build -t <your docker repo> -f eval/dockerfile .

To deploy the helm chart(s):

helm install -f ./eval/root-peer-values.yaml peersdb-root ./eval/helm
helm install -f ./eval/peer-values.yaml peersdb-peers ./eval/helm

The workflows executed on the cluster use the HTTP API and are defined programmatically. To run them, copy whichever script you want into the container and execute it manually:

kubectl cp ./eval/workflows/ default/<root pod>:/app/
kubectl exec -it <root pod> -- /bin/sh
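
Then, inside the pod, run the copied script, for example (the script name depends on the workflow you copied):

/bin/sh /app/workflows/<your workflow>.sh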

If you want to change the workflow, simply change the script referenced in the YAML.

For deploying on specific nodes (relevant when evaluating a cluster with nodes in different regions), use the following approach:

kubectl label nodes <your node> region=<your node's region>
helm install -f ./eval/peer-values.yaml peersdb-peers ./eval/helm --set nodeSelector.region=<your node's region>

There is also a shell script under eval/workflows/eval.sh to help set up peers in a k8s cluster spanning multiple regions. It is very tailored to the experiments we ran but has been kept for documentation purposes. The raw evaluation results can be found under eval/results/.