
Rucio Kubernetes Tutorial

Preliminaries

git clone https://github.com/rucio/k8s-tutorial/

NOTE: All following commands should be run from the top-level directory of this repository.

Set up a Kubernetes cluster

You can skip this step if you have already set up a Kubernetes cluster.

./scripts/setup-minikube.sh
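
If you prefer to manage minikube yourself, the script's effect is roughly equivalent to starting a cluster with enough resources; the values below are assumptions, not taken from the script:

minikube start --cpus 4 --memory 8192   # resource sizes are assumptions; adjust to your machine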

Deploy Rucio, FTS and storage

You can perform either an automatic deployment or a manual deployment, as documented below.

Automatic deployment

./scripts/deploy-rucio.sh

Manual deployment

Add repositories to Helm

helm repo add stable https://charts.helm.sh/stable
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add rucio https://rucio.github.io/helm-charts
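
After adding the repositories, refresh the local chart index so the latest chart versions are available:

helm repo update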

Apply secrets

kubectl apply -k ./secrets
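
To confirm that the secrets were created, list them (the exact names depend on the kustomization in ./secrets):

kubectl get secrets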

(Optional) Delete existing Postgres volume claim

If you have deployed this tutorial on this cluster before, the existing Postgres PersistentVolumeClaim (PVC) must be deleted first.

  1. Check whether the PVC exists (a combined sketch of steps 1-3 follows this list):
kubectl get pvc data-postgres-postgresql-0

If the PVC exists, the command will return output similar to:

NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
data-postgres-postgresql-0   Bound    ...   8Gi        RWO            standard       <unset>                 4s

If the PVC does not exist, the command will return this message:

Error from server (NotFound): persistentvolumeclaims "data-postgres-postgresql-0" not found

You can skip to the next section if the PVC does not exist.

  2. If the PVC exists, patch it to allow deletion:
kubectl patch pvc data-postgres-postgresql-0 -p '{"metadata":{"finalizers":null}}'
  3. Delete the PVC:
kubectl delete pvc data-postgres-postgresql-0
  4. You might also need to uninstall Postgres if it is installed:
helm uninstall postgres
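
A minimal guarded sketch of steps 1-3, which patches and deletes the PVC only if it is present:

if kubectl get pvc data-postgres-postgresql-0 >/dev/null 2>&1; then
  kubectl patch pvc data-postgres-postgresql-0 -p '{"metadata":{"finalizers":null}}'
  kubectl delete pvc data-postgres-postgresql-0
fi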

Install Postgres

helm install postgres bitnami/postgresql -f manifests/values-postgres.yaml

Verify that Postgres is running

kubectl get pod postgres-postgresql-0

Once the Postgres setup is complete, you should see STATUS: Running.
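
Alternatively, instead of polling, block until the pod reports ready:

kubectl wait --for=condition=Ready pod/postgres-postgresql-0 --timeout=300s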

Start init container pod

kubectl apply -f manifests/init-pod.yaml
kubectl logs -f init

Verify that the init container pod setup is complete

kubectl get pod init

Once the init container pod setup is complete, you should see STATUS: Completed.
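
On kubectl 1.23 or newer you can wait for the pod to finish instead of polling:

kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/init --timeout=300s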

Deploy the Rucio server

helm install server rucio/rucio-server -f manifests/values-server.yaml
kubectl rollout status deployment server-rucio-server
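
As a quick smoke test, you can forward a local port to the server and call Rucio's ping endpoint; the service name and port here are assumptions based on the release name and chart defaults:

kubectl port-forward svc/server-rucio-server 8080:80 &   # service name and port are assumptions
curl http://localhost:8080/ping                          # returns the Rucio server version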

Start the XRootD (XRD) storage container pods

kubectl apply -f manifests/xrd.yaml
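
The manifest is assumed to create one pod per storage server (xrd1, xrd2, xrd3); check that they are running:

kubectl get pods | grep xrd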

Deploy the FTS database (MySQL)

kubectl apply -f manifests/ftsdb.yaml
kubectl rollout status deployment fts-mysql

Deploy the FTS server

kubectl apply -f manifests/fts.yaml
kubectl rollout status deployment fts-server

Deploy the Rucio daemons

helm install daemons rucio/rucio-daemons -f manifests/values-daemons.yaml

This command might take a few minutes.
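
Each daemon runs as its own deployment, so several pods will appear; you can watch them start:

kubectl get pods -w   # press Ctrl-C to stop watching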

Troubleshooting

helm list                      # list all Helm installations
helm delete $installation      # remove a Helm installation
kubectl get jobs               # list all jobs
kubectl delete jobs/$jobname   # delete a specific job

Use Rucio

Once the setup is complete, you can use Rucio by interacting with it via a client.

You can either run the provided script to showcase the usage of Rucio, or you can manually run the Rucio commands described in the Manual client usage section.

Client usage showcase script

./scripts/use-rucio.sh

Manual client usage

Start client container pod for interactive use

kubectl apply -f manifests/client.yaml
kubectl get pod client

Once the client container pod setup is complete, you should see STATUS: Running.

Enter interactive shell in the client container

kubectl exec -it client -- /bin/bash
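
Inside the container, verify that the client can authenticate to the server before proceeding:

rucio whoami   # prints the account and identity used by the client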

Create the Rucio Storage Elements (RSEs)

rucio-admin rse add XRD1
rucio-admin rse add XRD2
rucio-admin rse add XRD3

Add the protocol definitions for the storage servers

rucio-admin rse add-protocol --hostname xrd1 --scheme root --prefix //rucio --port 1094 --impl rucio.rse.protocols.gfal.Default --domain-json '{"wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy_read": 1, "third_party_copy_write": 1}, "lan": {"read": 1, "write": 1, "delete": 1}}' XRD1
rucio-admin rse add-protocol --hostname xrd2 --scheme root --prefix //rucio --port 1094 --impl rucio.rse.protocols.gfal.Default --domain-json '{"wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy_read": 1, "third_party_copy_write": 1}, "lan": {"read": 1, "write": 1, "delete": 1}}' XRD2
rucio-admin rse add-protocol --hostname xrd3 --scheme root --prefix //rucio --port 1094 --impl rucio.rse.protocols.gfal.Default --domain-json '{"wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy_read": 1, "third_party_copy_write": 1}, "lan": {"read": 1, "write": 1, "delete": 1}}' XRD3
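
To confirm an RSE and its protocol were registered, inspect one of them:

rucio-admin rse info XRD1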

Enable FTS

rucio-admin rse set-attribute --rse XRD1 --key fts --value https://fts:8446
rucio-admin rse set-attribute --rse XRD2 --key fts --value https://fts:8446
rucio-admin rse set-attribute --rse XRD3 --key fts --value https://fts:8446

Note that 8446 is the port exposed by the fts-server pod. You can view the ports opened by a pod with kubectl describe pod PODNAME.
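
You can read an RSE's attributes back to confirm the setting:

rucio-admin rse get-attribute XRD1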

Fake a full mesh network

rucio-admin rse add-distance --distance 1 --ranking 1 XRD1 XRD2
rucio-admin rse add-distance --distance 1 --ranking 1 XRD1 XRD3
rucio-admin rse add-distance --distance 1 --ranking 1 XRD2 XRD1
rucio-admin rse add-distance --distance 1 --ranking 1 XRD2 XRD3
rucio-admin rse add-distance --distance 1 --ranking 1 XRD3 XRD1
rucio-admin rse add-distance --distance 1 --ranking 1 XRD3 XRD2

Set an indefinite storage quota for root

rucio-admin account set-limits root XRD1 -1
rucio-admin account set-limits root XRD2 -1
rucio-admin account set-limits root XRD3 -1
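
Confirm the limits were applied (a value of -1 sets an infinite quota):

rucio-admin account get-limits root XRD1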

Create a default scope for testing

rucio-admin scope add --account root --scope test

Create initial transfer testing data

dd if=/dev/urandom of=file1 bs=10M count=1
dd if=/dev/urandom of=file2 bs=10M count=1
dd if=/dev/urandom of=file3 bs=10M count=1
dd if=/dev/urandom of=file4 bs=10M count=1

Upload the files

rucio upload --rse XRD1 --scope test file1
rucio upload --rse XRD1 --scope test file2
rucio upload --rse XRD2 --scope test file3
rucio upload --rse XRD2 --scope test file4
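
Check where each file landed; every upload should show one replica on the chosen RSE:

rucio list-file-replicas test:file1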

Create a few datasets and containers

rucio add-dataset test:dataset1
rucio attach test:dataset1 test:file1 test:file2

rucio add-dataset test:dataset2
rucio attach test:dataset2 test:file3 test:file4

rucio add-container test:container
rucio attach test:container test:dataset1 test:dataset2

rucio add-dataset test:dataset3
rucio attach test:dataset3 test:file4
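
You can inspect the resulting hierarchy:

rucio list-content test:container   # lists dataset1 and dataset2
rucio list-content test:dataset1    # lists file1 and file2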

Create a rule

rucio add-rule test:container 1 XRD3

This command will output a rule ID, which can also be obtained via:

rucio list-rules test:container

Check rule info

rucio rule-info <rule_id>

As the daemons run with long sleep cycles by default (e.g. 30 or 60 seconds), it can take a few minutes for the rule to be satisfied. You can monitor the output of the daemon containers to see what they are doing.
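
For example, to follow one of the transfer (conveyor) daemons, look up its pod name first (the names depend on the Helm release):

kubectl get pods | grep conveyor   # daemon pod names are release-dependent
kubectl logs -f <conveyor-pod>     # follow a daemon's log
rucio rule-info <rule_id>          # re-check the rule; the locks should eventually all be OK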

Some helpful commands

Enable kubectl shell completion.

Bash:

source <(kubectl completion bash)

Zsh:

source <(kubectl completion zsh)

General commands:

kubectl get pods                    # list pods in the default namespace
kubectl get pods --all-namespaces   # list pods in all namespaces
kubectl logs <NAME>                 # print the logs of a pod
kubectl logs -f <NAME>              # follow the logs of a pod
helm repo update                    # refresh the Helm chart repositories
minikube stop                       # stop the minikube cluster