arokem opened this issue 6 years ago
I've not been involved lately in the development, so I might not be of much help, but I'm pretty sure you need to have distributed storage available. At CERN we use volumes provided by CephFS, which support the `ReadWriteMany` access mode (see this table: https://kubernetes.io/docs/concepts/storage/persistent-volumes/). I think on GCP the only option available is Cloud Filestore (https://cloud.google.com/filestore/docs/accessing-fileshares), but I haven't tried it yet. Maybe @diegodelemos or @tiborsimko can comment on whether a shared fs (or even Ceph) is still a hard requirement.
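For a multi-node setup, shared storage would typically be requested through a `ReadWriteMany` claim. A minimal sketch, assuming the GKE Filestore CSI driver and its `standard-rwx` StorageClass (these names are GKE conventions, not REANA configuration):

```yaml
# Hypothetical RWX claim that could back REANA's shared workspace on GKE;
# "standard-rwx" assumes the Filestore CSI driver is enabled on the cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: reana-shared-volume
spec:
  accessModes:
    - ReadWriteMany          # required for pods on different nodes to share it
  storageClassName: standard-rwx
  resources:
    requests:
      storage: 1Ti           # Filestore basic tier has a 1 TiB minimum
```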
In any case: happy to see people interested in deploying REANA, we'll try to help as much as we can!
> Which suggest that maybe it's still trying to write to a disallowed location.
@arokem The DB pod error might be connected to writing to a disallowed location indeed... We are using `/reana` and `/reanadb` locations in the default configurations. Perhaps you have changed the former but not the latter?
```
$ git grep reanadb
reana_cluster/configurations/reana-cluster-dev.yaml: db_persistence_path: "/reanadb"
reana_cluster/configurations/reana-cluster-latest.yaml: db_persistence_path: "/reanadb"
reana_cluster/configurations/reana-cluster.yaml: db_persistence_path: "/reanadb"
```
Alternatively, you could also switch to using a DB instance outside of the cluster.
P.S. We should perhaps switch `/reana` and `/reanadb` to some more reasonable defaults...
> I think on GCP the only option available is Cloud Filestore
Indeed, REANA needs a shared filesystem at this stage. Support for object stores such as S3 is planned for later on.
We have not tried yet the installation on GCP but it would be definitely interesting to provide runnable configurations out of the box!
FYI I am currently trying to get this running on GKE (v1.22).
Currently trying to get the barebones running (ingress, quota, etc. turned off).
Some sticking points:
- `reana-workflow-controller` pods are crashing due to the `reana-code` hostPath volume. Should be an easy fix allowing users to specify a storageClass for this as well (defaulting to hostPath). EDIT: Looks like I need to turn off debug to fix this. I might have missed options that allow for this in the config. If you're interested in contributions, I'd happily contribute some documentation etc. if people can help me with PRs as I work out these issues.
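For context, the difference between the two volume types mentioned above can be sketched as follows (the hostPath path and claim name are illustrative, not REANA's actual values):

```yaml
# Node-local volume of the kind that crashes on hardened images such as COS:
volumes:
  - name: reana-code
    hostPath:
      path: /code            # illustrative path, not the real one

# PVC-backed alternative, if a storageClass override were supported:
volumes:
  - name: reana-code
    persistentVolumeClaim:
      claimName: reana-code  # hypothetical claim name
```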
Some far future things I'm interested in:
@elibixby Thanks for reaching out! This issue is quite old, so let me share a short update about REANA-on-GKE status since 2018.
A year or two ago we tested a small REANA deployment on GKE, targeting mostly a single-node deployment. The aim was just to test the general applicability of our Helm charts on various platforms. Everything worked well. This year we are just about to start work on a bigger GKE deployment for an ATLAS physics use case (CC @lukasheinrich), which will need many nodes. So your message comes very timely!
Here are a few technical notes:
WRT Kubernetes 1.22, the current REANA does not support it yet because we are using older K8S APIs that were deprecated in 1.22. However, we have a fully working PR set that we should get to merging very soon, most probably next week after 0.8.1 is released.
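As an illustration of the kind of change 1.22 forces: `Ingress` objects must move from `networking.k8s.io/v1beta1` (no longer served in 1.22) to `networking.k8s.io/v1`, which also restructures the backend fields. A sketch with placeholder host and service names:

```yaml
apiVersion: networking.k8s.io/v1   # v1beta1 no longer served in K8s 1.22
kind: Ingress
metadata:
  name: reana
spec:
  rules:
    - host: reana.example.org      # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix       # required field in v1
            backend:
              service:             # v1 nests the service under "backend.service"
                name: reana-server # placeholder service name
                port:
                  number: 80
```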
WRT storage, we are definitely open to changes. The GKE deployment last year was done for single-node only, so using ephemeral/local storage. We would definitely need a shared filesystem for multi-node deployments, so it would be interesting to hear your plans about the best GKE shared storage options.
WRT auth, REANA currently offers either local accounts or CERN-specific SSO. However, CERN has a new OIDC-based authn/authz system in place, which we were thinking of migrating towards later in the year. If OIDC would be OK for your needs, there could be some synergy there, as with the storage needs.
WRT PostgreSQL, it is already possible to use a DB instance living outside of the cluster. That's our primary mode of deployment at CERN, actually. It should be sufficient to set the Helm `values.yaml` variables in `db_env_config`, for example:
```yaml
db_env_config:
  REANA_DB_NAME: "reana"
  REANA_DB_HOST: "db.example.org"
  REANA_DB_PORT: "5432"
```
and then disable the "internal" `reana-db` component:
```yaml
components:
  reana_db:
    enabled: false
```
and introduce corresponding secrets for `REANA_DB_USERNAME` and `REANA_DB_PASSWORD`:
```yaml
secrets:
  database:
    user: *******
    password: *********
```
This should be enough to make DB-as-external-service usable. We can update our documentation with a more detailed recipe if you are interested.
(BTW FWIW we have been using both DB-as-external-service and DB-as-internal-pod, and the latter technique was working quite well for some of our deployments. But our primary mode of operation is DB-as-external-service as well.)
WRT quota management, we have not planned any concrete work on this in the near future, but we are definitely open to making it more flexible.
WRT using different volumes as workspaces for different users, we did some preliminary work on abstracting the workspace concept last summer. Achieving that would however still require quite a lot of work.
If you have some GKE documentation recipes and/or code to contribute, we'll be naturally happy to collaborate!
> WRT auth, REANA currently offers either local accounts or CERN-specific SSO. However, CERN has a new OIDC-based authn/authz system in place, which we were thinking of migrating towards later in the year. If OIDC would be OK for your needs, there could be some synergy there, as for the storage need synergies.
My ideal solution is an "authless" mode where I can put something like https://github.com/travisghansen/external-auth-server/ in front of the API/UI and manage user quota and auth myself.
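With ingress-nginx, such an "authless" setup could be sketched via the controller's external-auth annotations (the URLs are placeholders; this assumes ingress-nginx fronting the REANA UI/API):

```yaml
metadata:
  annotations:
    # ingress-nginx sends a subrequest here before proxying to the backend
    nginx.ingress.kubernetes.io/auth-url: "http://external-auth-server.auth.svc.cluster.local:8080/verify"
    # where unauthenticated users get redirected
    nginx.ingress.kubernetes.io/auth-signin: "https://auth.example.org/signin"
```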
A "nice to have" would be the ability to map forwarded user IDs to namespaces and service accounts, to better isolate workflows from each other, and then use cluster quota.
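The cluster-quota half of that idea maps onto stock Kubernetes objects; a sketch with an illustrative per-user namespace:

```yaml
# One namespace per mapped user ID, with the cluster enforcing the limits
apiVersion: v1
kind: ResourceQuota
metadata:
  name: user-quota
  namespace: reana-user-alice   # hypothetical per-user namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    pods: "20"
```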
Hi @elibixby - thanks for your interest. As @tiborsimko said, we're in the process of working with some folks in Google to deploy REANA @ GCP, and it'd be great to learn more about your use case. Would you be interested in sharing a short slide deck or similar in a call? (Feel free to reach out at lukas.heinrich at cern dot ch.)
Hey @lukasheinrich, I got it working without much trouble in the end.
Main hiccups besides those above were:
I'd be happy to get on a call and discuss my use cases if you're interested. I'll shoot you an email from eli at cradle dot bio.
Hello! We are interested in setting up a REANA cluster on Google Cloud Platform (GCP).
We followed the instructions in the zero-to-jupyterhub documentation (https://zero-to-jupyterhub.readthedocs.io/en/stable/) to set up a Kubernetes cluster, and then followed the instructions here: https://reana-cluster.readthedocs.io/en/latest/gettingstarted.html#deploy-locally, but instead of using minikube, we pointed it to our cluster-in-the-clouds. Pretty quickly, we discovered that we can't write to `/reana` on these cloud machines (see: https://cloud.google.com/container-optimized-os/docs/concepts/security). All the pods come crashing down as soon as they try writing into this directory. So, we edited the provided default configuration (https://reana-cluster.readthedocs.io/en/latest/userguide.html#configure-reana-cluster) to point to `/etc/reana`, which is writeable. This solved most of the problems. The one remaining issue is that the database pod is still crashing. The logs in this pod suggest that maybe it's still trying to write to a disallowed location.
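For reference, the analogous override for the database path uses the `db_persistence_path` key that appears in the shipped `reana-cluster` configuration files (the target directory here is illustrative):

```yaml
# Point the DB persistence path at a writable location as well
db_persistence_path: "/etc/reana/reanadb"
```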
We're not necessarily expecting you to fix this if it's not currently on your roadmap, but we thought it would be good to raise it, and at least document our experiments for future experimenters seeking guidance.
But of course: your thoughts would be appreciated. Thanks!