reanahub / reana

REANA: Reusable research data analysis platform
https://docs.reana.io
MIT License

Setting up on Google Cloud #356

Open · arokem opened this issue 6 years ago

arokem commented 6 years ago

Hello! We are interested in setting up a REANA cluster on Google Cloud Platform (GCP).

We followed the instructions in the Zero to JupyterHub documentation (https://zero-to-jupyterhub.readthedocs.io/en/stable/) to set up a Kubernetes cluster, and then followed the instructions here: https://reana-cluster.readthedocs.io/en/latest/gettingstarted.html#deploy-locally, but instead of using Minikube, we pointed it at our cluster-in-the-clouds. Pretty quickly, we discovered that we can't write to /reana on these cloud machines (see https://cloud.google.com/container-optimized-os/docs/concepts/security): all the pods come crashing down as soon as they try writing into this directory. So we edited the provided default configuration (https://reana-cluster.readthedocs.io/en/latest/userguide.html#configure-reana-cluster) to point to /etc/reana, which is writable. This solved most of the problems. The one remaining issue is that the database pod is still crashing. The logs in this pod are:

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... 
FATAL:  could not write to file "pg_xlog/xlogtemp.29": No space left on device
child process exited with exit code 1
initdb: removing contents of data directory "/var/lib/postgresql/data"

Which suggests that maybe it's still trying to write to a disallowed location.

We're not necessarily expecting you to fix this if it's not currently on your roadmap, but we thought it would be good to raise it, and at least document our experiments for future experimenters seeking guidance.

But of course: your thoughts would be appreciated. Thanks!

lukasheinrich commented 6 years ago

I haven't been involved in the development lately, so I might not be of much help, but I'm pretty sure you need distributed storage available. At CERN we use volumes provided by CephFS, which support the ReadWriteMany access mode (see this table: https://kubernetes.io/docs/concepts/storage/persistent-volumes/). I think on GCP the only option available is Cloud Filestore (https://cloud.google.com/filestore/docs/accessing-fileshares), but I haven't tried it yet. Maybe @diegodelemos or @tiborsimko can comment on whether a shared FS (or even Ceph) is still a hard requirement.
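For reference, a shared volume with that access mode is requested in Kubernetes through a PersistentVolumeClaim. A minimal sketch, assuming GKE's Filestore CSI driver and its `standard-rwx` storage class (both are assumptions; check what your cluster actually offers with `kubectl get storageclass`):

```yaml
# Hypothetical PVC for a REANA shared volume backed by Cloud Filestore.
# The storage class name "standard-rwx" is an assumption (it is the class
# the GKE Filestore CSI driver typically provides); verify it with
# `kubectl get storageclass` on your cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: reana-shared-volume
spec:
  accessModes:
    - ReadWriteMany        # required so multiple pods can mount it read-write
  storageClassName: standard-rwx
  resources:
    requests:
      storage: 1Ti         # Filestore instances have a large minimum size
```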

In any case: happy to see people interested in deploying REANA; we'll try to help as much as we can!

tiborsimko commented 5 years ago

> Which suggests that maybe it's still trying to write to a disallowed location.

@arokem The DB pod error might indeed be connected to writing to a disallowed location... We are using the /reana and /reanadb locations in the default configurations. Perhaps you have changed the former but not the latter?

$ git grep reanadb
reana_cluster/configurations/reana-cluster-dev.yaml:  db_persistence_path: "/reanadb"
reana_cluster/configurations/reana-cluster-latest.yaml:  db_persistence_path: "/reanadb"
reana_cluster/configurations/reana-cluster.yaml:  db_persistence_path: "/reanadb"
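Following that hint, the fix would be to override the DB persistence path in a custom cluster configuration as well, mirroring the /etc/reana change. A sketch (the `db_persistence_path` key name comes from the grep output above; the surrounding structure and the `/etc/reanadb` path are assumptions to be checked against your reana-cluster.yaml):

```yaml
# Sketch of a custom reana-cluster configuration override: point the DB
# persistence path at a writable location on Container-Optimized OS.
# Key name taken from the grep above; nesting is assumed to follow the
# shipped reana-cluster.yaml.
cluster:
  db_persistence_path: "/etc/reanadb"
```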

Alternatively, you could also switch to using a DB instance outside of the cluster.

P.S. We should perhaps switch /reana and /reanadb to some more reasonable defaults...

tiborsimko commented 5 years ago

> I think on GCP the only option available is Cloud Filestore

Indeed, REANA needs a shared filesystem at this stage. Support for other storage backends, such as S3, is planned for later.

We have not yet tried installing on GCP, but it would definitely be interesting to provide runnable configurations out of the box!

elibixby commented 2 years ago

FYI I am currently trying to get this running on GKE (v1.22).

Currently trying to get the bare bones running (ingress, quota, etc. turned off).

Some sticking points:

I might have missed options in the config that allow for this. If you're interested in contributions, I'd happily contribute some documentation etc. if people can help me with PRs as I work out these issues.

Some far future things I'm interested in:

tiborsimko commented 2 years ago

@elibixby Thanks for reaching out! This issue is quite old, so let me share a short update on REANA-on-GKE status since 2018.

About a year or two ago we tested a small REANA deployment on GKE, targeting mostly single-node deployments. The aim was just to test the general applicability of our Helm charts on various platforms. Everything worked well. This year we are about to start work on a bigger GKE deployment for an ATLAS physics use case (CC @lukasheinrich), which will need many nodes. So your message is very timely!

Here are a few technical notes. For example, to use an external database service instead of the in-cluster one, you can configure:

db_env_config:
  REANA_DB_NAME: "reana"
  REANA_DB_HOST: "db.example.org"
  REANA_DB_PORT: "5432"

and then disable the "internal" reana-db component:

components:
  reana_db:
    enabled: false

and introduce corresponding secrets for REANA_DB_USERNAME and REANA_DB_PASSWORD:

secrets:
  database:
    user: *******
    password: *********

This should be enough to make DB-as-external-service usable. We can update our documentation with a more detailed recipe if you are interested.
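Putting the three fragments above together, a single values override might look like the following (db.example.org and the credentials are placeholders, and the exact key layout should be verified against the chart's values.yaml before use):

```yaml
# Hypothetical combined Helm values file for a DB-as-external-service
# deployment, assembled from the fragments above. Host and credentials
# are placeholders; check key paths against the chart's values.yaml.
db_env_config:
  REANA_DB_NAME: "reana"
  REANA_DB_HOST: "db.example.org"   # placeholder hostname
  REANA_DB_PORT: "5432"
components:
  reana_db:
    enabled: false                  # use the external DB instead
secrets:
  database:
    user: "reana"                   # placeholder credentials
    password: "change-me"
```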

(BTW FWIW we have been using both DB-as-external-service and DB-as-internal-pod and the latter technique was working quite well for some of our deployments. But our primary mode of operation is DB-as-external-service as well.)

If you have some GKE documentation recipes and/or code to contribute, we'll be naturally happy to collaborate!

elibixby commented 2 years ago

> WRT auth, REANA currently offers either local accounts or CERN-specific SSO. However, CERN has a new OIDC-based authn/authz system in place, which we were thinking of migrating towards later in the year. If OIDC would be OK for your needs, there could be some synergy there, as for the storage need synergies.

My ideal solution is an "authless" mode where I can put something like https://github.com/travisghansen/external-auth-server/ in front of the API/UI and manage user quotas and auth myself.
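For what it's worth, with an ingress-nginx-based setup, that kind of fronting is usually wired up via the external-authentication annotations on the ingress rather than in the application itself. A sketch (the ingress name, host, backend service name, and auth URLs are all placeholders; the `auth-url`/`auth-signin` annotations are ingress-nginx's standard external-auth mechanism):

```yaml
# Hypothetical Ingress fronting the REANA UI/API with an external auth
# service such as external-auth-server. All names and URLs below are
# placeholders; only the annotation keys are the real ingress-nginx
# external-authentication knobs.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: reana-ingress
  annotations:
    nginx.ingress.kubernetes.io/auth-url: "https://auth.example.org/verify"
    nginx.ingress.kubernetes.io/auth-signin: "https://auth.example.org/signin"
spec:
  ingressClassName: nginx
  rules:
    - host: reana.example.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: reana-server    # assumed service name
                port:
                  number: 80
```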

A "nice to have" would be to allow mapping forwarded user IDs to namespaces and service accounts, to better isolate workflows from each other, and then use cluster quotas.
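That isolation idea maps naturally onto stock Kubernetes objects: one namespace per user with a ResourceQuota attached, so workflows scheduled into the namespace are capped. A sketch, with all names hypothetical:

```yaml
# Hypothetical per-user namespace plus quota; names and limits are
# illustrative only. Pods created in reana-user-alice would be rejected
# once the aggregate requests exceed the quota.
apiVersion: v1
kind: Namespace
metadata:
  name: reana-user-alice
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: user-quota
  namespace: reana-user-alice
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    pods: "20"
```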

lukasheinrich commented 2 years ago

Hi @elibixby, thanks for your interest. As @tiborsimko said, we're in the process of working with some folks at Google to deploy REANA on GCP, and it'd be great to learn more about your use case. Would you be interested in sharing a short slide deck or similar in a call? (Feel free to reach out at lukas.heinrich at cern dot ch.)

elibixby commented 2 years ago

Hey @lukasheinrich, I got it working without much trouble in the end.

Main hiccups besides those above were:

I'd be happy to get on a call and discuss my use cases if you're interested. I'll shoot you an email from eli at cradle dot bio.