reanahub / reana

REANA: Reusable research data analysis platform
https://docs.reana.io
MIT License

Deploying on Rancher #357

Open David-Development opened 6 years ago

David-Development commented 6 years ago

Issue:

When deploying Reana-Cluster onto a Rancher Kubernetes Cluster, I'm running into some certificate issues. Kubectl, on the other hand, still works without problems.

...
HTTPSConnectionPool(host='192.168.1.10', port=8443): Max retries exceeded with url: 
/k8s/clusters/c-rqbzb/api/v1/namespaces/default/secrets?includeUninitialized=false
(Caused by SSLError(CertificateError("hostname '192.168.1.10' doesn't match '192.168.1.10'",),))

Rancher is using port 8443, k8s API is available at (https://192.168.1.10:8443/k8s/clusters/c-rqbzb). I am able to access the url https://192.168.1.10:8443/k8s/clusters/c-rqbzb/api/v1/namespaces/default/secrets in my browser. The certificate for rancher is auto-generated (self-signed). Could this be the problem? Btw. my kube-config file contains the certificate-authority-data section. Kubectl is not complaining about any ssl issues.

I'm trying to start my Reana-Cluster with the following command:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout /tmp/tls.key -out /tmp/tls.crt \
    -subj "/CN=192.168.1.10"

./kubectl delete secrets reana-ssl-secrets
./kubectl create secret tls reana-ssl-secrets \
      --key /tmp/tls.key --cert /tmp/tls.crt

reana-cluster init # <-- exception occurs here
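
A possible explanation, not confirmed anywhere in this thread: a commonly reported cause of the "hostname 'X' doesn't match 'X'" message in Python HTTPS clients is a certificate that carries an IP address only in its CN (or as a DNS-type name) instead of as an IP-type subjectAltName entry. If that applies here, the certificate served at the failing address would need an IP SAN. The sketch below only shows how such an entry could be added to an openssl call like the one above; `san_for_host` is a hypothetical helper, it handles IPv4 literals only, and `-addext` assumes OpenSSL 1.1.1 or newer.

```shell
# Hypothetical helper: choose the subjectAltName entry type for a host.
# IPv4 literals get an "IP:" entry; anything else is treated as a DNS name.
san_for_host() {
    case "$1" in
        '')          echo "empty host" >&2; return 1 ;;
        *[!0-9.]*)   printf 'DNS:%s\n' "$1" ;;
        *)           printf 'IP:%s\n' "$1" ;;
    esac
}

# Example (assumes OpenSSL 1.1.1+ for -addext):
#   openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
#       -keyout /tmp/tls.key -out /tmp/tls.crt \
#       -subj "/CN=192.168.1.10" \
#       -addext "subjectAltName=$(san_for_host 192.168.1.10)"
```

Note that the certificate created above ends up in reana-ssl-secrets, while the failing request goes to the Rancher endpoint, so this would only help if the same pattern applies to the certificate Rancher serves.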

Steps to reproduce:

  1. Run Rancher-UI

    # run rancher
    docker run -d --name=rancher --restart=unless-stopped -p 8080:80 -p 8443:443 rancher/rancher:v2.0.8
  2. login (https://localhost:8443), create a new cluster ("custom") --> leave default settings, just click on "next"

  3. make sure to check "etcd", "Control Plane" and "Worker"

  4. copy generated output command into cli

  5. wait until cluster is initialized, click on "Kubeconfig file" and place the content into ~/.kube/config

  6. run commands shown in the issue section

diegodelemos commented 6 years ago

Hello @David-Development, first of all, sorry for the late reply... I have managed to deploy REANA on Rancher following your steps. I've taken the Kubernetes configuration from Rancher UI and copied it over to ~/.kube/config.

And it looks more or less like this:

apiVersion: v1
kind: Config
clusters:
- name: "reana"
  cluster:
    server: "https://localhost:8443/k8s/clusters/c-x77qs"
    api-version: v1
    certificate-authority-data: "~~~~~~~"

users:
- name: "user-~~~~"
  user:
    token: "~~~~~~~~~~~~"

contexts:
- name: "reana"
  context:
    user: "user-~~~~"
    cluster: "reana"

current-context: "reana"

Right after that, I just ran reana-cluster init and all components were initialised correctly.

Regarding accessing the services from outside the cluster, I have tried getting the address reserved for the reana-server component from the UI and curl but I get a timeout:

$ curl http://192.168.65.3:32121/
curl: (7) Failed to connect to 192.168.65.3 port 32121: Operation timed out

This seems to be a problem that could be solved with some Rancher experience, did you manage to have it working?

diegodelemos commented 6 years ago

One important thing I forgot to mention earlier: if you are using REANA 0.3.0, you should use reana-cluster in this version.

Regarding fully running REANA inside Rancher: as a workaround for not being able to access services from outside the cluster, and to make sure that things are working, I have run reana-client inside the cluster as follows:

  1. Log in to the reana-server component:
    $ kubectl exec -ti server-657b47685b-ltm8d bash
    >
  2. Install reana-client and configure it. To retrieve the access token, you can use reana-cluster env --include-admin-token.
    > pip install reana-client
    > export REANA_SERVER_URL=http://localhost:5000
    > export REANA_ACCESS_TOKEN=FIXME
  3. Then clone the hello world example locally and run it:
    > cd /tmp/
    > git clone https://github.com/reanahub/reana-demo-helloworld
    > cd reana-demo-helloworld/
    > reana-client create
    > export REANA_WORKON=workflow.2
    > reana-client upload
    > reana-client start
    > reana-client status
    > reana-client download
    > cat results/greetings.txt

David-Development commented 6 years ago

@diegodelemos Thank you for your help! I ran the script again today (using the latest version) and the problem was gone. I think the "timeout" occurs because the service is no longer running on that port. On my cluster the service was migrated from one node to another a couple of times, and the port changed every time. I wrote a small script to automate the connect call (I'm using it in a Docker container). Maybe this will be helpful for someone.

KUBE_REANA_POD_NAME=$(kubectl get pods -l app=server -o=custom-columns=:.metadata.name | tr -d '\n')
KUBE_REANA_NODE_NAME=$(kubectl get pods -l app=server -o=custom-columns=:.spec.nodeName | tr -d '\n')
KUBE_REANA_SERVICE_PORT=$(kubectl get service server -n default -o=custom-columns=:.spec.ports[].nodePort | tr -d '\n')

# extract cluster-url and port
# (write the export statements to a file instead of executing them via $(...))
reana-cluster env --include-admin-token > ./exports.sh
source ./exports.sh
export REANA_SERVER_URL=http://${KUBE_REANA_NODE_NAME}:${KUBE_REANA_SERVICE_PORT}/

echo "REANA_SERVER_URL: $REANA_SERVER_URL"
echo "REANA_ACCESS_TOKEN: $REANA_ACCESS_TOKEN"

# run sample workflow
WORKFLOW_NAME="helloworld-`date +%s`"
git clone https://github.com/reanahub/reana-demo-helloworld
cd reana-demo-helloworld/

reana-client create --name ${WORKFLOW_NAME} --skip-validation
export REANA_WORKON=${WORKFLOW_NAME}
reana-client upload
reana-client start
reana-client status
reana-client status
reana-client download
cat results/greetings.txt
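
If a kubectl lookup in a script like the one above comes back empty (for example while the pod is being rescheduled), the exported URL ends up malformed without any error. A minimal sketch of a guard follows; `reana_server_url` is a hypothetical helper name, not part of reana-cluster or reana-client.

```shell
# Hypothetical helper: build REANA_SERVER_URL from a node name and a NodePort,
# refusing empty or non-numeric values instead of exporting a broken URL.
reana_server_url() {
    node=$1
    port=$2
    [ -n "$node" ] || { echo "no node name found for the server pod" >&2; return 1; }
    case "$port" in
        ''|*[!0-9]*) echo "unexpected NodePort: '$port'" >&2; return 1 ;;
    esac
    printf 'http://%s:%s/\n' "$node" "$port"
}

# Usage with the variables from the script above:
#   REANA_SERVER_URL=$(reana_server_url "$KUBE_REANA_NODE_NAME" "$KUBE_REANA_SERVICE_PORT") || exit 1
#   export REANA_SERVER_URL
```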

As I can't use an Ingress Controller, setting a hostPort would be useful to me. However, updating the service deployed by REANA does not work: kubectl says service/server patched (no change), but nothing changes on the cluster. Below are the commands I used. The first sets a hostPort; the second is meant to deploy the REANA service on the master node (so that the IP doesn't change all the time). Any ideas how to work around this? I even tried to set this in the Kubernetes Dashboard, but as soon as I hit "update", the old config is back.

kubectl patch service server -n default --type='json' -p='[{"op": "add", "path": "/spec/ports/0/hostPort", "value": 54321}]'
kubectl patch service server -n default --type='json' -p='[{"op": "add", "path": "/spec/nodeSelector", "value": { "node-role.kubernetes.io/etcd": "true" }}]'
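
A note on the two patches above: in the Kubernetes API, hostPort and nodeSelector are fields of the Pod spec, not of a Service, which may be why the patch reports no change. A Service of type NodePort does have a nodePort field per port, and that can be pinned so the port stops changing. The sketch below generates such a patch; `nodeport_patch` is a hypothetical helper, and the 30000-32767 bounds are the Kubernetes default NodePort range, which a cluster may override (so 54321 would be rejected by default).

```shell
# Hypothetical helper: print a JSON patch that pins the Service's first port
# to a fixed nodePort. 30000-32767 is the Kubernetes default NodePort range;
# clusters can be configured with a different one.
nodeport_patch() {
    port=$1
    case "$port" in
        ''|*[!0-9]*) echo "nodePort must be a number" >&2; return 1 ;;
    esac
    if [ "$port" -lt 30000 ] || [ "$port" -gt 32767 ]; then
        echo "nodePort $port is outside the default range 30000-32767" >&2
        return 1
    fi
    printf '[{"op": "replace", "path": "/spec/ports/0/nodePort", "value": %s}]\n' "$port"
}

# Usage (service name and namespace as in the commands above):
#   kubectl patch service server -n default --type='json' -p="$(nodeport_patch 31241)"
```

This only fixes the port. Keeping the pod on one node would need a nodeSelector on the Deployment's pod template, which is not shown here.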

jordidem commented 5 years ago

Dear David, Diego,

We were following your instructions in order to deploy reana-cluster on a Rancher Kubernetes cluster (Rancher version v2.2.4, Kubernetes version v1.13.5, REANA cluster 0.5.0, Python 2.7).

[root@reana-server-test ~]# helm version
Client: &version.Version{SemVer:"v2.13.0", GitCommit:"79d07943b03aea2b76c12644b4b54733bc5958d6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.0", GitCommit:"79d07943b03aea2b76c12644b4b54733bc5958d6", GitTreeState:"clean"}

These were our steps:

  1. Launch Rancher:

    docker run -d --name=rancher --restart=unless-stopped -p 8080:80 -p 8443:443 rancher/rancher:latest

  2. login (https://localhost:8443), create a new cluster ("custom") --> leave default settings, just click on "next"

  3. make sure to check "etcd", "Control Plane" and "Worker"

  4. copy generated output command into cli

  5. wait until cluster is initialized, click on "Kubeconfig file" and place the content into ~/.kube/config

  6. openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=our IP"

  7. ./kubectl delete secrets reana-ssl-secrets

  8. ./kubectl create secret tls reana-ssl-secrets --key /tmp/tls.key --cert /tmp/tls.crt

  9. We created and activated the virtual environment picreana

  10. We installed reana-cluster with "pip install reana-cluster"

  11. We have updated the file reana-cluster.yaml, changing only two lines (version and reana_url) from the provided one:

    cluster:
      type: "kubernetes"
    -  version: "v1.14.0"
    +  version: "v1.13.5"
      db_config: &db_base_config
        - REANA_SQLALCHEMY_DATABASE_URI: "postgresql+psycopg2://reana:reana@db:5432/reana"
      root_path: "/var/reana"
      shared_volume_path: "/var/reana"
    -  reana_url: "reana-dev.cern.ch"
    +  reana_url: "reana-server-test.pic.es"

  12. Then we run reana-cluster --debug -f reana-cluster.yaml init --traefik and we are stuck with the following error:

    (picreana) [root@reana-server-test configurations]# reana-cluster --debug -f reana-cluster.yaml init --traefik
    [ERROR] Got an unexpected keyword argument 'include_uninitialized' to method list_namespaced_secret
    Traceback (most recent call last):
      File "/root/.virtualenvs/picreana/lib/python2.7/site-packages/reana_cluster/cli/cluster.py", line 162, in init
        backend.init(traefik)
      File "/root/.virtualenvs/picreana/lib/python2.7/site-packages/reana_cluster/backends/kubernetes/k8s.py", line 350, in init
        manifest)
      File "/root/.virtualenvs/picreana/lib/python2.7/site-packages/reana_cluster/backends/kubernetes/k8s.py", line 464, in _add_service_acc_key_to_component
        'default', include_uninitialized='false')
      File "/root/.virtualenvs/picreana/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 12884, in list_namespaced_secret
        (data) = self.list_namespaced_secret_with_http_info(namespace, **kwargs)
      File "/root/.virtualenvs/picreana/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 12921, in list_namespaced_secret_with_http_info
        " to method list_namespaced_secret" % key
    TypeError: Got an unexpected keyword argument 'include_uninitialized' to method list_namespaced_secret

Do you know what's happening? Any help would be much appreciated! Many thanks in advance,

roksys commented 5 years ago

Hi @jordidem,

REANA-Cluster 0.5.0 has no upper version limit for the kubernetes package - https://github.com/reanahub/reana-cluster/blob/14702db6d579cc2d31c56fcfe4ce73aded1bd7d0/setup.py#L55 So I guess you got kubernetes 10 installed in your virtualenv, which is incompatible with REANA-Cluster 0.5.0.

The easiest and fastest fix would be to downgrade the kubernetes package in your virtualenv:

$ pip install kubernetes==9

jordidem commented 5 years ago

Dear Rokas,

Thanks for your quick answer. This solved my issue!
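
For anyone hitting the same TypeError, it may help to first confirm which kubernetes client version is actually installed in the virtualenv. A minimal sketch follows; `k8s_client_too_new` is a hypothetical helper, and the threshold of 10 is taken only from the guess in this thread that kubernetes 10 is incompatible with REANA-Cluster 0.5.0.

```shell
# Hypothetical helper: succeed when a kubernetes client version string has a
# major version of 10 or higher, i.e. one that would need the downgrade
# suggested in this thread. Returns 2 when the version cannot be parsed.
k8s_client_too_new() {
    major=${1%%.*}
    case "$major" in
        ''|*[!0-9]*) echo "cannot parse version: '$1'" >&2; return 2 ;;
    esac
    [ "$major" -ge 10 ]
}

# Usage inside the virtualenv (pip show prints a "Version:" line):
#   installed=$(pip show kubernetes | sed -n 's/^Version: //p')
#   if k8s_client_too_new "$installed"; then pip install 'kubernetes==9'; fi
```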

tiborsimko commented 5 years ago

@roksys @diegodelemos We should perhaps revive the topic of pinning all dependencies and use something like PyUp to help with a periodical upgrade schedule.

jordidem commented 5 years ago

Dear all,

It's Jordi again. We are still trying to deploy REANA on Rancher and we are facing additional problems. This is our Rancher deployment (image).

These are the versions:

(picreana) [root@reana-server-test configurations]# kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

These are our main problems to solve:

1) reana-cluster env is returning None as the URL:

(picreana) [root@reana-server-test configurations]# reana-cluster env --include-admin-token
export REANA_SERVER_URL=http://None:80
export REANA_ACCESS_TOKEN=vetNHw...

Do you know what is happening?
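
A script that consumes this output can at least notice the failed detection instead of exporting http://None:80. The sketch below assumes the output has one export statement per line, as printed above; `check_reana_env` is a hypothetical helper, not a reana-cluster command.

```shell
# Hypothetical helper: read `reana-cluster env` output on stdin and fail when
# the detected server URL is missing or has the placeholder host "None".
check_reana_env() {
    url=$(sed -n 's/^export REANA_SERVER_URL=//p')
    case "$url" in
        '')
            echo "REANA_SERVER_URL not found in output" >&2; return 1 ;;
        *://None|*://None:*|*://None/*)
            echo "URL detection failed: $url" >&2; return 1 ;;
    esac
    printf '%s\n' "$url"
}

# Usage:
#   reana-cluster env --include-admin-token | check_reana_env \
#       || echo "set REANA_SERVER_URL by hand"
```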

2) But OK, we force the value of REANA_SERVER_URL and continue with the process to start the execution of the example:

(myreana) :~/reana-client/run$ export REANA_SERVER_URL=http://192.168.96.xxx:31241/
(myreana) :~/reana-client/run$ export REANA_ACCESS_TOKEN=vetNHw....
(myreana) :~/reana-client/run$ reana-client ping
Connected to http://192.168.96.xxx:31241/ - Server is running.
(myreana) :~/reana-client/run$ export REANA_WORKON=test1
(myreana) :~/reana-client/run$ reana-client create -n test1
test1.2
(myreana) :~/reana-client/run$ reana-client upload
File code/helloworld.py was successfully uploaded.
File data/names.txt was successfully uploaded.
(myreana) :~/reana-client/run$ reana-client start
test1 is running
(myreana) :~/reana-client/run$ reana-client status
NAME    RUN_NUMBER   CREATED               STATUS    PROGRESS
test1   2            2019-07-25T14:19:16   running   -/-
(myreana) :~/reana-client/run$ reana-client download
File results/greetings.txt could not be downloaded: results/greetings.txt does not exist.

The task is not running and we get the output PROGRESS -/-, so when we try to download the result it is not in the workspace. We don't understand why the task is not running; we couldn't find any hint in the logs.

3) The last thing we are trying to understand is the initialization of the Ingress (image). The initialization never ends.

Thanks again for your help!

tiborsimko commented 5 years ago

(1) Regarding None value for URL detection, please see https://github.com/reanahub/reana-cluster/issues/73 there are some hints on how to look up the value if the current detection fails.

(2) For debugging running workflows, the best technique is to use kubectl get pods and kubectl logs on each pod to see what's happening. The status -/- means that the workflow status is unknown, perhaps the workflow has not started or perhaps the status cannot be updated. We saw it happening in the past when there were network connection issues between pods. Finally, you can also run reana-client ls -w mytest to inspect the workspace of the mytest workflow to see any created files there.
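
The kubectl get pods / kubectl logs routine from (2) can be wrapped in a small loop. This is only a sketch: `dump_pod_logs` and the `KUBECTL` override variable are hypothetical names, and pods with several containers may need an extra -c option that is not handled here.

```shell
# Hypothetical helper: print the last lines of every pod's log in a namespace.
# KUBECTL can be overridden, e.g. with ./kubectl as used earlier in the thread.
dump_pod_logs() {
    ns=${1:-default}
    lines=${2:-50}
    for pod in $(${KUBECTL:-kubectl} get pods -n "$ns" -o name); do
        echo "==== $pod ===="
        # Keep going if one pod's logs cannot be read.
        ${KUBECTL:-kubectl} logs -n "$ns" "$pod" --tail="$lines" || true
    done
}

# Usage:
#   dump_pod_logs default 100
```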

(3) Regarding ingress, you may want to check Ben's write-up https://bengalewsky.github.io/openstack/reana/hep/scailfin/2019/01/24/ZeroToReanaOnOpenstack.html containing musings about installing REANA on non-CERN infrastructure. There are parts touching ingress which may be perhaps useful for your scenario.

jordidem commented 5 years ago

One of the main issues we had deploying the Reana Cluster using Rancher was related to the mounting paths for the job pods in the context of using REANA with the local disk on a virtualized server that is hosting the reana-server.

The issue caused the job to fail because the mount point and the root path for the user/workflow workspace were not properly set. After asking for help in the Gitter chat, we managed to solve this with the following workaround, taken from gitlawr's comment on rancher/rancher#14836. We did exactly the following steps:

1) Edit cluster, Edit as YAML

2) Add the following flags for kubelet:

services:
  kubelet:
    extra_args:
      containerized: "true"
    extra_binds:

3) Click save and wait till the cluster is updated.

Note: The community is planning to deprecate the "--containerized" flag for kubelet (kubernetes/kubernetes#74148). However, the flag is essential for this capability, as there is no alternative at the moment.