jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Openstack LoadBalancer #340

ianabc closed this issue 6 years ago

ianabc commented 6 years ago

N.B. I don't think this is an issue with zero-to-jupyterhub-k8s specifically; it's more likely a problem between my OpenStack and Kubernetes installations, but I'm adding my experience here in case anyone has any insight.

I've been trying to get JupyterHub going on Kubernetes. I was able to get things going on GCE just by following the documentation, but we have an allocation on an OpenStack installation which I'd like to use in place of GCE. Most things I wanted to do just worked, but there was some issue getting an external IP for the proxy-public LoadBalancer.

Here's what I've been doing...

  1. Created a network + subnet in OpenStack (roughly the CLI sketch shown after this list). I had to do this as a separate step because my provider doesn't let me add a gateway to their external network (it eats up another of a small pool of public IPs).

  2. Created a kubernetes cluster (via kubespray + some terraform) - I then messed around creating deployments for a while to make sure things were working.

  3. Followed the Zero-to-JupyterHub docs up until the helm install
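
For reference, step 1 roughly corresponds to the following OpenStack CLI commands (just a sketch; the names and CIDR are placeholders, and the router/gateway part is exactly the step my provider restricts):

# Private network + subnet for the cluster (names and CIDR are placeholders)
openstack network create k8s-net
openstack subnet create k8s-subnet --network k8s-net \
    --subnet-range 192.168.0.0/24 --dns-nameserver 8.8.8.8

# Router/gateway to the provider's external network - this consumes a public IP,
# which is why the provider restricts it
openstack router create k8s-router
openstack router set k8s-router --external-gateway ext-net
openstack router add subnet k8s-router k8s-subnet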

For the helm install I found that I had to use Yuvi's 0.5.0 version because there was some problem with Kubernetes RBAC and the v0.4 version. The install ran successfully (I think) up until the last service definition; the external IP of the proxy-public service stays in the pending state:

[ptty2u@manager1 ~]$ kubectl --namespace=j8s get svc
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
hub            ClusterIP      10.233.58.67    <none>        8081/TCP                     10m
proxy-api      ClusterIP      10.233.5.62     <none>        8001/TCP                     10m
proxy-http     ClusterIP      10.233.35.250   <none>        8000/TCP                     10m
proxy-public   LoadBalancer   10.233.7.251    <pending>     80:31271/TCP,443:30600/TCP   10m

I took a look at https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/ and as far as I can tell the problem is with Kubernetes' dependence on OpenStack to create that load balancer. My guess is that I've either missed something in the Kubernetes setup or my OpenStack provider doesn't trust me enough :-)
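
For what it's worth, the piece that is supposed to wire this up is the OpenStack cloud provider: kube-controller-manager (and the kubelets) need --cloud-provider=openstack and --cloud-config pointing at a cloud.conf, and that file needs a [LoadBalancer] section for Neutron LBaaS. A rough sketch of what I think it should look like (every value here is a placeholder):

# /etc/kubernetes/cloud.conf (all values are placeholders)
[Global]
auth-url=https://keystone.example.org:5000/v3
username=myuser
password=mypassword
tenant-id=<project-uuid>
domain-name=Default
region=RegionOne

[LoadBalancer]
# Neutron subnet the load balancer ports are created on, and the external
# network used to allocate a floating IP for the service
subnet-id=<private-subnet-uuid>
floating-network-id=<external-network-uuid>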

manics commented 6 years ago

@ianabc I've been following a similar process to you.

If it's helpful I can put our configs and instructions somewhere?

yuvipanda commented 6 years ago

Thanks for responding, @manics! The Ingress controller is what's missing here I think. Can you provide docs / put up your config somewhere?

Thanks for writing this up in a detailed fashion, @ianabc!

ianabc commented 6 years ago

Thanks guys. I tried changing values.yaml locally and then installing that chart:

diff --git a/values.yaml b/values.yaml
index 4616d94..0df43c6 100755
--- a/values.yaml
+++ b/values.yaml
@@ -42,7 +42,7 @@ rbac:
 proxy:
   secretToken: ''
   service:
-    type: LoadBalancer
+    type: NodePort
     labels: {}
     annotations: {}
     nodePorts:

This gives me running services

NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
hub            ClusterIP   10.233.20.102   <none>        8081/TCP                     9m
proxy-api      ClusterIP   10.233.22.79    <none>        8001/TCP                     9m
proxy-http     ClusterIP   10.233.56.85    <none>        8000/TCP                     9m
proxy-public   NodePort    10.233.20.97    <none>        80:32561/TCP,443:30376/TCP   9m

but the hub pod stays in pending

NAME                     READY     STATUS    RESTARTS   AGE
hub-6f87944b4f-8c22k     0/1       Pending   0          9m
proxy-866db8d69d-t2hw8   2/2       Running   0          9m

I get the same behaviour with ClusterIP in place of NodePort. For the hub pod I get

$ kubectl --namespace=j8s describe pods hub-6f87944b4f-8c22k
...
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  37s (x15 over 3m)  default-scheduler  PersistentVolumeClaim is not bound: "hub-db-dir" (repeated 3 times)

I set up kubespray as simply as I could (I didn't bother adding distributed storage); I'll try to reconfigure things with distributed storage and try again.
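
A few commands to confirm the unbound claim (assuming the j8s namespace used above):

kubectl --namespace=j8s get pvc hub-db-dir
kubectl --namespace=j8s describe pvc hub-db-dir
# if there is no (default) StorageClass, nothing will ever bind the claim
kubectl get storageclass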

yuvipanda commented 6 years ago

Yup, you need some persistent storage for the hub to store its database. Temporarily, you can disable that with:

hub:
  db: 
    type: sqlite-memory

This should allow the hub pod to launch without persistent storage, but it can't be used in production - the hub will lose all knowledge of users and spawns when it restarts...
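
For example (a sketch - the release name, namespace and chart version below are just the ones already used in this thread), put that snippet in config.yaml and re-run the install:

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
# adjust --version to the exact 0.5.x release you're installing
helm upgrade --install j8s jupyterhub/jupyterhub --version=v0.5 \
    --namespace=j8s -f config.yaml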

manics commented 6 years ago

Here's the config for our nginx ingress controller: https://github.com/openmicroscopy/kubernetes-platform/tree/19e8464a9a057a7b367d8a23a2a891f3c06de06d/ingress-nginx (there are some other details of the kubespray config in that repo too).

Currently our k8s cluster only has a single floating IP (attached to a master node), so the controller is deployed as a DaemonSet with a nodeSelector for master nodes only, using hostNetwork to get access to ports 80 and 443.

Let's Encrypt certs are autogenerated for the controller: helm upgrade --install -f CONFIG.yml kube-lego stable/kube-lego
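
The CONFIG.yml for kube-lego is minimal, something like this (the email is a placeholder; point LEGO_URL at the Let's Encrypt staging endpoint while testing to avoid rate limits):

config:
  LEGO_EMAIL: admin@EXAMPLE.org
  # production ACME endpoint; swap in the staging URL while testing
  LEGO_URL: https://acme-v01.api.letsencrypt.org/directory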

zero-to-jupyterhub-config:

# https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-jupyterhub.html

# Local deployment from git checkout of https://github.com/jupyterhub/zero-to-jupyterhub-k8s.git
# Commit 7e9089b46c23fc1177c17cf1bef74d137cbba2ef (v0.4-580-g7e9089b)

hub:
  baseUrl: /example/
  cookieSecret: XXXXXXXX
  db:
    type: sqlite-memory
  extraConfig: |
    # https://github.com/jupyterhub/jupyterhub/wiki/Debug-Jupyterhub
    c.JupyterHub.log_level = 'DEBUG'
    c.Spawner.debug = True

auth:
  dummy:
    password: XXXXXXXX

proxy:
  secretToken: XXXXXXXX
  service:
    type: NodePort

singleuser:
  storage:
    type: NONE

# Disable pre-puller, fails with rbac
# https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/118
prePuller:
  enabled: false

ingress:
  enabled: true
  hosts:
  - localhost
  - jupyter.EXAMPLE.org
  annotations:
    kubernetes.io/ingress.class: nginx
    ingress.kubernetes.io/proxy-body-size: 16m
    ingress.kubernetes.io/proxy-read-timeout: 3600
    ingress.kubernetes.io/proxy-send-timeout: 3600
    kubernetes.io/tls-acme: 'true'
  tls:
  - hosts:
    - "jupyter.EXAMPLE.org"
    secretName: example-tls

The end result should be a JupyterHub instance running at https://jupyter.EXAMPLE.org/example/. The config is intended for one-day teaching sessions, so we just spin up a deployment using dummy auth with a fixed password and delete the deployment afterwards. I'm also working on a more permanent deployment using GitHub or some other OAuth provider with this helm chart, instead of our own Kubernetes manifest files which I wrote before I knew about this repo.

Remember to add a security group rule to give access on ports 80 and 443 and to set up your external DNS before running all this; otherwise the Let's Encrypt certificate creation will fail since the .well-known/acme-challenge URL won't be reachable.
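
With the OpenStack CLI that's roughly (the security group name is a placeholder for whichever group your ingress nodes use):

openstack security group rule create --protocol tcp --dst-port 80 k8s-ingress
openstack security group rule create --protocol tcp --dst-port 443 k8s-ingress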

This is a very recent deployment, but eventually this config should be in one of our organisation repos, @openmicroscopy or @ome. Or if you think it's useful I could try to add something to the docs in this repo?

ianabc commented 6 years ago

That works to get the hub running, thank you! Something went wrong with my proxy though: the first two redirects seem to work, but when the request is redirected to the hub it times out.

curl -vL http://10.233.15.140
* About to connect() to 10.233.15.140 port 80 (#0)
*   Trying 10.233.15.140...
* Connected to 10.233.15.140 (10.233.15.140) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.233.15.140
> Accept: */*
> 
< HTTP/1.1 302 Found
< Server: nginx/1.13.7
< Date: Tue, 12 Dec 2017 19:19:56 GMT
< Content-Type: text/html; charset=UTF-8
< Content-Length: 0
< Connection: keep-alive
< x-jupyterhub-version: 0.8.1
< location: /jupyter/hub/
< content-security-policy: frame-ancestors 'self'; report-uri /jupyter/hub/security/csp-report
< Strict-Transport-Security: max-age=15724800; includeSubDomains;
< 
* Connection #0 to host 10.233.15.140 left intact
* Issue another request to this URL: 'http://10.233.15.140/jupyter/hub/'
* Found bundle for host 10.233.15.140: 0x1ca0ef0
* Re-using existing connection! (#0) with host 10.233.15.140
* Connected to 10.233.15.140 (10.233.15.140) port 80 (#0)
> GET /jupyter/hub/ HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.233.15.140
> Accept: */*
> 
< HTTP/1.1 302 Found
< Server: nginx/1.13.7
< Date: Tue, 12 Dec 2017 19:19:56 GMT
< Content-Type: text/html; charset=UTF-8
< Content-Length: 0
< Connection: keep-alive
< x-jupyterhub-version: 0.8.1
< location: /jupyter/hub/login
< content-security-policy: frame-ancestors 'self'; report-uri /jupyter/hub/security/csp-report
< Strict-Transport-Security: max-age=15724800; includeSubDomains;
< 
* Connection #0 to host 10.233.15.140 left intact
* Issue another request to this URL: 'http://10.233.15.140/jupyter/hub/login'
* Found bundle for host 10.233.15.140: 0x1ca0ef0
* Re-using existing connection! (#0) with host 10.233.15.140
* Connected to 10.233.15.140 (10.233.15.140) port 80 (#0)
> GET /jupyter/hub/login HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.233.15.140
> Accept: */*
> 
< HTTP/1.1 504 Gateway Time-out
< Server: nginx/1.13.7
< Date: Tue, 12 Dec 2017 19:20:56 GMT
< Content-Type: text/html
< Content-Length: 183
< Connection: keep-alive
< Strict-Transport-Security: max-age=15724800; includeSubDomains;
< 
<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.13.7</center>
</body>
</html>
* Connection #0 to host 10.233.15.140 left intact

Digging around in the proxy logs, it looks like this:

10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:18:54 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.29.0" 77 0.019 [upstream-default-backend] 10.233.102.140:8000 0 0.019 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:02 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.29.0" 77 0.024 [upstream-default-backend] 10.233.102.140:8000 0 0.024 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:18 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.29.0" 77 0.019 [upstream-default-backend] 10.233.102.140:8000 0 0.019 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:53 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.29.0" 77 0.018 [upstream-default-backend] 10.233.102.140:8000 0 0.018 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:53 +0000] "GET /jupyter/hub/ HTTP/1.1" 302 0 "-" "curl/7.29.0" 89 0.008 [upstream-default-backend] 10.233.102.140:8000 0 0.008 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:54 +0000] "GET /jupyter/hub/login HTTP/1.1" 499 0 "-" "curl/7.29.0" 94 1.307 [upstream-default-backend] 10.233.102.140:8000 0 - -
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:56 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.29.0" 77 0.014 [upstream-default-backend] 10.233.102.140:8000 0 0.014 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:56 +0000] "GET /jupyter/hub/ HTTP/1.1" 302 0 "-" "curl/7.29.0" 89 0.011 [upstream-default-backend] 10.233.102.140:8000 0 0.011 302
2017/12/12 19:20:56 [error] 96#96: *39 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.233.104.64, server: _, request: "GET /jupyter/hub/login HTTP/1.1", upstream: "http://10.233.102.140:8000/jupyter/hub/login", host: "10.233.15.140"
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:20:56 +0000] "GET /jupyter/hub/login HTTP/1.1" 504 183 "-" "curl/7.29.0" 94 60.001 [upstream-default-backend] 10.233.102.140:8000 0 60.001 504

But the hub is on 10.233.104.88, not 10.233.104.64

kubectl --namespace=j8s get pods -o wide
NAME                     READY     STATUS    RESTARTS   AGE       IP               NODE
hub-78fb688b89-gh96q     1/1       Running   0          49m       10.233.104.88    master1
proxy-5d6cbd7b97-x6wct   2/2       Running   0          49m       10.233.102.140   node1

I can retrieve the login page from 10.233.104.88:8081 directly.
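
To see where the services actually point (standard checks, using the names from above):

kubectl --namespace=j8s get endpoints hub proxy-http proxy-api
kubectl --namespace=j8s describe svc proxy-public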

manics commented 6 years ago

But the hub is on 10.233.104.88, not 10.233.104.64

Is 10.233.104.64 the service IP?

ianabc commented 6 years ago

No, I don't think so,

NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE       SELECTOR
hub            ClusterIP   10.233.51.95    <none>        8081/TCP                     21h       name=hub
proxy-api      ClusterIP   10.233.25.25    <none>        8001/TCP                     21h       component=proxy,name=proxy,release=j8s
proxy-http     ClusterIP   10.233.16.119   <none>        8000/TCP                     21h       component=proxy,name=proxy,release=j8s
proxy-public   NodePort    10.233.15.140   <none>        80:31765/TCP,443:32415/TCP   21h       component=proxy,name=proxy,release=j8s

It seems to be the tunl0 interface on the first manager node:

 $ ip addr show tunl0
16: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.233.104.64/32 scope global tunl0
       valid_lft forever preferred_lft forever

I'm going to be away for a few days, but when I get back I'll take a closer look and try to find my mistake. Thanks again for your help.

manics commented 6 years ago

OK. What network plugin did you configure in kubespray? I'm using flannel; the default is Calico, but that needs additional config on OpenStack which I couldn't get to work.
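
If you're not sure which plugin kubespray ended up deploying, the kube-system pods give it away:

kubectl --namespace=kube-system get pods -o wide | grep -iE 'calico|flannel|canal'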

ianabc commented 6 years ago

It's Calico, so that might be the problem. I had to have the OpenStack admins add a gateway to their external network, but other than that it seems to be mostly working. I'll start picking things apart when I get back.

yuvipanda commented 6 years ago

I found https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/#openstack which might be useful?

willingc commented 6 years ago

Thanks @ianabc and others for the rich discussion here.

I'm going to go ahead and close this issue. I've added a link to this issue's discussion in the Zero to JupyterHub wiki's Resources section.