@ianabc I've been following a similar process to yours: the master branch with service.type: NodePort (though I think ClusterIP would also work for my setup; AFAIK I don't have access to LBaaS). If it's helpful I can put our configs and instructions somewhere?
Thanks for responding, @manics! The Ingress controller is what's missing here I think. Can you provide docs / put up your config somewhere?
Thanks for writing this up in a detailed fashion, @ianabc!
Thanks guys, I tried changing values.yaml locally and then installing that chart:
diff --git a/values.yaml b/values.yaml
index 4616d94..0df43c6 100755
--- a/values.yaml
+++ b/values.yaml
@@ -42,7 +42,7 @@ rbac:
 proxy:
   secretToken: ''
   service:
-    type: LoadBalancer
+    type: NodePort
     labels: {}
     annotations: {}
     nodePorts:
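An alternative, rather than editing the chart's values.yaml in place, is to keep the override in a separate file and pass it to helm with -f. A minimal sketch (the file name, secret value, and the j8s release/namespace are placeholders chosen to match the output below):
# config.yaml - hypothetical override file, equivalent to the diff above
proxy:
  secretToken: 'XXXXXXXX'
  service:
    type: NodePort
helm install ./jupyterhub --name=j8s --namespace=j8s -f config.yaml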
This gives me running services:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hub ClusterIP 10.233.20.102 <none> 8081/TCP 9m
proxy-api ClusterIP 10.233.22.79 <none> 8001/TCP 9m
proxy-http ClusterIP 10.233.56.85 <none> 8000/TCP 9m
proxy-public NodePort 10.233.20.97 <none> 80:32561/TCP,443:30376/TCP 9m
but the hub pod stays in Pending:
NAME READY STATUS RESTARTS AGE
hub-6f87944b4f-8c22k 0/1 Pending 0 9m
proxy-866db8d69d-t2hw8 2/2 Running 0 9m
I get the same behaviour with ClusterIP in place of NodePort. For the hub pod I get:
$ kubectl --namespace=j8s describe pods hub-6f87944b4f-8c22k
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 37s (x15 over 3m) default-scheduler PersistentVolumeClaim is not bound: "hub-db-dir" (repeated 3 times)
I set up kubespray as simply as I could (...I didn't bother adding distributed storage). I'll try to reconfigure things with distributed storage and try again.
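If anyone else hits the same FailedScheduling event: the hub-db-dir claim stays unbound when the cluster has no default StorageClass / dynamic provisioner, which a quick check should confirm (namespace as above):
kubectl --namespace=j8s get pvc hub-db-dir
kubectl get storageclass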
Yup, you need some persistent storage for the hub to store its database. Temporarily, you can disable that with:
hub:
  db:
    type: sqlite-memory
This should allow the hub pod to launch without persistent storage. But this shouldn't be used in production - the hub will lose all knowledge of users and spawned servers when it restarts...
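If the hub database should survive restarts on a cluster without dynamic provisioning, one option for testing is to create a volume by hand for the hub-db-dir claim to bind to. A hypothetical sketch (the name and path are made up, and hostPath pins the data to one node, so this isn't for production either); create it with kubectl create -f hub-db-pv.yaml:
# hub-db-pv.yaml - manually provisioned volume for the hub database
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hub-db-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /srv/jupyterhub/hub-db
A real deployment wants a dynamic provisioner (e.g. Cinder via the OpenStack cloud provider) or the distributed storage mentioned above.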
Here's the config for our nginx ingress controller: https://github.com/openmicroscopy/kubernetes-platform/tree/19e8464a9a057a7b367d8a23a2a891f3c06de06d/ingress-nginx (there are some other details of the kubespray config in that repo too).
Currently our k8s cluster only has a single floating IP (attached to a master node) so the controller is deployed as a daemonset with a nodeSelector for master nodes only, and using hostNetwork to get access to ports 80 and 443.
Let's Encrypt certs are autogenerated for the controller: helm upgrade --install -f CONFIG.yml kube-lego stable/kube-lego
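The kube-lego chart itself only needs a couple of values; a hypothetical sketch of such a config file (this is not our actual CONFIG.yml, and the email is a placeholder):
# kube-lego values - LEGO_EMAIL / LEGO_URL are the chart's standard settings
config:
  LEGO_EMAIL: admin@EXAMPLE.org
  LEGO_URL: https://acme-v01.api.letsencrypt.org/directory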
zero-to-jupyterhub-config:
# https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-jupyterhub.html
# Local deployment from git checkout of https://github.com/jupyterhub/zero-to-jupyterhub-k8s.git
# Commit 7e9089b46c23fc1177c17cf1bef74d137cbba2ef (v0.4-580-g7e9089b)
hub:
  baseUrl: /example/
  cookieSecret: XXXXXXXX
  db:
    type: sqlite-memory
  extraConfig: |
    # https://github.com/jupyterhub/jupyterhub/wiki/Debug-Jupyterhub
    c.JupyterHub.log_level = 'DEBUG'
    c.Spawner.debug = True
auth:
  dummy:
    password: XXXXXXXX
proxy:
  secretToken: XXXXXXXX
  service:
    type: NodePort
singleuser:
  storage:
    type: NONE
# Disable pre-puller, fails with rbac
# https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/118
prePuller:
  enabled: false
ingress:
  enabled: true
  hosts:
    - localhost
    - jupyter.EXAMPLE.org
  annotations:
    kubernetes.io/ingress.class: nginx
    ingress.kubernetes.io/proxy-body-size: 16m
    ingress.kubernetes.io/proxy-read-timeout: 3600
    ingress.kubernetes.io/proxy-send-timeout: 3600
    kubernetes.io/tls-acme: 'true'
  tls:
    - hosts:
        - "jupyter.EXAMPLE.org"
      secretName: example-tls
The end result should be a JupyterHub instance running at https://jupyter.EXAMPLE.org/example/. The config is intended for one-day teaching sessions, so we just spin up a deployment using dummy auth with a fixed password, and delete the deployment afterwards. I'm also working on a more permanent deployment using GitHub or some other OAuth provider with this helm chart, instead of our own Kubernetes manifest files which I wrote before I knew about this repo.
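For completeness, deployment is just helm run against that config from the git checkout; a sketch, assuming the config above is saved as config.yml and using placeholder release/namespace names:
helm install ./jupyterhub --name=jhub --namespace=jhub -f config.yml
helm upgrade jhub ./jupyterhub -f config.yml    # after later config changes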
Remember to add a security group giving access on ports 80 and 443, and set up your external DNS before running all this, otherwise the Let's Encrypt certificate creation will fail since the .well-known/acme-challenge URL won't be reachable.
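In OpenStack CLI terms that's roughly the following (a hypothetical sketch; the security group name is a placeholder and the rules default to ingress):
openstack security group rule create --protocol tcp --dst-port 80 k8s-nodes
openstack security group rule create --protocol tcp --dst-port 443 k8s-nodes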
This is a very recent deployment, but eventually this config should be in one of our organisation repos, @openmicroscopy or @ome. Or if you think it's useful I could try to add something to the docs in this repo?
That works to get the hub running, thank you! Something went wrong with my proxy though: the first two redirects seem to work, but when the request gets redirected to the hub it times out
curl -vL http://10.233.15.140
* About to connect() to 10.233.15.140 port 80 (#0)
* Trying 10.233.15.140...
* Connected to 10.233.15.140 (10.233.15.140) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.233.15.140
> Accept: */*
>
< HTTP/1.1 302 Found
< Server: nginx/1.13.7
< Date: Tue, 12 Dec 2017 19:19:56 GMT
< Content-Type: text/html; charset=UTF-8
< Content-Length: 0
< Connection: keep-alive
< x-jupyterhub-version: 0.8.1
< location: /jupyter/hub/
< content-security-policy: frame-ancestors 'self'; report-uri /jupyter/hub/security/csp-report
< Strict-Transport-Security: max-age=15724800; includeSubDomains;
<
* Connection #0 to host 10.233.15.140 left intact
* Issue another request to this URL: 'http://10.233.15.140/jupyter/hub/'
* Found bundle for host 10.233.15.140: 0x1ca0ef0
* Re-using existing connection! (#0) with host 10.233.15.140
* Connected to 10.233.15.140 (10.233.15.140) port 80 (#0)
> GET /jupyter/hub/ HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.233.15.140
> Accept: */*
>
< HTTP/1.1 302 Found
< Server: nginx/1.13.7
< Date: Tue, 12 Dec 2017 19:19:56 GMT
< Content-Type: text/html; charset=UTF-8
< Content-Length: 0
< Connection: keep-alive
< x-jupyterhub-version: 0.8.1
< location: /jupyter/hub/login
< content-security-policy: frame-ancestors 'self'; report-uri /jupyter/hub/security/csp-report
< Strict-Transport-Security: max-age=15724800; includeSubDomains;
<
* Connection #0 to host 10.233.15.140 left intact
* Issue another request to this URL: 'http://10.233.15.140/jupyter/hub/login'
* Found bundle for host 10.233.15.140: 0x1ca0ef0
* Re-using existing connection! (#0) with host 10.233.15.140
* Connected to 10.233.15.140 (10.233.15.140) port 80 (#0)
> GET /jupyter/hub/login HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.233.15.140
> Accept: */*
>
< HTTP/1.1 504 Gateway Time-out
< Server: nginx/1.13.7
< Date: Tue, 12 Dec 2017 19:20:56 GMT
< Content-Type: text/html
< Content-Length: 183
< Connection: keep-alive
< Strict-Transport-Security: max-age=15724800; includeSubDomains;
<
<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.13.7</center>
</body>
</html>
* Connection #0 to host 10.233.15.140 left intact
Digging around in the proxy logs, it looks like this:
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:18:54 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.29.0" 77 0.019 [upstream-default-backend] 10.233.102.140:8000 0 0.019 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:02 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.29.0" 77 0.024 [upstream-default-backend] 10.233.102.140:8000 0 0.024 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:18 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.29.0" 77 0.019 [upstream-default-backend] 10.233.102.140:8000 0 0.019 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:53 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.29.0" 77 0.018 [upstream-default-backend] 10.233.102.140:8000 0 0.018 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:53 +0000] "GET /jupyter/hub/ HTTP/1.1" 302 0 "-" "curl/7.29.0" 89 0.008 [upstream-default-backend] 10.233.102.140:8000 0 0.008 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:54 +0000] "GET /jupyter/hub/login HTTP/1.1" 499 0 "-" "curl/7.29.0" 94 1.307 [upstream-default-backend] 10.233.102.140:8000 0 - -
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:56 +0000] "GET / HTTP/1.1" 302 0 "-" "curl/7.29.0" 77 0.014 [upstream-default-backend] 10.233.102.140:8000 0 0.014 302
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:19:56 +0000] "GET /jupyter/hub/ HTTP/1.1" 302 0 "-" "curl/7.29.0" 89 0.011 [upstream-default-backend] 10.233.102.140:8000 0 0.011 302
2017/12/12 19:20:56 [error] 96#96: *39 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.233.104.64, server: _, request: "GET /jupyter/hub/login HTTP/1.1", upstream: "http://10.233.102.140:8000/jupyter/hub/login", host: "10.233.15.140"
10.233.104.64 - [10.233.104.64] - - [12/Dec/2017:19:20:56 +0000] "GET /jupyter/hub/login HTTP/1.1" 504 183 "-" "curl/7.29.0" 94 60.001 [upstream-default-backend] 10.233.102.140:8000 0 60.001 504
But the hub is on 10.233.104.88, not 10.233.104.64
kubectl --namespace=j8s get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
hub-78fb688b89-gh96q 1/1 Running 0 49m 10.233.104.88 master1
proxy-5d6cbd7b97-x6wct 2/2 Running 0 49m 10.233.102.140 node1
I can retrieve the login page from 10.233.104.88:8081 directly.
"But the hub is on 10.233.104.88, not 10.233.104.64"
Is 10.233.104.64 the service IP?
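It might also be worth checking which pod IPs the hub and proxy-http services have actually registered, to rule out a stale endpoint; a quick check:
kubectl --namespace=j8s get endpoints hub proxy-http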
No, I don't think so:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
hub ClusterIP 10.233.51.95 <none> 8081/TCP 21h name=hub
proxy-api ClusterIP 10.233.25.25 <none> 8001/TCP 21h component=proxy,name=proxy,release=j8s
proxy-http ClusterIP 10.233.16.119 <none> 8000/TCP 21h component=proxy,name=proxy,release=j8s
proxy-public NodePort 10.233.15.140 <none> 80:31765/TCP,443:32415/TCP 21h component=proxy,name=proxy,release=j8s
It seems to be the tunl0 interface on the first master node:
$ ip addr show tunl0
16: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.233.104.64/32 scope global tunl0
valid_lft forever preferred_lft forever
I'm going to be away for a few days, but when I get back I'll take a closer look and try to find my mistake. Thanks again for your help.
OK. What network plugin did you configure in kubespray? I'm using flannel; the default is calico, but that needs additional configuration on OpenStack which I couldn't get to work.
It's calico, so that might be the problem. I had to have the OpenStack admins add a gateway to their external network, but other than that it seems to be mostly working. I'll start picking things apart when I get back.
I found https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/#openstack which might be useful?
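In case it's the same class of problem we hit: on OpenStack, calico's cross-node pod traffic is easily dropped by the infrastructure. With IPIP encapsulation (which the tunl0 interface above suggests is in use) the security groups need to allow IP-in-IP traffic (IP protocol 4) between the nodes; without encapsulation, the pod and service CIDRs need adding as allowed address pairs on each instance's Neutron port, or port security needs turning off. A hypothetical sketch of the latter, using kubespray's default CIDRs and placeholder port IDs:
# allow pod and service CIDR traffic on each node's Neutron port
openstack port set --allowed-address ip-address=10.233.64.0/18 PORT_ID
openstack port set --allowed-address ip-address=10.233.0.0/18 PORT_ID
# or, more bluntly, turn port security off for that port
openstack port set --no-security-group --disable-port-security PORT_ID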
Thanks @ianabc and others for the rich discussion here.
I'm going to go ahead and close this issue. I've added a link in the Zero to JupyterHub wiki's Resource section to this issue's discussion.
N.B. I don't think this is an issue with zero-to-jupyterhub-k8s specifically; it's more likely to be a problem between my OpenStack and Kubernetes installations, but I'm adding my experience here in case anyone has any insight.
I've been trying to get JupyterHub going on Kubernetes. I was able to get things going on GCE just by following the documentation, but we have an allocation on an OpenStack installation which I'd like to use in place of GCE. I found that most things I wanted to do just worked, but there was some issue getting an external IP for the proxy-public LoadBalancer.
Here's what I've been doing...
Created a network + subnet in OpenStack. I had to do this as a separate step because my provider doesn't let me add a gateway to their external network (it would eat up another of their small pool of public IPs).
Created a Kubernetes cluster (via kubespray + some Terraform), then messed around creating deployments for a while to make sure things were working.
Followed the Zero-to-JupyterHub docs up until the helm install step. I found that I had to use Yuvi's 0.5.0 version because there was some problem with Kubernetes RBAC and the v0.4 version. The install ran successfully (I think) up until the last service definition, which stays in the pending state.
I took a look at https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/ and as far as I can tell the problem will be with Kubernetes' dependence on OpenStack to create that load balancer. My guess is that I've either missed something in the Kubernetes setup or my OpenStack provider doesn't trust me enough :-)
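For the LoadBalancer part specifically: an external IP that never leaves pending usually means Kubernetes hasn't been configured with the OpenStack cloud provider at all. kubespray can set that up; a hypothetical group_vars sketch (variable values are placeholders, and the usual OS_* credentials need to be in the environment when kubespray runs):
# kubespray group_vars - enable the in-tree OpenStack cloud provider
cloud_provider: openstack
openstack_lbaas_enabled: true
openstack_lbaas_subnet_id: "SUBNET_ID"
openstack_lbaas_floating_network_id: "FLOATING_NETWORK_ID"
With that in place, type: LoadBalancer services should get a Neutron LBaaS load balancer; without it (or without LBaaS on the provider side), NodePort plus an ingress controller as described above is the practical route.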