Closed jlewi closed 6 years ago
It looks like the issue might be with using the Cloud Endpoints proxy (which is based on nginx) to do JWT validation.
Google's HTTP Load Balancing product seems to have a 30s default timeout, which interacts badly with websockets (https://cloud.google.com/compute/docs/load-balancing/http/#timeouts_and_retries). Raising it to something closer to infinite (since you don't want to interrupt the websocket connection - although it will reconnect when it fails) is probably needed for a smooth experience. This is also why we use a LoadBalancer service directly when exposing the hub rather than ingress on GKE...
Thanks. So the problem could be either 1) the NGINX proxy or 2) the load balancer as configured by Ingress.
I'll try using a load balancer with the sidecar and see if that works.
It seems like JupyterHub should work just fine behind NGINX; I think that's what the following describe
/cc @sveesible
(note that the first article is intensely outdated and uses forked versions of old versions of all software, so should not be considered very reliable).
But yes, JupyterHub should work just fine behind nginx; we recommend that users use nginx-ingress if they need an ingress.
On Mon, Jan 8, 2018 at 2:49 PM, Jeremy Lewi wrote:
> It seems like JupyterHub should work just fine behind NGINX; I think that's what the following describe
> - JupyterHub on GKE classroom https://github.com/GoogleCloudPlatform/gke-jupyter-classroom
> - Jupyter Example http://jupyterhub.readthedocs.io/en/latest/reference/config-examples.html
>
> /cc @sveesible
-- Yuvi Panda T http://yuvi.in/blog
Thanks. Based on your previous comment, though, it sounds like you think using a Service of type LoadBalancer is better than using Ingress. Is that right?
Yes, it has definitely been less error prone and simpler across all major cloud providers.
@yuvipanda Can you point me at some instructions? I tried using a LoadBalancer, and it looks like it ended up creating an external LB of type TCP and not HTTP. The backend service also showed up in the UI but not on the command line, and I couldn't figure out how to configure the timeout in the UI.
In the meantime, with Ingress, I was able to increase the timeout on the associated backend services, but that didn't seem to help.
Looking at the developer console, I see a 501 error on the websocket request.
Request URL:wss://jupyterhub.endpoints.kubeflow-rl.cloud.goog/user/jlewi@google.com/api/kernels/b2b37895-3d91-4f75-8512-aa16242cc32a/channels?session_id=6F9B6302A1AF4D2BA4B28D7CE1B013AD
Request Method:GET
Status Code:501 Not Implemented
Response Headers
Alt-Svc:clear
Connection:close
Content-Length:1550
Content-Type:text/html; charset=UTF-8
Referrer-Policy:no-referrer
Request Headers
Accept-Encoding:gzip, deflate, br
Accept-Language:en-US,en;q=0.9
Cache-Control:no-cache
Connection:Upgrade
Cookie:user-jlewi%40google.com=2|1:0|10:1515455815|23:user-jlewi%40google.com|48:OGE5NTM3M2EtZWFlNS00ODUxLTk0NzgtNmExNDE4NWIyNTUx|3024d2e3e930ce0f56f35ebbd8cee3aaf825ecf8a323e398197f088d597e8153; _xsrf=2|f1a893d2|b2949fc9cb9d06d92d38e04c2386531b|1515354847
Host:jupyterhub.endpoints.kubeflow-rl.cloud.goog
Origin:https://jupyterhub.endpoints.kubeflow-rl.cloud.goog
Pragma:no-cache
Sec-WebSocket-Extensions:permessage-deflate; client_max_window_bits
Sec-WebSocket-Key:I5ZhrDkPX4mNXI9aO+OAzg==
Sec-WebSocket-Version:13
Upgrade:websocket
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
Query String Parameters
session_id:6F9B6302A1AF4D2BA4B28D7CE1B013AD
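For anyone trying to reproduce this outside the browser, here is a minimal sketch of the same upgrade handshake using only the Python standard library. A proxy that passes websockets through should answer `101 Switching Protocols`; the dump above shows `501` instead. The function names (`probe`, `build_handshake`, `expected_accept`, `status_code`) are made up for this example, and the sketch does not send the session cookie, which the endpoint in this thread would also need.

```python
import base64
import hashlib
import os
import socket
import ssl

# Fixed GUID that RFC 6455 defines for deriving Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_key() -> str:
    """Random Sec-WebSocket-Key: 16 random bytes, base64-encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    """Sec-WebSocket-Accept value a compliant server returns for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake(host: str, path: str, key: str) -> bytes:
    """Raw HTTP/1.1 request asking the server to upgrade to websocket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def status_code(response: bytes) -> int:
    """Extract the numeric status from the first line of an HTTP response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    return int(status_line.split(" ", 2)[1])


def probe(host: str, path: str, port: int = 443) -> int:
    """Send the handshake over TLS and return the HTTP status (101 = upgraded)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_handshake(host, path, new_key()))
            return status_code(tls.recv(4096))
```

Calling `probe(host, path)` against the load balancer and then directly against the backend should show which hop stops returning 101.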
If you set service's type to LoadBalancer, it provisions a TCP LoadBalancer that pretty much 'just works', since HTTP/HTTPS just work over TCP. There are no timeouts enforced there, and so no configuration is needed. Check out the 'proxy-public' service in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/templates/proxy/service.yaml for how we configure it.
Is this still with IAP? I've never used IAP, so unsure what could be causing that.
Thanks.
I've turned off IAP for now.
If I use a Service of type LoadBalancer and send traffic directly to JupyterHub, everything works.
But if I now try to connect via the NGINX side car, I can access the Hub but I get a 404 when trying to create a new notebook:
Request URL:
Request Method: POST
Status Code: 404 Not Found
Remote Address: XX.XXX.XXX.XX:9000
Referrer Policy: no-referrer-when-downgrade
So I think I figured out the issues with the NGINX config. I had to follow these [instructions](https://www.nginx.com/blog/websocket-nginx/) and insert a customized config into the endpoints container. My config is below. With this config I was able to connect through a load balancer provisioned by Ingress pointing at the ESP proxy. Now I just need to enable IAP.
    daemon off;
    user nginx nginx;
    pid /var/run/nginx.pid;

    # Worker/connection processing limits
    worker_processes 1;
    worker_rlimit_nofile 10240;
    events { worker_connections 10240; }

    # Logging to stderr enables better integration with Docker and GKE/Kubernetes.
    error_log stderr warn;

    http {
      include /etc/nginx/mime.types;
      server_tokens off;
      client_max_body_size 32m;
      client_body_buffer_size 128k;

      # HTTP subrequests
      endpoints_resolver 8.8.8.8;
      endpoints_certificates /etc/nginx/trusted-ca-certificates.crt;

      upstream app_server0 {
        server 127.0.0.1:8000;
        keepalive 128;
      }

      set_real_ip_from 0.0.0.0/0;
      set_real_ip_from 0::/0;
      real_ip_header X-Forwarded-For;
      real_ip_recursive on;

      # top-level http config for websocket headers
      # If Upgrade is defined, Connection = upgrade
      # If Upgrade is empty, Connection = close
      map_hash_max_size 262144;
      map_hash_bucket_size 262144;
      map $http_upgrade $connection_upgrade {
        default upgrade;
        "" close;
      }

      server {
        server_name "";
        listen 9000 backlog=16384;
        access_log /dev/stdout;

        location = /healthz {
          return 200;
          access_log off;
        }

        location / {
          # Begin Endpoints v2 Support
          endpoints {
            on;
            server_config /etc/nginx/server_config.pb.txt;
            metadata_server http://169.254.169.254;
          }
          # End Endpoints v2 Support

          proxy_pass http://app_server0;
          proxy_redirect off;
          proxy_set_header Host $host;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          proxy_set_header X-Forwarded-Host $server_name;
          proxy_set_header X-Google-Real-IP $remote_addr;

          # Enable the upstream persistent connection
          proxy_http_version 1.1;
          proxy_set_header Connection "";

          # Enable websockets.
          proxy_set_header Upgrade $http_upgrade;
          proxy_set_header Connection $connection_upgrade;

          # 86400 seconds (24 hours) is the maximum a server is allowed.
          proxy_send_timeout 86400s;
          proxy_read_timeout 86400s;
        }

        include /var/lib/nginx/extra/*.conf;
      }

      server {
        # expose /nginx_status and /endpoints_status but on a different port to
        # avoid external visibility / conflicts with the app.
        listen 8090;

        location /nginx_status {
          stub_status on;
          access_log off;
        }

        location /endpoints_status {
          endpoints_status;
          access_log off;
        }

        location /healthz {
          return 200;
          access_log off;
        }

        location / {
          root /dev/null;
        }
      }
    }
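To spell out what the `map $http_upgrade $connection_upgrade` block does, here is a rough Python model of it. This is only an illustration, not how nginx is implemented: the function names are made up, header lookup is simplified to an exact-case dict key, and it covers only the `map` block and the two headers under "Enable websockets", not the rest of the config.

```python
def connection_upgrade(http_upgrade: str) -> str:
    """Mirror the map block: empty Upgrade -> "close", anything else -> "upgrade"."""
    return "close" if http_upgrade == "" else "upgrade"


def upstream_headers(client_headers: dict) -> dict:
    """Headers the proxy would forward upstream for a given client request."""
    # A missing Upgrade header behaves like an empty $http_upgrade.
    upgrade = client_headers.get("Upgrade", "")
    headers = {"Connection": connection_upgrade(upgrade)}
    # nginx does not pass a proxy_set_header field whose value is empty.
    if upgrade:
        headers["Upgrade"] = upgrade
    return headers
```

With the request from the 501 dump earlier (`Upgrade: websocket`), this yields `Connection: upgrade` plus the forwarded `Upgrade` header, which is what the backend needs to complete the handshake; a plain HTTP request gets `Connection: close`.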
Here's the issue, cloudendpoints/endpoints-tools#41, to add support for websockets in the Cloud Endpoints ESP proxy.
IAP is working reliably these days.
On GKE with IAP, I can connect to JupyterHub and spawn a notebook server but Jupyter reports the error
The pod logs for my Jupyter notebooks are here jupyter_logs.txt
I see lots of warnings like
Which looks like jupyter/notebook#2664
Does anyone know if IAP and JupyterHub work together?
/cc @foxish @yuvipanda