kubeflow / kubeflow

Machine Learning Toolkit for Kubernetes
https://www.kubeflow.org/
Apache License 2.0

Problems connecting to Jupyter Kernel with IAP #104

Closed: jlewi closed this issue 6 years ago

jlewi commented 6 years ago

On GKE with IAP, I can connect to JupyterHub and spawn a notebook server, but Jupyter reports the error:

A connection to the notebook server could not be established. The notebook will continue trying to reconnect. Check your network connection or notebook server configuration.

The pod logs for my Jupyter notebooks are here: jupyter_logs.txt

I see lots of warnings like:

[W 2018-01-08 14:47:15.823 SingleUserNotebookApp handlers:257] Replacing stale connection: bef70c27-c773-4388-974c-60a9e40341ba:E764B71A66324E70A6F3D020ABAF0637

This looks like jupyter/notebook#2664.

Does anyone know whether IAP and JupyterHub work together?

/cc @foxish @yuvipanda

jlewi commented 6 years ago

It looks like the issue might be with using the Cloud Endpoints proxy (which is based on NGINX) to do JWT validation.

yuvipanda commented 6 years ago

Google's HTTP Load Balancing product seems to have a 30s default timeout, which interacts badly with websockets (https://cloud.google.com/compute/docs/load-balancing/http/#timeouts_and_retries). For a smooth experience you probably need to raise it to something close to infinite, since you don't want to interrupt the websocket connection (although it will reconnect when it fails). This is also why, on GKE, we expose the hub with a LoadBalancer service directly rather than through an Ingress.

jlewi commented 6 years ago

Thanks. So the problem could be either 1) the NGINX proxy or 2) the load balancer as configured by the Ingress.

I'll try using a load balancer with the sidecar and see if that works.

jlewi commented 6 years ago

It seems like JupyterHub should work just fine behind NGINX; I think that's what the following describe:

/cc @sveesible

yuvipanda commented 6 years ago

(Note that the first article is badly outdated and uses forks of old versions of all the software, so it should not be considered very reliable.)

But yes, JupyterHub should work just fine behind NGINX. We recommend that users use nginx-ingress if they need an Ingress.

jlewi commented 6 years ago

Thanks. Based on your previous comment, though, it sounds like you think using a service of type LoadBalancer is better than using an Ingress. Is that right?

yuvipanda commented 6 years ago

Yes, it has definitely been less error-prone and simpler across all major cloud providers.

jlewi commented 6 years ago

@yuvipanda Can you point me at some instructions? I tried using a LoadBalancer, but it looks like it created an external LB of type TCP, not HTTP. The backend service also showed up in the UI but not on the command line, and I couldn't figure out how to configure the timeout in the UI.

In the meantime, with the Ingress, I was able to increase the timeout on the associated backend services, but that didn't seem to help.

Looking at the developer console, I see a 501 error on the websocket request:

Request URL:wss://jupyterhub.endpoints.kubeflow-rl.cloud.goog/user/jlewi@google.com/api/kernels/b2b37895-3d91-4f75-8512-aa16242cc32a/channels?session_id=6F9B6302A1AF4D2BA4B28D7CE1B013AD
Request Method:GET
Status Code:501 Not Implemented
Response Headers
Alt-Svc:clear
Connection:close
Content-Length:1550
Content-Type:text/html; charset=UTF-8
Referrer-Policy:no-referrer
Request Headers
Accept-Encoding:gzip, deflate, br
Accept-Language:en-US,en;q=0.9
Cache-Control:no-cache
Connection:Upgrade
Cookie:user-jlewi%40google.com=2|1:0|10:1515455815|23:user-jlewi%40google.com|48:OGE5NTM3M2EtZWFlNS00ODUxLTk0NzgtNmExNDE4NWIyNTUx|3024d2e3e930ce0f56f35ebbd8cee3aaf825ecf8a323e398197f088d597e8153; _xsrf=2|f1a893d2|b2949fc9cb9d06d92d38e04c2386531b|1515354847
Host:jupyterhub.endpoints.kubeflow-rl.cloud.goog
Origin:https://jupyterhub.endpoints.kubeflow-rl.cloud.goog
Pragma:no-cache
Sec-WebSocket-Extensions:permessage-deflate; client_max_window_bits
Sec-WebSocket-Key:I5ZhrDkPX4mNXI9aO+OAzg==
Sec-WebSocket-Version:13
Upgrade:websocket
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
Query String Parameters
session_id:6F9B6302A1AF4D2BA4B28D7CE1B013AD
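
For reference, a proxy path that supports websockets answers an upgrade request like this one with `101 Switching Protocols` and a `Sec-WebSocket-Accept` header derived from the client's `Sec-WebSocket-Key` (RFC 6455), so a `501 Not Implemented` suggests that some hop refused the upgrade. Below is a minimal Python sketch, standard library only, that builds such a handshake and classifies a raw response. The helper names are hypothetical, and the sketch does not open any network connection itself.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a compliant server must send."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake(host: str, path: str, key: str) -> bytes:
    """Build a minimal HTTP/1.1 websocket upgrade request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",  # these two empty entries produce the terminating CRLF CRLF
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def check_handshake_response(raw: bytes, key: str) -> str:
    """Classify a raw HTTP response to a websocket upgrade request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if status != 101:
        # e.g. 501: some hop on the path did not implement the upgrade.
        return f"upgrade refused with HTTP {status}"
    if headers.get("sec-websocket-accept") != expected_accept(key):
        return "101 received but Sec-WebSocket-Accept does not match"
    return "websocket upgrade OK"
```

Assuming you send the output of `build_handshake` over a TLS socket (for example one created with `ssl.create_default_context().wrap_socket(sock, server_hostname=host)`) to the Ingress host and pass the reply to `check_handshake_response`, a non-101 result would point at the proxy or load balancer rather than at Jupyter.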

yuvipanda commented 6 years ago

If you set the Service's type to LoadBalancer, it provisions a TCP load balancer that pretty much 'just works', since HTTP/HTTPS runs over TCP. No timeouts are enforced there, so no configuration is needed. Check out the 'proxy-public' service in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/templates/proxy/service.yaml for how we configure it.
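
As a rough illustration of this approach, the sketch below builds a Kubernetes Service manifest of type `LoadBalancer` as a Python dict and prints it as JSON, which `kubectl apply -f` accepts. The name `proxy-public` mirrors the zero-to-jupyterhub chart, but the selector labels and port numbers are hypothetical placeholders, not copied from that template.

```python
import json


def load_balancer_service(name: str, selector: dict, port: int, target_port: int) -> dict:
    """Return a minimal Service manifest of type LoadBalancer (hypothetical values)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # On GKE this makes the cloud provider provision an external
            # TCP (L4) load balancer rather than an HTTP(S) one.
            "type": "LoadBalancer",
            "selector": selector,
            "ports": [
                {"name": "http", "protocol": "TCP", "port": port, "targetPort": target_port}
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical selector and ports: adjust to match the hub or proxy pod.
    manifest = load_balancer_service("proxy-public", {"app": "jupyterhub"}, 80, 8000)
    print(json.dumps(manifest, indent=2))
```

Saved as, say, `make_service.py` (a hypothetical filename), it could be applied with `python make_service.py | kubectl apply -f -`.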

Is this still with IAP? I've never used IAP, so I'm unsure what could be causing that.

jlewi commented 6 years ago

Thanks.

I've turned off IAP for now.

jlewi commented 6 years ago

If I use a service of type LoadBalancer and route traffic directly to JupyterHub, everything works.

But if I now try to connect via the NGINX sidecar, I can access the Hub, but I get a 404 when trying to create a new notebook:

Request URL:
Request Method: POST
Status Code: 404 Not Found
Remote Address: XX.XXX.XXX.XX:9000
Referrer Policy: no-referrer-when-downgrade

jlewi commented 6 years ago

So I think I figured out the issues with the NGINX config. I had to follow these instructions (https://www.nginx.com/blog/websocket-nginx/) and insert a customized config into the Endpoints container. My config is below. With this config, I was able to connect through a load balancer provisioned by the Ingress and pointing at the ESP proxy. Now I just need to enable IAP.

daemon off;

user nginx nginx;

pid /var/run/nginx.pid;

# Worker/connection processing limits
worker_processes 1;
worker_rlimit_nofile 10240;
events { worker_connections 10240; }

# Logging to stderr enables better integration with Docker and GKE/Kubernetes.
error_log stderr warn;

http {
  include /etc/nginx/mime.types;
  server_tokens off;
  client_max_body_size 32m;
  client_body_buffer_size 128k;

  # HTTP subrequests
  endpoints_resolver 8.8.8.8;
  endpoints_certificates /etc/nginx/trusted-ca-certificates.crt;

  upstream app_server0 {
    server 127.0.0.1:8000;
    keepalive 128;
  }

  set_real_ip_from  0.0.0.0/0;
  set_real_ip_from  0::/0;
  real_ip_header    X-Forwarded-For;
  real_ip_recursive on;

  # top-level http config for websocket headers
  # If Upgrade is defined, Connection = upgrade
  # If Upgrade is empty, Connection = close

  map_hash_max_size 262144;
  map_hash_bucket_size 262144;

  map $http_upgrade $connection_upgrade {
    default upgrade;
    "" close;
  }

  server {
    server_name "";

    listen 9000 backlog=16384;

    access_log /dev/stdout;

    location = /healthz {
      return 200;
      access_log off;
    }

    location / {
      # Begin Endpoints v2 Support
      endpoints {
        on;
        server_config /etc/nginx/server_config.pb.txt;
        metadata_server http://169.254.169.254;
      }
      # End Endpoints v2 Support

      proxy_pass http://app_server0;
      proxy_redirect off;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Host $server_name;
      proxy_set_header X-Google-Real-IP $remote_addr;

      # Use HTTP/1.1 to the upstream (required for websockets and keepalive).
      proxy_http_version 1.1;

      # Enable websockets. Upgrade and Connection are hop-by-hop headers,
      # so they must be forwarded explicitly.
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection $connection_upgrade;

      # 86400 seconds (24 hours) is the maximum a server is allowed.
      proxy_send_timeout 86400s;
      proxy_read_timeout 86400s;
    }

    include /var/lib/nginx/extra/*.conf;
  }

  server {
    # expose /nginx_status and /endpoints_status but on a different port to
    # avoid external visibility / conflicts with the app.
    listen 8090;
    location /nginx_status {
      stub_status on;
      access_log off;
    }
    location /endpoints_status {
      endpoints_status;
      access_log off;
    }
    location /healthz {
      return 200;
      access_log off;
    }
    location / {
      root /dev/null;
    }
  }
}
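
The key part of this config is the websocket handling: Upgrade and Connection are hop-by-hop headers, so NGINX does not pass them to the upstream unless told to, and the `map $http_upgrade $connection_upgrade` block picks the Connection value from the client's Upgrade header. The Python sketch below models only that header logic to make the mapping explicit; it is a simplified illustration, not how NGINX is implemented, and the function name is hypothetical.

```python
def upstream_ws_headers(client_headers: dict) -> dict:
    """Model the Upgrade/Connection headers this config sends upstream.

    Mirrors:
      map $http_upgrade $connection_upgrade { default upgrade; "" close; }
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection $connection_upgrade;
    """
    # $http_upgrade is a case-insensitive lookup of the request's Upgrade header.
    lowered = {k.lower(): v for k, v in client_headers.items()}
    http_upgrade = lowered.get("upgrade", "")
    connection_upgrade = "close" if http_upgrade == "" else "upgrade"
    out = {"Connection": connection_upgrade}
    # proxy_set_header with an empty value means the header is not passed at all.
    if http_upgrade:
        out["Upgrade"] = http_upgrade
    return out
```

For a plain request without an Upgrade header this yields `Connection: close`, which stops the upstream from reusing the `keepalive 128` connections; if keepalive matters, one option is to map `""` to an empty value so the Connection header is dropped instead.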

jlewi commented 6 years ago

Here's the issue to add websocket support to the Cloud Endpoints ESP proxy: cloudendpoints/endpoints-tools#41.

jlewi commented 6 years ago

IAP is working reliably these days.