jupyterhub / helm-chart

A store of Helm chart tarballs for deploying JupyterHub and BinderHub on a Kubernetes cluster
https://jupyterhub.github.io/helm-chart/

Py notebook kernel failed to connect (JHub v0.8.2) #100

Closed wierzba3 closed 5 years ago

wierzba3 commented 5 years ago

I am encountering an issue with JHub 0.8.2 (I see the same issue with 0.7.0).

When I deploy my JupyterHub application to Kubernetes (EKS on AWS, v1.13) via Helm, everything deploys and starts fine. However, when I spawn a notebook server and create a Python notebook, the kernel hangs while trying to connect. (See screenshots at the bottom.)

I saw a similar issue posted here: https://github.com/jupyter/notebook/issues/2664 It seems there was a regression in the tornado Python package. However, I tried downgrading tornado to 5.1.1 and that did not fix the issue.

What are the next troubleshooting steps I can try? Where can I find diagnostic info / logs for the Python kernel?

[Screenshots: Screen Shot 2019-07-31 at 11:57:19 AM, Screen Shot 2019-07-31 at 11:57:28 AM]
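
On the question of where to find logs: a kernel's stderr normally ends up in the log of the single-user notebook server, which in a Helm-based deployment is the user's pod. The sketch below only builds and runs `kubectl logs` commands. It assumes the chart's default names (user pods called `jupyter-<username>`, deployments called `proxy` and `hub`) and a hypothetical namespace `jhub`. None of these are confirmed for this particular cluster, and usernames with special characters are escaped differently in pod names.

```python
import shlex
import subprocess


def log_commands(username, namespace="jhub"):
    """Build kubectl commands for the logs most relevant to a stuck kernel.

    Pod and deployment names are assumptions based on chart defaults.
    """
    return [
        # Single-user notebook server pod; kernel stderr lands in this log.
        ["kubectl", "logs", "--namespace", namespace, f"jupyter-{username}"],
        # Proxy that routes /user/<name>/ traffic, including websockets.
        ["kubectl", "logs", "--namespace", namespace, "deployment/proxy"],
        # The hub itself (spawn and auth problems show up here).
        ["kubectl", "logs", "--namespace", namespace, "deployment/hub"],
    ]


def fetch_logs(username, namespace="jhub"):
    """Run each command and print whatever kubectl returns."""
    for cmd in log_commands(username, namespace):
        print("$", shlex.join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        print(result.stdout or result.stderr)
```

Calling `fetch_logs("me")` would match the `/user/me/` path seen in this thread, provided `kubectl` is configured for the cluster.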

wierzba3 commented 5 years ago

Update: one of our existing clusters, which had been running fine for about 2 months, started experiencing this kernel issue just today. This makes me wonder whether it is some sort of regression. But how would a regression affect a JupyterHub deployment that has not been modified? Does JupyterHub update libraries/packages by itself, without consent?

wierzba3 commented 5 years ago

Another update: I inspected the network traffic in the browser and discovered that the request to https://<<JUPYTERHUB_DOMAIN>>/user/me/api/kernels/<<KERNEL_ID>>/channels?session_id=<<SESSION_ID>> returns HTTP 504 GATEWAY_TIMEOUT

Detailed HTTP request:

GET wss://<<MY_JHUB_DOMAIN>>/user/me/api/kernels/eaf397d3-36da-473c-8342-c4d4d3ad5256/channels?session_id=fa79dc80238648b8b1ea4c3982cb0612 HTTP/1.1
Host: <<MY_JHUB_DOMAIN>>
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
Origin: https://<<MY_JHUB_DOMAIN>>
Sec-WebSocket-Version: 13
User-Agent: redacted
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: redacted
Sec-WebSocket-Key: 3dthd3HV1uwI6NkNIsWVNA==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

Detailed HTTP response:

HTTP/1.1 504 GATEWAY_TIMEOUT
Content-Length: 0
Connection: keep-alive

data:undefined,
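
For reference, a successful WebSocket handshake answers the request above with `101 Switching Protocols` and a `Sec-WebSocket-Accept` header derived from the client's `Sec-WebSocket-Key` (RFC 6455). The sketch below classifies a raw response on that basis. The function names and the returned labels are illustrative only, not part of any Jupyter or AWS API.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key):
    """Return the Sec-WebSocket-Accept value a server must send for this key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def classify_handshake(raw_response, sec_websocket_key):
    """Classify a raw HTTP response to a WebSocket upgrade request."""
    lines = raw_response.splitlines()
    status = int(lines[0].split(" ", 2)[1])

    # Collect headers up to the blank line that ends the header section.
    headers = {}
    for line in lines[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if status == 101:
        if headers.get("sec-websocket-accept") == expected_accept(sec_websocket_key):
            return "upgraded"
        return "invalid-accept"
    if status in (502, 503, 504):
        # An intermediary answered instead of the notebook server.
        return "gateway-error"
    return "rejected"
```

Applied to the response in this comment, the result is `gateway-error`, which points at something between the browser and the notebook server rather than at the kernel itself.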
wierzba3 commented 5 years ago

The issue was that we had switched the proxy-public ELB to listen on HTTP instead of TCP. This broke the kernel endpoint, since it uses WebSockets.
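
A minimal sketch of how this misconfiguration could be spotted, assuming a Classic ELB, whose HTTP/HTTPS listeners do not pass WebSocket upgrades while TCP/SSL listeners do. The input is assumed to have the shape of the `ListenerDescriptions` list that boto3's `elb` client returns from `describe_load_balancers`. The function name is hypothetical.

```python
# Listener protocols on a Classic ELB that forward raw bytes and therefore
# let a WebSocket upgrade through (assumption stated in the text above).
WEBSOCKET_SAFE_PROTOCOLS = {"TCP", "SSL"}


def websocket_unsafe_listeners(listener_descriptions):
    """Return (port, protocol) pairs for listeners likely to break WebSockets."""
    flagged = []
    for description in listener_descriptions:
        listener = description["Listener"]
        protocol = listener["Protocol"].upper()
        if protocol not in WEBSOCKET_SAFE_PROTOCOLS:
            flagged.append((listener["LoadBalancerPort"], protocol))
    return flagged


# Example data mirroring the situation described: port 80 switched to HTTP.
listeners = [
    {"Listener": {"Protocol": "HTTP", "LoadBalancerPort": 80,
                  "InstanceProtocol": "HTTP", "InstancePort": 31000}},
    {"Listener": {"Protocol": "TCP", "LoadBalancerPort": 443,
                  "InstanceProtocol": "TCP", "InstancePort": 31001}},
]
print(websocket_unsafe_listeners(listeners))
```

The ports in the example are made up. With real data, each entry of `LoadBalancerDescriptions` in the boto3 response carries its own `ListenerDescriptions` list that can be passed to the function.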