a3626a closed this pull request.
Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
Thanks! I think enabling keep-alive makes sense, and exposing the timeout as an option is sensible as well. What I'm trying to understand is the addition of the agentkeepalive package instead of using the standard-library server.keepAliveTimeout. Can you speak to why it's needed beyond setting keepAlive: true, keepAliveTimeout: 62000?
I have done some experiments and concluded that keep-alive should be supported in both directions (client side: the load balancer; server side: JupyterHub or Jupyter Server).
But my experiment was not organized well enough to share. I used curl to check keep-alive support and nc (Netcat) to verify the timeout. I found that without an agent, keep-alive connections are closed after 5 seconds, even though I had set server.keepAliveTimeout to 60 seconds.
I will do the experiment again and share it here.
I have done a simple experiment again.
I opened a shell inside the proxy pod deployed by Z2JH, then executed curl -v localhost:8000.
With --server-keep-alive-timeout=15000 and --agent-free-socket-timeout=16000:
Check the arguments using ps:
/srv/configurable-http-proxy $ ps | grep node
1 nobody 0:03 node /srv/configurable-http-proxy/bin/configurable-http-proxy --ip= --api-ip= --api-port=8001 --default-target=http://jupyterhub1-hub:8081 --error-target=http://jupyterhub1-hub:8081/hub/error --port=8000 --log-level=debug --metrics-port=8080 --server-keep-alive-timeout=15000 --agent-free-socket-timeout=16000
/srv/configurable-http-proxy $ curl -v localhost:8000
* Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.1.2
> Accept: */*
>
< HTTP/1.1 302 Found
< server: TornadoServer/6.2
< content-type: text/html
< date: Thu, 17 Aug 2023 09:24:46 GMT
< access-control-allow-origin: *
< access-control-allow-methods: GET, POST, PUT, DELETE, OPTIONS
< content-security-policy: frame-ancestors self codle.io dev.codle.io
< x-jupyterhub-version: 3.0.0
< access-control-allow-headers: accept, content-type, authorization
< location: /hub/
< content-length: 0
< connection: keep-alive
<
* Connection #0 to host localhost left intact
"left intact" means keep-alive works.
With --server-keep-alive-timeout=15000 only (no agent):
Check the arguments, too:
/srv/configurable-http-proxy $ ps | grep node
1 nobody 0:04 node /srv/configurable-http-proxy/bin/configurable-http-proxy --ip= --api-ip= --api-port=8001 --default-target=http://jupyterhub2-hub:8081 --error-target=http://jupyterhub2-hub:8081/hub/error --port=8000 --log-level=debug --metrics-port=8080 --server-keep-alive-timeout=15000
/srv/configurable-http-proxy $ curl -v localhost:8000
* Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.1.2
> Accept: */*
>
< HTTP/1.1 302 Found
< server: TornadoServer/6.2
< content-type: text/html
< date: Thu, 17 Aug 2023 09:24:38 GMT
< access-control-allow-origin: *
< access-control-allow-methods: GET, POST, PUT, DELETE, OPTIONS
< content-security-policy: frame-ancestors self codle.io dev.codle.io
< x-jupyterhub-version: 3.0.0
< access-control-allow-headers: accept, content-type, authorization
< location: /hub/
< content-length: 0
< connection: close
<
* Closing connection 0
No keep-alive.
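The curl comparison above can also be scripted. This is a minimal sketch, not part of configurable-http-proxy: a hypothetical helper checkKeepAlive sends one GET through a keep-alive http.Agent and resolves with the connection response header, which is what curl's "left intact" versus "Closing connection" lines reflect.

```javascript
const http = require("node:http");

// Hypothetical helper: send one GET and resolve with the `connection` header
// the server answered with ("keep-alive" or "close").
function checkKeepAlive(url) {
  // An explicit keep-alive agent makes the request itself ask for keep-alive,
  // independent of the global agent's default in a given Node version.
  const agent = new http.Agent({ keepAlive: true });
  return new Promise((resolve, reject) => {
    const req = http.get(url, { agent }, (res) => {
      res.resume(); // drain the body so the response can complete
      res.on("end", () => {
        agent.destroy(); // close pooled sockets so the process can exit
        resolve(res.headers.connection);
      });
    });
    req.on("error", (err) => {
      agent.destroy();
      reject(err);
    });
  });
}
```

Run inside the proxy pod, checkKeepAlive("http://localhost:8000/").then(console.log) should report the same value curl showed in the connection header for each configuration.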
Can you test with #492? It seems to enable keep-alive all the way through from proxied requests from tornado.
Actually, there seems to be something weird where we can't use a single agent for keep-alive on both http and https with the standard library (bizarre), so I think maybe this PR is the way to go.
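One likely source of that restriction: Node's HTTP client throws ERR_INVALID_PROTOCOL when a request's protocol does not match its agent's (for example, an https.Agent used for an http: URL). A minimal sketch of the usual workaround, assuming one keep-alive agent per scheme chosen from the target URL; agentFor is a hypothetical helper, not an existing configurable-http-proxy function.

```javascript
const http = require("node:http");
const https = require("node:https");

// Node's HTTP client rejects a request whose protocol differs from its
// agent's (ERR_INVALID_PROTOCOL), so keep one keep-alive agent per scheme.
const keepAliveAgents = {
  "http:": new http.Agent({ keepAlive: true }),
  "https:": new https.Agent({ keepAlive: true }),
};

// Hypothetical helper: choose the agent that matches the target URL's scheme.
function agentFor(target) {
  const { protocol } = new URL(target);
  const agent = keepAliveAgents[protocol];
  if (!agent) {
    throw new Error(`no keep-alive agent for protocol ${protocol}`);
  }
  return agent;
}
```

Both agents are created once and shared, so connections to the same upstream host are pooled regardless of which route sent the request.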
For the agentkeepalive library, I followed this example. But there is no particular reason or use case that requires this library; http.Agent could work, but I am not sure.
Can you test with https://github.com/jupyterhub/configurable-http-proxy/pull/492? It seems to enable keep-alive all the way through from proxied requests from tornado.
OK. I will post the curl result and also the nc result. I think #492 will keep connections alive for only 5 seconds and won't respect the given timeout argument, because the timeout is not passed to the agent.
Also, I set up TLS termination on the LB, so all my tests were done over HTTP.
https://github.com/jupyterhub/configurable-http-proxy/pull/492 has an issue:
01:45:21.756 [ConfigProxy] info: Adding route / -> http://jupyterhub2-hub:8081
node:internal/validators:96
throw new ERR_INVALID_ARG_TYPE(name, 'number', value);
^
TypeError [ERR_INVALID_ARG_TYPE]: The "keepAliveTimeout" argument must be of type number. Received type string ('15000')
at Server.storeHTTPOptions (node:_http_server:464:5)
at new Server (node:_http_server:507:20)
at Object.createServer (node:http:61:10)
at new ConfigurableProxy (/srv/configurable-http-proxy/lib/configproxy.js:234:31)
at Object.<anonymous> (/srv/configurable-http-proxy/bin/configurable-http-proxy:320:13)
at Module._compile (node:internal/modules/cjs/loader:1254:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
at Module.load (node:internal/modules/cjs/loader:1117:32)
at Module._load (node:internal/modules/cjs/loader:958:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12) {
code: 'ERR_INVALID_ARG_TYPE'
}
Node.js v18.16.0
Stream closed EOF for jupyter-hub/jupyterhub2-proxy-58b65b6b87-drnfn (chp)
I added parseInt and tested again. (I replaced 5000 with the parseInt call.)
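The crash above comes from the option value reaching http.createServer as the string '15000' while Node validates keepAliveTimeout as a number. A minimal sketch of the fix, with hypothetical helper names parseTimeoutMs and createServerWithKeepAlive: convert and validate the value once, then apply it to the server. It assigns the server.keepAliveTimeout property instead of passing the value in the createServer options; both expect a number of milliseconds.

```javascript
const http = require("node:http");

// CLI option values arrive as strings ("15000"), but Node expects a number,
// so convert and validate before handing the value to the server.
function parseTimeoutMs(value, name = "keep-alive timeout") {
  const ms = Number.parseInt(value, 10);
  if (!Number.isInteger(ms) || ms < 0) {
    throw new TypeError(`${name} must be a non-negative integer, got ${JSON.stringify(value)}`);
  }
  return ms;
}

// Hypothetical wiring: apply the parsed value to a server before it listens.
function createServerWithKeepAlive(handler, rawTimeout) {
  const server = http.createServer(handler);
  server.keepAliveTimeout = parseTimeoutMs(rawTimeout);
  return server;
}
```

Number.parseInt is lenient ("15000ms" parses as 15000); Number(value) would be a stricter choice if such input should be rejected.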
ps result
/srv/configurable-http-proxy $ ps | grep node
1 nobody 0:01 node /srv/configurable-http-proxy/bin/configurable-http-proxy --ip= --api-ip= --api-port=8001 --default-target=http://jupyterhub2-hub:8081 --error-target=http://jupyterhub2-hub:8081/hub/error --port=8000 --log-level=debug --metrics-port=8080 --keep-alive-timeout=15000
/srv/configurable-http-proxy $ curl -v localhost:8000
* processing: localhost:8000
* Trying [::1]:8000...
* Connected to localhost (::1) port 8000
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.2.1
> Accept: */*
>
< HTTP/1.1 302 Found
< server: TornadoServer/6.2
< content-type: text/html
< date: Fri, 18 Aug 2023 02:20:21 GMT
< access-control-allow-origin: *
< access-control-allow-methods: GET, POST, PUT, DELETE, OPTIONS
< content-security-policy: frame-ancestors self codle.io dev.codle.io
< x-jupyterhub-version: 3.0.0
< access-control-allow-headers: accept, content-type, authorization
< location: /hub/
< content-length: 0
< connection: keep-alive
<
* Connection #0 to host localhost left intact
Keep alive works.
The nc test was done manually, so it is very naive.
CASE 1: Timeout=15000; request, wait 10 seconds, request again
ps result
/srv/configurable-http-proxy $ ps | grep node
1 nobody 0:01 node /srv/configurable-http-proxy/bin/configurable-http-proxy --ip= --api-ip= --api-port=8001 --default-target=http://jupyterhub2-hub:8081 --error-target=http://jupyterhub2-hub:8081/hub/error --port=8000 --log-level=debug --metrics-port=8080 --keep-alive-timeout=15000
/srv/configurable-http-proxy $ nc localhost 8000
GET / HTTP/1.1
HTTP/1.1 302 Found
server: TornadoServer/6.2
content-type: text/html
date: Fri, 18 Aug 2023 02:24:23 GMT
access-control-allow-origin: *
access-control-allow-methods: GET, POST, PUT, DELETE, OPTIONS
content-security-policy: frame-ancestors self codle.io dev.codle.io
x-jupyterhub-version: 3.0.0
access-control-allow-headers: accept, content-type, authorization
location: /hub/
content-length: 0
connection: keep-alive
< Wait 10 Seconds >
GET / HTTP/1.1
HTTP/1.1 302 Found
server: TornadoServer/6.2
content-type: text/html
date: Fri, 18 Aug 2023 02:24:34 GMT
access-control-allow-origin: *
access-control-allow-methods: GET, POST, PUT, DELETE, OPTIONS
content-security-policy: frame-ancestors self codle.io dev.codle.io
x-jupyterhub-version: 3.0.0
access-control-allow-headers: accept, content-type, authorization
location: /hub/
content-length: 0
connection: keep-alive
The connection should still be alive after 10 seconds, and it actually is.
CASE 2: Timeout=15000; request, wait 20 seconds, request again
ps result
/srv/configurable-http-proxy $ ps | grep node
1 nobody 0:03 node /srv/configurable-http-proxy/bin/configurable-http-proxy --ip= --api-ip= --api-port=8001 --default-target=http://jupyterhub2-hub:8081 --error-target=http://jupyterhub2-hub:8081/hub/error --port=8000 --log-level=debug --metrics-port=8080 --keep-alive-timeout=15000
/srv/configurable-http-proxy $ nc localhost 8000
GET / HTTP/1.1
HTTP/1.1 302 Found
server: TornadoServer/6.2
content-type: text/html
date: Fri, 18 Aug 2023 02:25:55 GMT
access-control-allow-origin: *
access-control-allow-methods: GET, POST, PUT, DELETE, OPTIONS
content-security-policy: frame-ancestors self codle.io dev.codle.io
x-jupyterhub-version: 3.0.0
access-control-allow-headers: accept, content-type, authorization
location: /hub/
content-length: 0
connection: keep-alive
< Wait 20 Seconds >
GET / HTTP/1.1
< Connection Closed >
The connection should be closed after 20 seconds, and it actually is.
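The manual nc cases can be automated. A minimal sketch using Node's net module, where probeKeepAlive is a hypothetical helper: it sends a request, stays idle for idleMs, sends a second request on the same socket, and resolves true if the server answers or false if it closed the connection first. Response detection is deliberately crude (it counts HTTP/1.1 status lines), which is enough for small responses like the 302s above but is not a real HTTP parser.

```javascript
const net = require("node:net");

// Hypothetical helper automating the nc test: request, idle, request again.
function probeKeepAlive(host, port, idleMs) {
  const request = `GET / HTTP/1.1\r\nHost: ${host}:${port}\r\n\r\n`;
  return new Promise((resolve) => {
    const socket = net.connect(port, host);
    let received = "";
    let timer = null;
    let done = false;
    const finish = (alive) => {
      if (done) return;
      done = true;
      clearTimeout(timer);
      socket.destroy();
      resolve(alive);
    };
    socket.on("connect", () => socket.write(request));
    socket.on("data", (chunk) => {
      received += chunk.toString("latin1");
      // Crude: count status lines seen so far on this connection.
      const responses = received.split("HTTP/1.1 ").length - 1;
      if (responses === 1 && timer === null) {
        // First response seen: stay idle, then reuse the same connection.
        timer = setTimeout(() => socket.write(request), idleMs);
      } else if (responses >= 2) {
        finish(true); // second request answered on the same socket
      }
    });
    // The server closed or reset the connection before answering again.
    socket.on("close", () => finish(false));
    socket.on("error", () => finish(false));
  });
}
```

With --keep-alive-timeout=15000, probeKeepAlive("localhost", 8000, 10000) mirrors CASE 1 and would be expected to resolve to true, while an idle time of 20000 mirrors CASE 2 and would be expected to resolve to false.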
I think #492 works with a parseInt fix.
I thought #492 would not respect the given timeout. However, the standard library http.Agent seems to close connections only when the number of connections exceeds its limit; it does not close idle connections. So it behaves like an infinite timeout while the number of active connections stays below the limit.
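If idle upstream sockets should be bounded without agentkeepalive, the standard http.Agent accepts a timeout option, and Node destroys a pooled (free) socket when that timeout fires; maxFreeSockets caps how many idle sockets are kept per host. A minimal sketch, with hypothetical helper names createUpstreamAgent and idleSocketCount, defaulting to 16000 ms only to mirror --agent-free-socket-timeout=16000 above. Unlike agentkeepalive's separate freeSocketTimeout, the standard option is a general socket timeout that also applies while a request is in flight.

```javascript
const http = require("node:http");

// Hypothetical factory for the upstream (proxy -> hub) keep-alive agent.
function createUpstreamAgent(idleTimeoutMs = 16000) {
  return new http.Agent({
    keepAlive: true, // return sockets to a pool instead of closing them
    timeout: idleTimeoutMs, // socket timeout; Node destroys pooled sockets when it fires
    maxFreeSockets: 256, // Node's default cap on idle sockets kept per host
  });
}

// Number of idle sockets currently pooled by an agent, across all hosts.
function idleSocketCount(agent) {
  return Object.values(agent.freeSockets).flat().length;
}
```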
I'm running a Z2JH-based service with about 1,000 DAU. It is deployed on AWS EKS behind an AWS ALB.
As DAU grew, users started to get 502 responses from the LB.
This is a well-known problem related to keep-alive settings (AWS Article). Unfortunately, configurable-http-proxy does not support keep-alive, so I implemented it and tested it in a production environment.
After the deployment, the number of 502 errors decreased.
Technical/Implementation detail
1) It is very important to allow keep-alive on both the client side and the server side. That's why Agent and keepAliveTimeout are both needed.
2) JupyterHub and Jupyter Server support keep-alive by default, because they are Tornado servers.
3) chp is given these parameters. They are AWS-specific values.
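Point 1) can be illustrated with the standard library alone. The sketch below is not configurable-http-proxy's implementation; it is a hypothetical minimal forwarder, createKeepAliveProxy, that keeps connections alive on both legs: server.keepAliveTimeout governs the client-facing side (LB to proxy), and a keep-alive http.Agent governs the upstream side (proxy to hub). The defaults are illustrative values taken from this thread (62000 from the maintainer's question, 16000 from the --agent-free-socket-timeout experiment), not the production AWS values. AWS's troubleshooting guidance for ALB 502 errors is that the target's keep-alive timeout should be longer than the load balancer's idle timeout.

```javascript
const http = require("node:http");

// Hop-by-hop headers describe a single connection, so they are not forwarded.
const HOP_BY_HOP = new Set(["connection", "keep-alive", "transfer-encoding"]);

function endToEndHeaders(headers) {
  return Object.fromEntries(
    Object.entries(headers).filter(([name]) => !HOP_BY_HOP.has(name)),
  );
}

// Hypothetical minimal forwarder, not configurable-http-proxy's implementation.
function createKeepAliveProxy(target, { serverKeepAliveMs = 62000, agentIdleMs = 16000 } = {}) {
  const upstream = new URL(target);
  // Upstream leg (proxy -> hub): pooled keep-alive sockets.
  const agent = new http.Agent({ keepAlive: true, timeout: agentIdleMs });

  const server = http.createServer((req, res) => {
    const proxyReq = http.request(
      {
        hostname: upstream.hostname,
        port: upstream.port || 80,
        method: req.method,
        path: req.url,
        headers: endToEndHeaders(req.headers),
        agent,
      },
      (proxyRes) => {
        res.writeHead(proxyRes.statusCode, endToEndHeaders(proxyRes.headers));
        proxyRes.pipe(res);
      },
    );
    proxyReq.on("error", () => {
      if (!res.headersSent) res.writeHead(502);
      res.end();
    });
    req.pipe(proxyReq);
  });

  // Client-facing leg (LB -> proxy): how long an idle connection stays open.
  server.keepAliveTimeout = serverKeepAliveMs;
  server.on("close", () => agent.destroy());
  return server;
}
```

Dropping only one of the two settings leaves the other leg opening a new connection per request, which is the situation the experiments above compare.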