Open nono303 opened 6 months ago
I am a bit confused. Does the browser make WebSocket connections over HTTP/2? It should not, if the server does not support it. Is this another request that produces the infinite loop? Strange.
Can you produce a log with LogLevel http2:trace2 in such a situation?
I am a bit confused
Same for me, because my big-picture understanding is that in this situation h2_switch.c would have to decline and forward to mod_proxy_wstunnel...
Does the browser make WebSocket connections over HTTP/2
no, over HTTP/1.1
2024-04-02 17:52:06.551 proxy-server "GET /grafana/api/live/ws HTTP/1.1" 101 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0" sI:2676 sO:->0:-:- sT:3657 tI:- tO:24711096 ka:0-
Can you produce a log with LogLevel http2:trace2 on such a situation?
I only have this relevant line:
2024-04-02 17:52:06.551648 grafana.mydomain.com 127.0.0.1:50694 ZgwppqjK1zgVxTggMqo8iAAAAMM http2:debug h2_switch.c(92) pid:13140 tid:4048 ka:0 [AH03085: upgrade without HTTP2-Settings declined]
full trace after a graceful restart: http2-trace2.txt
Hi @icing, Continuing digging on it…
To be clear:
But as you said, I’m not sure that the infinite loop is caused by a gracefully restarted websocket connection! As the server-status scoreboard doesn’t show the tid in mpm_winnt mode (only the pid of the main process), and as I see neither an issue in the scoreboard nor a broken connection in the browser network call stack, I suspect that some old connections (from the previous httpd forked process, before the graceful restart) are stuck:
browser <> httpd
httpd <> [grafana|mattermost|gitlab]
As it’s difficult to work on Windows with tracing on a “kill & up process fork”, and as I’m not fluent with APR, would it be possible to provide a temporary patch (better than mine... easy ;) to trace which requests are stuck in the infinite loop? The idea is to quickly get as much information as possible (tid, request URI, rtime, etc.) when the do-while loop has run more than n times (1000 or whatever) and then exit.
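The kind of guard described above could look roughly like the following. This is only a standalone sketch, not a patch against mod_http2: `spin_guard`, `spin_guard_init` and `spin_guard_tick` are hypothetical names, the `made_progress` flag is an assumption about what the surrounding loop can tell us, and a real patch would report through httpd's own logging rather than a `FILE *`.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper, not part of mod_http2 or APR. */
typedef struct {
    unsigned long spins;   /* consecutive iterations without progress */
    unsigned long limit;   /* threshold that triggers a report */
    time_t started;        /* when the current no-progress streak began */
} spin_guard;

static void spin_guard_init(spin_guard *g, unsigned long limit)
{
    g->spins = 0;
    g->limit = limit;
    g->started = 0;
}

/* Call once per loop iteration. Returns 1 once the loop has gone
 * `limit` iterations in a row without progress, after writing one
 * report line with the thread id and request URI to `out`. */
static int spin_guard_tick(spin_guard *g, int made_progress,
                           unsigned long tid, const char *uri, FILE *out)
{
    if (made_progress) {
        g->spins = 0;          /* any progress resets the streak */
        return 0;
    }
    if (g->spins == 0) {
        g->started = time(NULL);
    }
    if (++g->spins < g->limit) {
        return 0;
    }
    fprintf(out, "stuck: tid=%lu uri=%s spins=%lu elapsed=%lds\n",
            tid, uri ? uri : "-", g->spins,
            (long)(time(NULL) - g->started));
    return 1;
}
```

The caller would initialise one guard per loop, call `spin_guard_tick` at the end of each iteration, and break out when it returns 1.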
If mod_http2 is indeed polling in a loop, as your stacktrace indicates, something weird is going on. The trace is a call from h2_session.c:1911 where the session is in state H2_SESSION_ST_BUSY. When the poll returns, it calls h2_c1_read(session). That call should either get data from the client or move the session to another state.
I think we really need a log from such a case with LogLevel http2:debug to see what state changes the HTTP/2 session does.
Hi @icing, I'm back on it.
Yesterday I enabled http2:debug logging and reactivated the configuration Protocols h2 http/1.1 acme-tls/1
Surprise! No infinite loop has been triggered in the last 12 hours (and the usage/throughput is globally the same).
Originally, I just had to wait a few seconds after a graceful restart to see 1 to 4 threads in an infinite-loop state, judging by CPU usage.
Changes since last month:
I'll keep you posted...
Thanks for the update. This is a mysterious one indeed.
Hi @icing, I think I've caught the loop.
Use case:
[AH03079: h2_session(10292-0,INIT,0): started on mattermost:443]
[AH03079: h2_session(10292-1,INIT,0): started on webmail:443]
[AH03079: h2_session(10292-2,INIT,0): started on grafana:443]
[AH03079: h2_session(10292-3,INIT,0): started on mainsite:443]
[AH03079: h2_session(10292-5,INIT,0): started on mattermost:443]
[AH03079: h2_session(10292-6,INIT,0): started on mattermost:443]
[AH03079: h2_session(10292-7,INIT,0): started on grafana:443]
[AH03079: h2_session(10292-8,INIT,0): started on mainsite:443]
mattermost & grafana proxy config
RequestHeader set X-Forwarded-Proto "https"
RequestHeader set X-Forwarded-Ssl "on"
ProxyPass / http://backend/ upgrade=websocket retry=0 keepalive=On timeout=10
ProxyPassReverse / http://backend/
[httpd.exe] TCP httpd.ip:59709 mattermost.ip:8065 CLOSE_WAIT
[httpd.exe] TCP httpd.ip:59765 mattermost.ip:8065 CLOSE_WAIT
[httpd.exe] TCP httpd.ip:59795 mattermost.ip:8065 CLOSE_WAIT
[httpd.exe] TCP httpd.ip:59798 mattermost.ip:8065 CLOSE_WAIT
[httpd.exe] TCP httpd.ip:59799 mattermost.ip:8065 CLOSE_WAIT
[httpd.exe] TCP httpd.ip:59802 mattermost.ip:8065 CLOSE_WAIT
Here is the full 2-minute httpd-error.log
Hard to read for me, but... we see the 4 websocket connections declined: [AH03085: upgrade without HTTP2-Settings declined]
So, a hypothesis that I can't verify: is there any chance that a pointer to these declined connections is still referenced in the h2_session and produces a "busy" zombie?
I say this after reading the lines below, which correspond almost exactly to the moment when the infinite loop ended:
2024-05-28 08:02:22.967261 - client_ip:60500 - http2:debug h2_session.c(1393) pid:10292 tid:3784 ka:0 [AH03078: h2_session(10292-5,BUSY,0): transit [BUSY] -- input exhausted, no streams --> [IDLE]]
2024-05-28 08:02:23.429743 - client_ip:60500 - http2:debug h2_session.c(1393) pid:10292 tid:3784 ka:0 [AH03078: h2_session(10292-5,BUSY,0): transit [BUSY] -- input exhausted, no streams --> [IDLE]]
2024-05-28 08:02:23.451635 - client_ip:60500 - http2:debug h2_session.c(1393) pid:10292 tid:3784 ka:0 [AH03078: h2_session(10292-5,BUSY,0): transit [BUSY] -- input exhausted, no streams --> [IDLE]]
2024-05-28 08:02:23.527443 - client_ip:60500 - http2:debug h2_session.c(1393) pid:10292 tid:3784 ka:0 [AH03078: h2_session(10292-5,BUSY,0): transit [BUSY] -- input exhausted, no streams --> [IDLE]]
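To check whether one session keeps bouncing between BUSY and IDLE, the transitions in an error log can be counted per session id. The sketch below assumes the log format shown in the lines above (an `h2_session(pid-id,STATE,n)` prefix and the `transit [BUSY] ... --> [IDLE]` text); `busy_to_idle` and `count_busy_to_idle` are illustrative helper names, not part of httpd, and lines longer than the 4096-byte buffer are not handled specially.

```c
#include <stdio.h>
#include <string.h>

/* Extracts "pid-id" from a line containing "h2_session(pid-id,STATE,n)"
 * and reports whether the line records a BUSY -> IDLE transition.
 * Returns 1 and fills `session` on a match, 0 otherwise. */
static int busy_to_idle(const char *line, char *session, size_t cap)
{
    const char *p = strstr(line, "h2_session(");
    size_t n;

    if (!p || !strstr(line, "transit [BUSY]") || !strstr(line, "--> [IDLE]"))
        return 0;
    p += strlen("h2_session(");
    n = strcspn(p, ",)");          /* session id ends at ',' or ')' */
    if (n == 0 || n >= cap)
        return 0;
    memcpy(session, p, n);
    session[n] = '\0';
    return 1;
}

/* Counts BUSY -> IDLE transitions for one session id in a log stream. */
static unsigned long count_busy_to_idle(FILE *in, const char *wanted)
{
    char line[4096], session[64];
    unsigned long count = 0;

    while (fgets(line, sizeof line, in)) {
        if (busy_to_idle(line, session, sizeof session)
            && strcmp(session, wanted) == 0)
            count++;
    }
    return count;
}
```

Calling `count_busy_to_idle` on an opened httpd-error.log with a session id such as "10292-5" gives the number of such transitions; a very high count over a short window would be consistent with the busy loop described here, though it does not prove it.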
That's the context. If you need more traces, information... just tell me
Hi @icing, I have a bug with mod_h2 and "proxyfied" websockets when restarting the service (graceful, I suppose) on Windows. This problem reminds me of https://bz.apache.org/bugzilla/show_bug.cgi?id=65180 but is not fully identical.
httpd 2.4.58 (self-compiled, vs17 x64), mod_h2 2.0.26
I have 3 vhosts configured (grafana, gitlab and mattermost) like this (with different ports...)
I start httpd and launch my browser tabs, which require a websocket connection
101 Switching Protocols for all, and I see the exchanges in the console
I restart httpd (httpd -k restart) while still having my 3 tabs open (websocket connected)
I attach debugger and find an infinite loop on the 3 threads on this stack:
...and to be clear about it...
Indeed, the log shows an infinite loop after a graceful restart.
So
did I configure something wrong? (If you need other information about my configuration, I will give it to you)
if we need to investigate further, tell me what to do
⚠️ By starting (httpd -k start) with the only change being H2Upgrade Off, I fall directly into the infinite loop