codeyy closed this issue 1 year ago
Thanks for reporting these issues here. I would be very interested to hear whether there is any re-appearance after going back to 1.15.x. The new implementation may well be what's causing the problem; as you stated, it is the prime suspect.
Which host OS, Apache httpd base version, and MPM are you running?
In 2.4.52, the following issue has been reported, which might be related: https://bz.apache.org/bugzilla/show_bug.cgi?id=65626
Host OS: CentOS 7
Server Version: Apache/2.4.51 (codeit) OpenSSL/1.1.1l mod_perl/2.0.11 Perl/v5.16.3
Server MPM: event
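For anyone gathering the same details, a minimal sketch of commands that typically report them on a CentOS 7 box (binary name and paths are assumptions, not taken from this thread):

```bash
# Base version and MPM as reported by the server binary
httpd -V | grep -E 'Server version|Server MPM'
# Host OS release on CentOS
cat /etc/centos-release
# Confirm mod_http2 is actually loaded
httpd -M | grep -i http2
```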
Before going to mod_http2 2.0.0 it was running 1.15.24-2.codeit.x86_64; the downgrade now took me to 1.15.25-2.codeit.
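For reference, a sketch of how such a downgrade is typically done on CentOS 7 with yum; the exact package name and version string are assumptions based on the versions mentioned above:

```bash
# Roll mod_http2 back to the previous build and restart httpd
yum downgrade mod_http2-1.15.25-2.codeit
systemctl restart httpd
# Verify which build is now installed
rpm -q mod_http2
```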
They have all been running on that version for about 3 hours now, and the servers with the most issues before are still running smoothly. Yesterday evening, waiting 30 minutes was enough to already see hanging processes, so it's looking good so far.
I'll read through the Apache thread later today. Thanks a lot for the quick response; I'll do my best to assist in tracking down the issue.
FYI: I am working on improving the information available on Apache's scoreboard.
In general, the "SS" values will just climb and climb on worker slots that receive no new requests. The "Dur" seems way more interesting and 2.0.0 did not update that correctly. I'll work on that.
I plan to add the last request an H2 connection has worked on to the "Request" column. Even though many requests can be interleaved, when a request in H2 stalls, this should still help find the culprit.
I'd be more than happy to take in any feedback or suggestions you might have regarding scoreboard (server-status) improvements. You will have much more experience with this than I do.
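For context, the columns discussed above come from mod_status's server-status page. Assuming mod_status is enabled and the page is reachable from localhost at the usual /server-status location, a quick way to pull it:

```bash
# Full HTML page: contains the per-worker table with the SS / Dur / Request columns
curl -s 'http://localhost/server-status' -o status.html
# Machine-readable summary: scoreboard string and worker counts, no per-worker detail
curl -s 'http://localhost/server-status?auto'
```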
@codeyy I just published release v2.0.1. I found a bug that led to an infinite loop when the client closed the connection. That would perfectly explain the behaviour you saw.
Also, I improved the h2 server-status information for better analysis.
It would be very nice if you could take this for a test run on your system, should you find the time. Thanks.
Great to hear! When do you think this will be available on the CodeIT repo?
I must admit, I have no idea what you mean.
@icing, Adam is referring to the CodeIT repository for CentOS, this is where I also got the mod_http2 2.0.0 package. I checked yesterday, but 2.0.1 isn't available yet on that repo. I'll keep checking and I will give you feedback as soon as I can test it.
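A sketch of how one might watch for the new build showing up in the configured repos (the package name is assumed to match the codeit packaging):

```bash
# List every build of mod_http2 the configured repos currently offer
yum --showduplicates list mod_http2
```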
Ok, I had to check before typing... 2.0.1 is available on the CodeIT repo. Also a shout-out to @Adam7288 so he can test as well. I've installed it on two servers now to test it...
So far so good - testing in production on humblefax.com
Guys, any news on this? Your silence means either it is working or you are so disgusted that you threw it all away! Which one is it?
@icing it's good news, it seems to be working :-) I've only installed it on 2 servers so far, I'll try to upgrade the other ones as well this week.
Excellent. Originally I wanted to make 2.0.x part of the httpd release this week, but I am holding it back this time and will backport it in the next one. So there is no pressure here to get verification.
Thanks for testing it on your systems!
Same, no issues to speak of. 2.0.0 would have had a process at 100% CPU within a few hours, and it's been 5 days now.
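For anyone looking for the earlier symptom, a minimal check for httpd workers pinned at high CPU (process name and output columns are assumptions; any process monitor works just as well):

```bash
# Show httpd processes sorted by CPU usage, highest first
ps -C httpd -o pid,pcpu,etime,stat,args --sort=-pcpu | head
```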
Ever since I upgraded to version 2.0.0, I noticed some servers had one or more hanging httpd processes, consuming a lot of CPU in top. I would guess this was something that piled up: during the day I noticed one or two, after a night I noticed several.
Looking at it on the server-status page, I would see (a lot of) workers with a very large SS and Dur for these PIDs, with mode of operation W (sending reply) or G (gracefully finishing). After a night I would also see a lot of workers with no PID anymore, but still a large SS and Dur. In the upper table on the server-status page, the PIDs also showed "yes (old gen)" in the Stopping column.
Researching the access logs didn't give me any information. When I realized the upgrade to version 2.0.0 had happened recently, I downgraded it again, and the problems seem to have vanished now.
Since the problem was on live servers, I downgraded them all again.