icing / mod_h2

HTTP/2 module for Apache httpd
https://icing.github.io/mod_h2/
Apache License 2.0
256 stars 41 forks source link

Child process sometimes hang, peg cpu @ 100% #234

Closed Adam7288 closed 1 year ago

Adam7288 commented 2 years ago

Here is pstack trace for one of the hung processes pegging cpu @ 100% - looks like some kind of deadlock

0 __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371

1 0x00007f2d659f0f9e in _L_unlock_738 () from /lib64/libpthread.so.0

2 0x00007f2d659f0f10 in __pthread_mutex_unlock_usercnt (decr=1, mutex=0x7f2c7c5248f8) at pthread_mutex_unlock.c:55

3 __GI___pthread_mutex_unlock (mutex=0x7f2c7c5248f8) at pthread_mutex_unlock.c:330

4 0x00007f2d57cb140d in h2_beam_receive () from /etc/httpd/modules/mod_http2.so

5 0x00007f2d57cc8ee3 in buffer_output_receive () from /etc/httpd/modules/mod_http2.so

6 0x00007f2d57ccb1ec in stream_data_cb () from /etc/httpd/modules/mod_http2.so

7 0x00007f2d66cba171 in nghttp2_session_pack_data () from /lib64/libnghttp2.so.14

8 0x00007f2d66cbaedd in nghttp2_session_mem_send_internal () from /lib64/libnghttp2.so.14

9 0x00007f2d66cbbae9 in nghttp2_session_send () from /lib64/libnghttp2.so.14

10 0x00007f2d57cc7544 in h2_session_send () from /etc/httpd/modules/mod_http2.so

11 0x00007f2d57cc777a in h2_session_process () from /etc/httpd/modules/mod_http2.so

12 0x00007f2d57cb2149 in h2_c1_run () from /etc/httpd/modules/mod_http2.so

13 0x00007f2d57cb2569 in h2_c1_hook_process_connection () from /etc/httpd/modules/mod_http2.so

14 0x00005571f66c33c0 in ap_run_process_connection (c=c@entry=0x7f2d4006d5e0) at connection.c:42

15 0x00007f2d5a8ab40a in process_socket (thd=thd@entry=0x5571f72c4510, p=, sock=, cs=0x7f2d4006d530, my_child_num=my_child_num@entry=11, my_thread_num=my_thread_num@entry=12) at event.c:1086

16 0x00007f2d5a8ae6ae in worker_thread (thd=0x5571f72c4510, dummy=) at event.c:2179

17 0x00007f2d659edea5 in start_thread (arg=0x7f2d3a7f4700) at pthread_create.c:307

18 0x00007f2d65512b0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

icing commented 2 years ago

Hmm, how could it deadlock when unlocking a mutex?

Which version of httpd and mod_http2 are you running?

Adam7288 commented 2 years ago

Good point, just trying to sound competent, failing.

Server version: Apache/2.4.54 (codeit)

Name : mod_http2 Arch : x86_64 Version : 2.0.3 Release : 2.codeit.el7 Size : 398 k Repo : installed From repo : CodeIT Summary : module implementing HTTP/2 for Apache 2 URL : https://icing.github.io/mod_h2/ License : ASL 2.0 Description : The mod_h2 Apache httpd module implements the HTTP2 protocol : (h2+h2c) on top of libnghttp2 for httpd 2.4 servers.

alexskynet commented 2 years ago

Same problem here on several different servers:

OS CentOS 7 httpd-2.4.54-1.codeit.el7.x86_64 mod_http2-2.0.4-1.codeit.el7.x86_64 The problem appears when using mpm_worker or mpm_even It doesn't happen using mpm_prefork So we had to "downgrade" from mpm_event to prefork to keep http2 turned on, even if we lost a lot of mpm performance Version 2.0.4 is not fixing the behaviour httpd processes keep CPU at 100% and at memory while progressively every CPU Processor get filled and RAM also strace ing the stuck httpd processes says they are all locked in read

Hope this may help to track down the problem

icing commented 2 years ago

@alexskynet you do not have to switch mpm to disable http/2. http/2 is only used if you configure

Protocols h2 http/1.1

in your server. If you comment that out or just configure Protocols http/1.1, no h2 will be used.

That allows you to determine if the problem is related to http2 or if there is another problem in your server.

alexskynet commented 2 years ago

Thank you very much @icing I realized that few minutes after my post so I'm running with event now I also realized that the reason it worked with prefork is that mod_http2 doesn't work with this worker We runned several days with no problems, while when we started apache with mpm_event, and http2, after few time (variable from ten minutes to a max one hour) we started having more and more httpd processes stuck in read till the server hanged completely. Have you idea how we may workaround those hangs? Thank you again and best regards

alexskynet commented 2 years ago

I want to add a little piece to the puzzle that may help We run several servers and noticed that some of those work fin with http2 with no hangs Digging a little bit we noticed that the servers that hang the apache process have the root on NFS filesystem and the ones that don't hang have local filesystem. Running the NFS rooted servers without the mod_http2 loaded leads to fully functional wuth no hangs of sort. It wold sound interesting know id @Adam7288 is NFS rooted or not

alexskynet commented 2 years ago

Another info Previous message isn't completely perfect ... We discovered a server with nfs based root that works ok with mod_http2 We see no differences in configuration of mount point:

172.16.0.100:/apachesites on /var/www/virtual type nfs4 (rw,noatime,nodiratime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.0.32,local_lock=none,addr=172.16.0.100)

The difference is that the hanging servers all use Joomla both versions 3 and 4 with fastcgi but have no errors in logs of the sites and in php logs.

The working server uses the same configuration but runs a custom php application.

From the logs users are using http2 without any kind of hang.

We are a little bit confused at this point

icing commented 2 years ago

Yeah, hard to see the pattern from here.

icing commented 2 years ago

I think you'd need to vary one thing at a time in your setup in order to find out which combination is the critical one. For example, place your /var/www/virtual/ on an ext4 and see if the problem persists.

alexskynet commented 2 years ago

Yes @icing this is a nightmare: too many parameters moving ... Three servers sharing the same kind of configuration show the same behaviour They are all CentOS 7 updated with the latest updates They all use FastCGI for PHP (various versions from REMI repo inclufing PHP 7.4, 8.0 and 8.1 latest available), mod_pagespeed One of them uses ISPConfig They all have NFS rooted system and they all hang with httpd processes stuck in read A fortth server runs nextcloud whith same configuration except local FS: this doesn't hang Sixth and seventh server share the same confs with NFS root but only run custom PHP applicatiion and don't hang So .. a real puzzle

alexskynet commented 2 years ago

Tried to downgrade the mod_http2 from 2.0.4 to 2.0.3 on one server Let's wait

alexskynet commented 2 years ago

Downgrading to 2.0.3 seems to fix the problem We've run form more than two hours with no hangs and normal load Now we are testing the downgrade on a second server

Adam7288 commented 2 years ago

I was experiencing this issue on 2.0.3 and continue to experience it on 2.0.4. NFS is something we use too so that is an interesting avenue. @icing is there any way we can get better data to troubleshoot this?

alexskynet commented 2 years ago

Thank you for your answer @Adam7288 this sounds interesting Wich version of NFS protocol are you using? 3, 4.0 or 4.1? We are using protocol 4.0 with noatime and nodiratrime options on the servers where we're experimenting the anomaly I suspect something with NFS on the release of locks maybe??? In the meantime we've been running for some hours on 2.0.3 with no hangs while with 2.0.4 is usually a matter of minutes ...

icing commented 2 years ago

I looked at the changes between 2.0.3 and 2.0.4 and would have been surprised by differences. BUT I can make you a 2.0.5 with the changes I did in Apache trunk. There are difference in connection handling. Hard to tell if they improve your situation, but I'd like to know how those changes fare for you.

The alternative in your case would be to try the mod_http2 that was released in Apache 2.5.54.

icing commented 2 years ago

Regarding NFS usage: I have no personal experience with running a server that way. As I understand your setups, you use NFS to share the htdocs between nodes.

Question is: are files on the NFS share being modified and may that have impact? Have you EnableMmap on on this for example? Do you have H2CopyFiles on or off (this is the default).

Adam7288 commented 2 years ago

Only time nfs is invoked on our end is within a php script to access them in some way - for instance a document download. We do not serve up static or dynamic content directly from the nfs share.

alexskynet commented 2 years ago

Thank you @icing I appreciate your help very much I understand prfectly thet sometimes software devel leads to unexpected results ... it happenetd to me also so many times :-) If I can help in testing I'll be more than happy to do it Yes you understood perfectely We share the root among some nodes The only files that are actually modified during access to the webserver are cache files, that may be regenerated if expired, and session files. Yes EnableMmap is on

icing commented 2 years ago

Ok, I'll make a v2.0.5 for testing. In the meantime, you might want to configure H2CopyFiles on to see if that makes any difference for you.

icing commented 2 years ago

https://github.com/icing/mod_h2/releases/tag/v2.0.5

alexskynet commented 2 years ago

thank you @icing I'm compiling right now Stay tuned

alexskynet commented 2 years ago

Sorry @icing Module installed and restarted httpd Got 2 stuck processes in read in a couple of minutes

icing commented 2 years ago

"stuck in read" as from the status handler or some backtrace?

alexskynet commented 2 years ago

strace -p shows read htop show cpu 100% for the process

alexskynet commented 2 years ago

Status shows

6781 0/0/0 W 0.00 22 0 0 0.0 0.00 0.00 87.27.4.2 h2

rolled back to 2.0.3: no hangs till now

icing commented 2 years ago

Hmm, I assume your site is too busy to raise the http2 log level much?

alexskynet commented 2 years ago

I may reinstall the 2.0.5 and try to increase the log for few minutes if it may help Which directive do I need to insert?

icing commented 2 years ago

If you can take it LogLevel http2:trace1, but may start with LogLevel http2:debug.

alexskynet commented 2 years ago

Ok give me some minutes

alexskynet commented 2 years ago

Stuck in seconds with 2.0.5 This is trace log but it looks ok to me

[Wed Sep 21 16:09:53.574077 2022] [http2:debug] [pid 23712:tid 139812816349376] h2_workers.c(318): h2_workers: cleanup 25 workers idling [Wed Sep 21 16:09:53.574635 2022] [http2:debug] [pid 23712:tid 139812816349376] h2_workers.c(346): h2_workers: cleanup all workers terminated [Wed Sep 21 16:09:53.575301 2022] [http2:debug] [pid 23712:tid 139812816349376] h2_workers.c(349): h2_workers: cleanup zombie workers joined [Wed Sep 21 16:12:20.930961 2022] [mpm_worker:notice] [pid 23636:tid 139812816349376] AH00297: SIGUSR1 received. Doing graceful restart [Wed Sep 21 16:12:21.009126 2022] [http2:debug] [pid 25117:tid 139812816349376] h2_workers.c(370): h2_workers: cleanup 25 workers (25 idle) [Wed Sep 21 16:12:21.009464 2022] [http2:debug] [pid 25117:tid 139812816349376] h2_workers.c(406): h2_workers: cleanup all workers terminated [Wed Sep 21 16:12:21.010195 2022] [http2:debug] [pid 25117:tid 139812816349376] h2_workers.c(411): h2_workers: cleanup zombie workers joined [Wed Sep 21 16:12:22.385166 2022] [lbmethod_heartbeat:notice] [pid 23636:tid 139812816349376] AH02282: No slotmem from mod_heartmonitor [Wed Sep 21 16:12:22.388610 2022] [http2:info] [pid 23636:tid 139812816349376] AH03090: mod_http2 (v2.0.5-git, feats=CHPRIO+SHA256+INVHD+DWINS, nghttp2 1.49.0), initializing... [Wed Sep 21 16:12:22.388690 2022] [http2:trace1] [pid 23636:tid 139812816349376] h2_protocol.c(426): h2_h2, child_init [Wed Sep 21 16:12:22.388773 2022] [http2:trace1] [pid 23636:tid 139812816349376] h2_switch.c(46): h2_switch init [Wed Sep 21 16:12:22.526948 2022] [mpm_worker:notice] [pid 23636:tid 139812816349376] AH00292: Apache/2.4.54 (codeit) OpenSSL/1.1.1q+quic mod_fcgid/2.3.9 configured -- resuming normal operations [Wed Sep 21 16:12:22.587929 2022] [core:notice] [pid 23636:tid 139812816349376] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND' [Wed Sep 21 16:12:22.592326 2022] [http2:info] [pid 26351:tid 139812816349376] h2_workers: created with min=25 max=37 idle_ms=600000 [Wed Sep 21 16:12:23.448280 2022] [http2:trace1] [pid 26351:tid 139811847399168] h2_c1.c(218): [client 10.0.10.252:39865] h2_h2, process_conn [Wed Sep 21 16:12:23.448344 2022] [http2:trace1] [pid 26351:tid 139811847399168] h2_c1.c(226): [client 10.0.10.252:39865] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:12:23.448371 2022] [http2:trace1] [pid 26351:tid 139811847399168] h2_c1.c(282): [client 10.0.10.252:39865] h2_h2, declined [Wed Sep 21 16:12:23.594873 2022] [http2:info] [pid 26438:tid 139812816349376] h2_workers: created with min=25 max=37 idle_ms=600000 [Wed Sep 21 16:12:25.171200 2022] [http2:debug] [pid 25034:tid 139812816349376] h2_workers.c(370): h2_workers: cleanup 25 workers (25 idle) [Wed Sep 21 16:12:25.171708 2022] [http2:debug] [pid 25034:tid 139812816349376] h2_workers.c(406): h2_workers: cleanup all workers terminated [Wed Sep 21 16:12:25.172210 2022] [http2:debug] [pid 25034:tid 139812816349376] h2_workers.c(411): h2_workers: cleanup zombie workers joined [Wed Sep 21 16:12:33.463932 2022] [http2:trace1] [pid 26438:tid 139812048725760] h2_c1.c(218): [client 10.0.10.252:12296] h2_h2, process_conn [Wed Sep 21 16:12:33.463996 2022] [http2:trace1] [pid 26438:tid 139812048725760] h2_c1.c(226): [client 10.0.10.252:12296] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:12:33.464018 2022] [http2:trace1] [pid 26438:tid 139812048725760] h2_c1.c(282): [client 10.0.10.252:12296] h2_h2, declined [Wed Sep 21 16:12:36.796691 2022] [http2:trace1] [pid 26351:tid 139811855791872] h2_c1.c(218): [client 104.155.8.47:55726] h2_h2, process_conn [Wed Sep 21 16:12:36.796740 2022] [http2:trace1] [pid 26351:tid 139811855791872] h2_c1.c(226): [client 104.155.8.47:55726] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:12:36.796759 2022] [http2:trace1] [pid 26351:tid 139811855791872] h2_c1.c(282): [client 104.155.8.47:55726] h2_h2, declined [Wed Sep 21 16:12:43.466684 2022] [http2:trace1] [pid 26438:tid 139812057118464] h2_c1.c(218): [client 10.0.10.252:16043] h2_h2, process_conn [Wed Sep 21 16:12:43.466738 2022] [http2:trace1] [pid 26438:tid 139812057118464] h2_c1.c(226): [client 10.0.10.252:16043] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:12:43.467413 2022] [http2:trace1] [pid 26438:tid 139812057118464] h2_c1.c(282): [client 10.0.10.252:16043] h2_h2, declined [Wed Sep 21 16:12:53.482230 2022] [http2:trace1] [pid 26351:tid 139811788650240] h2_c1.c(218): [client 10.0.10.252:39973] h2_h2, process_conn [Wed Sep 21 16:12:53.482277 2022] [http2:trace1] [pid 26351:tid 139811788650240] h2_c1.c(226): [client 10.0.10.252:39973] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:12:53.482290 2022] [http2:trace1] [pid 26351:tid 139811788650240] h2_c1.c(282): [client 10.0.10.252:39973] h2_h2, declined [Wed Sep 21 16:13:00.838746 2022] [http2:trace1] [pid 26438:tid 139812006762240] h2_c1.c(218): [client 80.94.92.231:56704] h2_h2, process_conn [Wed Sep 21 16:13:00.838798 2022] [http2:trace1] [pid 26438:tid 139812006762240] h2_c1.c(226): [client 80.94.92.231:56704] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:13:00.838821 2022] [http2:trace1] [pid 26438:tid 139812006762240] h2_c1.c(282): [client 80.94.92.231:56704] h2_h2, declined [Wed Sep 21 16:13:03.492551 2022] [http2:trace1] [pid 26351:tid 139811855791872] h2_c1.c(218): [client 10.0.10.252:31642] h2_h2, process_conn [Wed Sep 21 16:13:03.492615 2022] [http2:trace1] [pid 26351:tid 139811855791872] h2_c1.c(226): [client 10.0.10.252:31642] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:13:03.492640 2022] [http2:trace1] [pid 26351:tid 139811855791872] h2_c1.c(282): [client 10.0.10.252:31642] h2_h2, declined [Wed Sep 21 16:13:13.506889 2022] [http2:trace1] [pid 26351:tid 139811704723200] h2_c1.c(218): [client 10.0.10.252:24285] h2_h2, process_conn [Wed Sep 21 16:13:13.506939 2022] [http2:trace1] [pid 26351:tid 139811704723200] h2_c1.c(226): [client 10.0.10.252:24285] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:13:13.506957 2022] [http2:trace1] [pid 26351:tid 139811704723200] h2_c1.c(282): [client 10.0.10.252:24285] h2_h2, declined [Wed Sep 21 16:13:23.517390 2022] [http2:trace1] [pid 26351:tid 139811662759680] h2_c1.c(218): [client 10.0.10.252:31236] h2_h2, process_conn [Wed Sep 21 16:13:23.517450 2022] [http2:trace1] [pid 26351:tid 139811662759680] h2_c1.c(226): [client 10.0.10.252:31236] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:13:23.517469 2022] [http2:trace1] [pid 26351:tid 139811662759680] h2_c1.c(282): [client 10.0.10.252:31236] h2_h2, declined [Wed Sep 21 16:13:33.515813 2022] [http2:trace1] [pid 26351:tid 139811847399168] h2_c1.c(218): [client 10.0.10.252:18757] h2_h2, process_conn

alexskynet commented 2 years ago

[Wed Sep 21 16:14:03.559682 2022] [http2:trace1] [pid 26351:tid 139811746686720] h2_c1.c(282): [client 10.0.10.252:6495] h2_h2, declined [Wed Sep 21 16:14:13.577250 2022] [http2:trace1] [pid 26438:tid 139812015154944] h2_c1.c(218): [client 10.0.10.252:4612] h2_h2, process_conn [Wed Sep 21 16:14:13.577292 2022] [http2:trace1] [pid 26438:tid 139812015154944] h2_c1.c(226): [client 10.0.10.252:4612] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:13.577310 2022] [http2:trace1] [pid 26438:tid 139812015154944] h2_c1.c(282): [client 10.0.10.252:4612] h2_h2, declined [Wed Sep 21 16:14:20.904512 2022] [http2:trace1] [pid 26438:tid 139811989976832] h2_c1.c(218): [client 212.199.151.43:50797] h2_h2, process_conn [Wed Sep 21 16:14:20.904551 2022] [http2:trace1] [pid 26438:tid 139811989976832] h2_c1.c(226): [client 212.199.151.43:50797] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:20.904567 2022] [http2:trace1] [pid 26438:tid 139811989976832] h2_c1.c(282): [client 212.199.151.43:50797] h2_h2, declined [Wed Sep 21 16:14:23.578158 2022] [http2:trace1] [pid 26438:tid 139811973191424] h2_c1.c(218): [client 10.0.10.252:47543] h2_h2, process_conn [Wed Sep 21 16:14:23.578213 2022] [http2:trace1] [pid 26438:tid 139811973191424] h2_c1.c(226): [client 10.0.10.252:47543] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:23.578243 2022] [http2:trace1] [pid 26438:tid 139811973191424] h2_c1.c(282): [client 10.0.10.252:47543] h2_h2, declined [Wed Sep 21 16:14:31.399700 2022] [http2:trace1] [pid 26438:tid 139811931227904] h2_c1.c(218): [client 47.61.241.205:9498] h2_h2, process_conn [Wed Sep 21 16:14:31.399763 2022] [http2:trace1] [pid 26438:tid 139811931227904] h2_c1.c(226): [client 47.61.241.205:9498] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:31.399791 2022] [http2:trace1] [pid 26438:tid 139811931227904] h2_c1.c(282): [client 47.61.241.205:9498] h2_h2, declined [Wed Sep 21 16:14:31.834425 2022] [http2:trace1] [pid 26351:tid 139811839006464] h2_c1.c(218): [client 195.103.82.14:19271] h2_h2, process_conn [Wed Sep 21 16:14:31.834472 2022] [http2:trace1] [pid 26351:tid 139811839006464] h2_c1.c(226): [client 195.103.82.14:19271] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:31.834488 2022] [http2:trace1] [pid 26351:tid 139811839006464] h2_c1.c(282): [client 195.103.82.14:19271] h2_h2, declined [Wed Sep 21 16:14:33.592091 2022] [http2:trace1] [pid 26438:tid 139811922835200] h2_c1.c(218): [client 10.0.10.252:12914] h2_h2, process_conn [Wed Sep 21 16:14:33.592147 2022] [http2:trace1] [pid 26438:tid 139811922835200] h2_c1.c(226): [client 10.0.10.252:12914] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:33.592167 2022] [http2:trace1] [pid 26438:tid 139811922835200] h2_c1.c(282): [client 10.0.10.252:12914] h2_h2, declined [Wed Sep 21 16:14:36.961079 2022] [http2:trace1] [pid 26351:tid 139811830613760] h2_c1.c(218): [client 80.94.92.231:42636] h2_h2, process_conn [Wed Sep 21 16:14:36.961140 2022] [http2:trace1] [pid 26351:tid 139811830613760] h2_c1.c(226): [client 80.94.92.231:42636] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:36.961168 2022] [http2:trace1] [pid 26351:tid 139811830613760] h2_c1.c(282): [client 80.94.92.231:42636] h2_h2, declined [Wed Sep 21 16:14:41.879892 2022] [http2:trace1] [pid 26351:tid 139811797042944] h2_c1.c(218): [client 54.36.149.58:38879] h2_h2, process_conn [Wed Sep 21 16:14:41.879944 2022] [http2:trace1] [pid 26351:tid 139811797042944] h2_c1.c(226): [client 54.36.149.58:38879] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:41.879961 2022] [http2:trace1] [pid 26351:tid 139811797042944] h2_c1.c(282): [client 54.36.149.58:38879] h2_h2, declined [Wed Sep 21 16:14:43.588767 2022] [http2:trace1] [pid 26438:tid 139811889264384] h2_c1.c(218): [client 10.0.10.252:20986] h2_h2, process_conn [Wed Sep 21 16:14:43.588815 2022] [http2:trace1] [pid 26438:tid 139811889264384] h2_c1.c(226): [client 10.0.10.252:20986] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:43.588831 2022] [http2:trace1] [pid 26438:tid 139811889264384] h2_c1.c(282): [client 10.0.10.252:20986] h2_h2, declined [Wed Sep 21 16:14:52.050764 2022] [http2:trace1] [pid 26438:tid 139811872478976] h2_c1.c(218): [client 212.199.151.43:50798] h2_h2, process_conn [Wed Sep 21 16:14:52.050810 2022] [http2:trace1] [pid 26438:tid 139811872478976] h2_c1.c(226): [client 212.199.151.43:50798] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:53.602079 2022] [http2:trace1] [pid 26351:tid 139811780257536] h2_c1.c(218): [client 10.0.10.252:25713] h2_h2, process_conn [Wed Sep 21 16:14:53.602169 2022] [http2:trace1] [pid 26351:tid 139811780257536] h2_c1.c(226): [client 10.0.10.252:25713] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:14:53.602209 2022] [http2:trace1] [pid 26351:tid 139811780257536] h2_c1.c(282): [client 10.0.10.252:25713] h2_h2, declined [Wed Sep 21 16:15:03.672347 2022] [http2:trace1] [pid 26351:tid 139811696330496] h2_c1.c(218): [client 10.0.10.252:24106] h2_h2, process_conn [Wed Sep 21 16:15:03.672415 2022] [http2:trace1] [pid 26351:tid 139811696330496] h2_c1.c(226): [client 10.0.10.252:24106] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1 [Wed Sep 21 16:15:03.672437 2022] [http2:trace1] [pid 26351:tid 139811696330496] h2_c1.c(282): [client 10.0.10.252:24106] h2_h2, declined [Wed Sep 21 16:15:07.065872 2022] [http2:debug] [pid 26438:tid 139811872478976] h2_c1.c(246): (70007)The timeout specified has expired: [client 212.199.151.43:50798] AH03054: h2_h2, error reading 24 bytes speculative [Wed Sep 21 16:15:12.537993 2022] [http2:trace1] [pid 26351:tid 139811713115904] h2_c1.c(218): [client 185.191.171.4:5330] h2_h2, process_conn [Wed Sep 21 16:15:12.538054 2022] [http2:trace1] [pid 26351:tid 139811713115904] h2_c1.c(226): [client 185.191.171.4:5330] h2_h2, process_conn, new connection using protocol 'http/1.1', direct=1, tls acceptable=1

icing commented 2 years ago

Can you email me the log to stefan at eissing.org?

alexskynet commented 2 years ago

Are the h2_h2, declined Ok?

alexskynet commented 2 years ago

yes give me a minute

alexskynet commented 2 years ago

Ok you shoud have it

icing commented 2 years ago

Yes, thank you. Looking at it now.

icing commented 2 years ago

What I see in the log:

One thing I recommend to try is configure H2Direct off to disable this sniffing by mod_http2 in case it interferes with mod_security.

If that has no effect, I'd recommend to make at least a try without mod_security loaded. Since mod_security is not from the Apache httpd project, I do not know how closely they followed our development. That they no longer sell it and have End-Of-Life announced makes me wonder. At least to see if it is in any way involved in the problems you see, I'd give this a try.

Any "real" http2 connections will appear in your logs as

h2_session.c(962): [client xxx] AH03200: h2_session(2-0,INIT,0): created, max_streams=100, stream_mem=32768, workers_limit=6, workers_max=37, push_diary(type=1,N=256)
...

If no such line is in your logs, no HTTP/2 is in play. With H2Direct off mod_http2 should then not interfere with any connctions (the h2_h2, process_connand h2_h2, declined will still appear, but they do not really do anything).

alexskynet commented 2 years ago

Thank you @icing I've started a new conf with better logging and the changes you suggested I'll let you know

alexskynet commented 2 years ago

running 2.0.5 in debug mode now

alexskynet commented 2 years ago

Here I am with some info. As I start httpd with version 2.0.5 the CPU load rises to 100%, look at this stats from mod_status

Server Version: Apache/2.4.54 (codeit) OpenSSL/1.1.1q+quic mod_fcgid/2.3.9
Server MPM: worker
Server Built: Jun 8 2022 09:34:58

Current Time: Thursday, 22-Sep-2022 09:15:24 CEST
Restart Time: Thursday, 22-Sep-2022 09:08:55 CEST
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 6 minutes 28 seconds
Server load: 5.16 6.63 6.25
Total accesses: 370 - Total Traffic: 3.6 MB - Total Duration: 945830
CPU Usage: u233.55 s675.72 cu0 cs0 - 234% CPU load
.954 requests/sec - 9.4 kB/second - 9.8 kB/request - 2556.3 ms/request
7 requests currently being processed, 43 idle workers

______LL__W____K____________________L________K___W..............
................................................................
......................

as you see 234% CPU load

Here is the same traffic with 2.0.3 loaded:

Server Version: Apache/2.4.54 (codeit) OpenSSL/1.1.1q+quic mod_fcgid/2.3.9
Server MPM: worker
Server Built: Jun 8 2022 09:34:58

Current Time: Thursday, 22-Sep-2022 09:25:37 CEST
Restart Time: Thursday, 22-Sep-2022 09:19:55 CEST
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 5 minutes 42 seconds
Server load: 1.29 2.39 4.36
Total accesses: 456 - Total Traffic: 8.1 MB - Total Duration: 224842
CPU Usage: u7.17 s1.15 cu0 cs0 - 2.43% CPU load
1.33 requests/sec - 24.3 kB/second - 18.2 kB/request - 493.075 ms/request
9 requests currently being processed, 41 idle workers

W__WL_____L_________K__L_____L_________K___R______..............
................................................................
......................

Here we have 2.43% load I'm digging into the 10GB logs to extract something relevant Stay tuned

alexskynet commented 2 years ago

sent you some logs by email

nono303 commented 2 years ago

Hi ! FYI, same issue on Windows with mod_md 2.0.5 with nghttp2 1.48 (as nghttp2 1.49 and still 1.50 failed with curl when upgrading to h2)

icing commented 2 years ago

@nono303 now we only need to find out what the issue is!🙈

But thanks for chiming in. Are you able to reproduce the issue at a http2:trace2 log level? Will be quite verbose...

icing commented 2 years ago

Updated my log analyzer to v2.0.5 output. See h2wtf. Running that on the test.log that @alexskynet sent me, I see two connections where the request times out, for example:

id               created                          scheduled           started            response            ended              reset             cleanup         error
303-25-0         2022-09-22 09:17:01.488022           --                 --                      --            --                 --              +19.080493 70007 +19.076375
303-25-15        2022-09-22 09:17:01.511985           --                 --           200 +0.350654          +0.352962            --                 --           --
303-25-17        2022-09-22 09:17:02.012644           --                 --           304 +0.003347          +0.003376            --                 --           --
303-25-19        2022-09-22 09:17:02.130160           --                 --           304 +0.006242          +0.006319            --                 --           --
303-25-21        2022-09-22 09:17:05.028766           --                 --           302 +0.266084          +0.266340            --                 --           --
303-25-23        2022-09-22 09:17:05.420298           --                 --           302 +0.025555          +0.025604            --                 --           --
303-25-25        2022-09-22 09:17:05.541844           --                 --                      --            --                 --                 --           --

303-25-0 is the overall connection (always -0) and the others are the opened requests on that connection. 6 request are received quite rapidly and all but the last produce a response.

The requested URLs would only be visible on a higher log level (or I have to make a version that logs them on debug also). That would be interesting. I suspect that this request ends up via mod_fcgid in the php application.

alexskynet commented 2 years ago

@icing you're right mod_fcgid is used for all our applications At the moment I'm running a new test: I recompiled 2.0.5 against libnghttp2-devel-1.50.0-1.codeit.el7.x86_64 which has been updated this morning

icing commented 2 years ago

Scanning my code and the log some more: What is the configuration for Timeout that you have configured? It may be that the new mod_http2 applies that in situations where it was not done before, e.g. waiting for a response should not really apply this.

alexskynet commented 2 years ago

Timeout 15

icing commented 2 years ago

That would match with the logged connection timeout. I'll do a v2.0.6 with a fix this evening.

You could try increasing that value in your config in the meantime to see if it makes a difference.

alexskynet commented 2 years ago

Ok

alexskynet commented 2 years ago

set Timeout to 30