Closed Adam7288 closed 2 years ago
pstack of httpd process at 100% running 2.0.5 timout30
pstack 3560 Thread 27 (Thread 0x7fe8649ee700 (LWP 3576)):
Thread 26 (Thread 0x7fe8641ed700 (LWP 3577)):
Thread 25 (Thread 0x7fe8639ec700 (LWP 3578)):
Thread 24 (Thread 0x7fe8631eb700 (LWP 3581)):
Thread 23 (Thread 0x7fe8629ea700 (LWP 3582)):
Thread 22 (Thread 0x7fe8621e9700 (LWP 3585)):
Thread 21 (Thread 0x7fe8619e8700 (LWP 3588)):
Thread 20 (Thread 0x7fe8611e7700 (LWP 3589)):
Thread 19 (Thread 0x7fe8609e6700 (LWP 3590)):
Thread 18 (Thread 0x7fe8601e5700 (LWP 3591)):
Thread 17 (Thread 0x7fe85f9e4700 (LWP 3592)):
Thread 16 (Thread 0x7fe85f1e3700 (LWP 3595)):
Thread 15 (Thread 0x7fe85e9e2700 (LWP 3596)):
Thread 14 (Thread 0x7fe85e1e1700 (LWP 3599)):
Thread 13 (Thread 0x7fe85d9e0700 (LWP 3600)):
Thread 12 (Thread 0x7fe85d1df700 (LWP 3603)):
Thread 11 (Thread 0x7fe85c9de700 (LWP 3604)):
Thread 10 (Thread 0x7fe85c1dd700 (LWP 3608)):
Thread 9 (Thread 0x7fe85b9dc700 (LWP 3611)):
Thread 8 (Thread 0x7fe85b1db700 (LWP 3612)):
Thread 7 (Thread 0x7fe85a9da700 (LWP 3615)):
Thread 6 (Thread 0x7fe85a1d9700 (LWP 3618)):
Thread 5 (Thread 0x7fe8599d8700 (LWP 3619)):
Thread 4 (Thread 0x7fe8591d7700 (LWP 3620)):
Thread 3 (Thread 0x7fe8589d6700 (LWP 3621)):
Thread 2 (Thread 0x7fe84cff9700 (LWP 3668)):
Thread 1 (Thread 0x7fe87fe328c0 (LWP 3560)):
Another 100%
Thread 1 (process 3668):
And another one: pstack 3560 Thread 27 (Thread 0x7fe8649ee700 (LWP 3576)):
Thread 26 (Thread 0x7fe8641ed700 (LWP 3577)):
Thread 25 (Thread 0x7fe8639ec700 (LWP 3578)):
Thread 24 (Thread 0x7fe8631eb700 (LWP 3581)):
Thread 23 (Thread 0x7fe8629ea700 (LWP 3582)):
Thread 22 (Thread 0x7fe8621e9700 (LWP 3585)):
Thread 21 (Thread 0x7fe8619e8700 (LWP 3588)):
Thread 20 (Thread 0x7fe8611e7700 (LWP 3589)):
Thread 19 (Thread 0x7fe8609e6700 (LWP 3590)):
Thread 18 (Thread 0x7fe8601e5700 (LWP 3591)):
Thread 17 (Thread 0x7fe85f9e4700 (LWP 3592)):
Thread 16 (Thread 0x7fe85f1e3700 (LWP 3595)):
Thread 15 (Thread 0x7fe85e9e2700 (LWP 3596)):
Thread 14 (Thread 0x7fe85e1e1700 (LWP 3599)):
Thread 13 (Thread 0x7fe85d9e0700 (LWP 3600)):
Thread 12 (Thread 0x7fe85d1df700 (LWP 3603)):
Thread 11 (Thread 0x7fe85c9de700 (LWP 3604)):
Thread 10 (Thread 0x7fe85c1dd700 (LWP 3608)):
Thread 9 (Thread 0x7fe85b9dc700 (LWP 3611)):
Thread 8 (Thread 0x7fe85b1db700 (LWP 3612)):
Thread 7 (Thread 0x7fe85a9da700 (LWP 3615)):
Thread 6 (Thread 0x7fe85a1d9700 (LWP 3618)):
Thread 5 (Thread 0x7fe8599d8700 (LWP 3619)):
Thread 4 (Thread 0x7fe8591d7700 (LWP 3620)):
Thread 3 (Thread 0x7fe8589d6700 (LWP 3621)):
Thread 2 (Thread 0x7fe84cff9700 (LWP 3668)):
Thread 1 (Thread 0x7fe87fe328c0 (LWP 3560)):
Thanks for the traces. The first one really helped. The others seem to have the symbols/source mixed up.
Anyways, here is v2.0.6 with three fixes. Hope these work for you as well.
Thank you Stefan
Just compiled and installed on one server
I'll let you know soon
Best regards
Tested 2.0.6 for about 20 minutes
Load on server greatly increases
It seems httpd processes still hang, but I will investigate better tomorrow morning since I'm operating from a tablet now.
I saw your Apache commit proposal, so it comes from Apache, if I understand correctly.
Thank you for your time
Best regards
Alessandro
Here I am. Unfortunately, as soon as I start httpd with 2.0.6, all httpd processes go to 100% CPU.
Here are a couple of pstack(s) `pstack 3491 Thread 75 (Thread 0x7ff9fb98f700 (LWP 3496)):
Thread 74 (Thread 0x7ff9fb18e700 (LWP 3498)):
Thread 73 (Thread 0x7ff9fa98d700 (LWP 3499)):
Thread 72 (Thread 0x7ff9fa18c700 (LWP 3502)):
Thread 71 (Thread 0x7ff9f998b700 (LWP 3503)):
Thread 70 (Thread 0x7ff9f918a700 (LWP 3505)):
Thread 69 (Thread 0x7ff9f8989700 (LWP 3508)):
Thread 68 (Thread 0x7ff9f3fff700 (LWP 3511)):
Thread 67 (Thread 0x7ff9f37fe700 (LWP 3513)):
Thread 66 (Thread 0x7ff9f2ffd700 (LWP 3517)):
Thread 65 (Thread 0x7ff9f27fc700 (LWP 3516)):
Thread 64 (Thread 0x7ff9f17fa700 (LWP 3520)):
Thread 63 (Thread 0x7ff9f1ffb700 (LWP 3521)):
Thread 62 (Thread 0x7ff9f0ff9700 (LWP 3524)):
Thread 61 (Thread 0x7ff9f07f8700 (LWP 3526)):
Thread 60 (Thread 0x7ff9efff7700 (LWP 3527)):
Thread 59 (Thread 0x7ff9ef7f6700 (LWP 3531)):
Thread 58 (Thread 0x7ff9eeff5700 (LWP 3530)):
Thread 57 (Thread 0x7ff9ee7f4700 (LWP 3533)):
Thread 56 (Thread 0x7ff9edff3700 (LWP 3535)):
Thread 55 (Thread 0x7ff9ed7f2700 (LWP 3536)):
Thread 54 (Thread 0x7ff9ecff1700 (LWP 3539)):
Thread 53 (Thread 0x7ff9ec7f0700 (LWP 3538)):
Thread 52 (Thread 0x7ff9ebfef700 (LWP 3542)):
Thread 51 (Thread 0x7ff9eb7ee700 (LWP 3544)):
Thread 50 (Thread 0x7ff9eafed700 (LWP 3546)):
Thread 49 (Thread 0x7ff9ea7ec700 (LWP 3548)):
Thread 48 (Thread 0x7ff9e9feb700 (LWP 3549)):
Thread 47 (Thread 0x7ff9e97ea700 (LWP 3550)):
Thread 46 (Thread 0x7ff9e8fe9700 (LWP 3551)):
Thread 45 (Thread 0x7ff9e87e8700 (LWP 3552)):
Thread 44 (Thread 0x7ff9e7fe7700 (LWP 3553)):
Thread 43 (Thread 0x7ff9e77e6700 (LWP 3554)):
Thread 42 (Thread 0x7ff9e6fe5700 (LWP 3555)):
Thread 41 (Thread 0x7ff9e67e4700 (LWP 3556)):
Thread 40 (Thread 0x7ff9e5fe3700 (LWP 3557)):
Thread 39 (Thread 0x7ff9e57e2700 (LWP 3558)):
Thread 38 (Thread 0x7ff9e4fe1700 (LWP 3559)):
Thread 37 (Thread 0x7ff9e47e0700 (LWP 3560)):
Thread 36 (Thread 0x7ff9e3fdf700 (LWP 3561)):
Thread 35 (Thread 0x7ff9e37de700 (LWP 3562)):
Thread 34 (Thread 0x7ff9e2fdd700 (LWP 3563)):
Thread 33 (Thread 0x7ff9e1fdb700 (LWP 3571)):
Thread 32 (Thread 0x7ff9e17da700 (LWP 3573)):
Thread 31 (Thread 0x7ff9e0fd9700 (LWP 3575)):
Thread 30 (Thread 0x7ff9dbfff700 (LWP 3577)):
Thread 29 (Thread 0x7ff9db7fe700 (LWP 3579)):
Thread 28 (Thread 0x7ff9daffd700 (LWP 3581)):
Thread 27 (Thread 0x7ff9da7fc700 (LWP 3583)):
Thread 26 (Thread 0x7ff9d9ffb700 (LWP 3585)):
Thread 25 (Thread 0x7ff9d97fa700 (LWP 3587)):
Thread 24 (Thread 0x7ff9d8ff9700 (LWP 3589)):
Thread 23 (Thread 0x7ff9d87f8700 (LWP 3590)):
Thread 22 (Thread 0x7ff9d7ff7700 (LWP 3592)):
Thread 21 (Thread 0x7ff9d77f6700 (LWP 3595)):
Thread 20 (Thread 0x7ff9d6ff5700 (LWP 3597)):
Thread 19 (Thread 0x7ff9d67f4700 (LWP 3599)):
Thread 18 (Thread 0x7ff9d5ff3700 (LWP 3601)):
Thread 17 (Thread 0x7ff9d57f2700 (LWP 3603)):
Thread 16 (Thread 0x7ff9d4ff1700 (LWP 3605)):
Thread 15 (Thread 0x7ff9d47f0700 (LWP 3607)):
Thread 14 (Thread 0x7ff9d3fef700 (LWP 3609)):
Thread 13 (Thread 0x7ff9d37ee700 (LWP 3610)):
Thread 12 (Thread 0x7ff9d2fed700 (LWP 3613)):
Thread 11 (Thread 0x7ff9d27ec700 (LWP 3614)):
Thread 10 (Thread 0x7ff9d1feb700 (LWP 3615)):
Thread 9 (Thread 0x7ff9d17ea700 (LWP 3616)):
Thread 8 (Thread 0x7ff9d0fe9700 (LWP 3617)):
Thread 7 (Thread 0x7ff9cbfff700 (LWP 3629)):
Thread 6 (Thread 0x7ff9cb7fe700 (LWP 3643)):
Thread 5 (Thread 0x7ff9caffd700 (LWP 3644)):
Thread 4 (Thread 0x7ff9ca7fc700 (LWP 3645)):
Thread 3 (Thread 0x7ff9c9ffb700 (LWP 3646)):
Thread 2 (Thread 0x7ff9c97fa700 (LWP 3707)):
Thread 1 (Thread 0x7ffa165d28c0 (LWP 3491)):
[root@argo ~]# pstack 3492 Thread 59 (Thread 0x7ff9fb98f700 (LWP 3497)):
Thread 58 (Thread 0x7ff9fb18e700 (LWP 3500)):
Thread 57 (Thread 0x7ff9fa98d700 (LWP 3501)):
Thread 56 (Thread 0x7ff9fa18c700 (LWP 3504)):
Thread 55 (Thread 0x7ff9f998b700 (LWP 3506)):
Thread 54 (Thread 0x7ff9f918a700 (LWP 3507)):
Thread 53 (Thread 0x7ff9f8989700 (LWP 3509)):
Thread 52 (Thread 0x7ff9f8188700 (LWP 3510)):
Thread 51 (Thread 0x7ff9f7987700 (LWP 3512)):
Thread 50 (Thread 0x7ff9f7186700 (LWP 3514)):
Thread 49 (Thread 0x7ff9f6985700 (LWP 3515)):
Thread 48 (Thread 0x7ff9f6184700 (LWP 3518)):
Thread 47 (Thread 0x7ff9f5983700 (LWP 3519)):
Thread 46 (Thread 0x7ff9f5182700 (LWP 3522)):
Thread 45 (Thread 0x7ff9f4981700 (LWP 3523)):
Thread 44 (Thread 0x7ff9f4180700 (LWP 3525)):
Thread 43 (Thread 0x7ff9f397f700 (LWP 3528)):
Thread 42 (Thread 0x7ff9f317e700 (LWP 3529)):
Thread 41 (Thread 0x7ff9f297d700 (LWP 3532)):
Thread 40 (Thread 0x7ff9f217c700 (LWP 3534)):
Thread 39 (Thread 0x7ff9f197b700 (LWP 3537)):
Thread 38 (Thread 0x7ff9f117a700 (LWP 3540)):
Thread 37 (Thread 0x7ff9f0979700 (LWP 3541)):
Thread 36 (Thread 0x7ff9f0178700 (LWP 3543)):
Thread 35 (Thread 0x7ff9ef977700 (LWP 3545)):
Thread 34 (Thread 0x7ff9ef176700 (LWP 3547)):
Thread 33 (Thread 0x7ff9ee174700 (LWP 3565)):
Thread 32 (Thread 0x7ff9ed973700 (LWP 3566)):
Thread 31 (Thread 0x7ff9ed172700 (LWP 3568)):
Thread 30 (Thread 0x7ff9ec971700 (LWP 3569)):
Thread 29 (Thread 0x7ff9e7fff700 (LWP 3570)):
Thread 28 (Thread 0x7ff9e77fe700 (LWP 3572)):
Thread 27 (Thread 0x7ff9e6ffd700 (LWP 3574)):
Thread 26 (Thread 0x7ff9e67fc700 (LWP 3576)):
Thread 25 (Thread 0x7ff9e5ffb700 (LWP 3578)):
Thread 24 (Thread 0x7ff9e57fa700 (LWP 3580)):
Thread 23 (Thread 0x7ff9e4ff9700 (LWP 3582)):
Thread 22 (Thread 0x7ff9e47f8700 (LWP 3584)):
Thread 21 (Thread 0x7ff9e3ff7700 (LWP 3586)):
Thread 20 (Thread 0x7ff9e37f6700 (LWP 3588)):
Thread 19 (Thread 0x7ff9e2ff5700 (LWP 3591)):
Thread 18 (Thread 0x7ff9e27f4700 (LWP 3593)):
Thread 17 (Thread 0x7ff9e1ff3700 (LWP 3594)):
Thread 16 (Thread 0x7ff9e17f2700 (LWP 3596)):
Thread 15 (Thread 0x7ff9e0ff1700 (LWP 3598)):
Thread 14 (Thread 0x7ff9e07f0700 (LWP 3600)):
Thread 13 (Thread 0x7ff9dffef700 (LWP 3602)):
Thread 12 (Thread 0x7ff9df7ee700 (LWP 3604)):
Thread 11 (Thread 0x7ff9defed700 (LWP 3606)):
Thread 10 (Thread 0x7ff9de7ec700 (LWP 3608)):
Thread 9 (Thread 0x7ff9ddfeb700 (LWP 3611)):
Thread 8 (Thread 0x7ff9dd7ea700 (LWP 3612)):
Thread 7 (Thread 0x7ff9dcfe9700 (LWP 3659)):
Thread 6 (Thread 0x7ff9cffff700 (LWP 3665)):
Thread 5 (Thread 0x7ff9cf7fe700 (LWP 3666)):
Thread 4 (Thread 0x7ff9ceffd700 (LWP 3667)):
Thread 3 (Thread 0x7ff9ce7fc700 (LWP 3668)):
Thread 2 (Thread 0x7ff9cdffb700 (LWP 3669)):
Thread 1 (Thread 0x7ffa165d28c0 (LWP 3492)):
` Reverting to 2.0.3 fixes the CPU load, but occasionally some httpd processes hang.
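Each thread header in these dumps carries the kernel thread id (LWP), which is the id that per-thread CPU views such as `top -H -p <pid>` list. A minimal sketch of pulling those ids out of a dump so a spinning thread can be matched to its stack; `lwp_by_thread` is a hypothetical helper name, not part of pstack or httpd:

```python
import re

# Matches pstack/gdb thread headers such as:
#   Thread 27 (Thread 0x7fe8649ee700 (LWP 3576)):
THREAD_HEADER = re.compile(r"Thread (\d+) \(Thread (0x[0-9a-f]+) \(LWP (\d+)\)\):")


def lwp_by_thread(pstack_text):
    """Return {thread number: LWP id} for every thread header in the text."""
    return {
        int(num): int(lwp)
        for num, _addr, lwp in THREAD_HEADER.findall(pstack_text)
    }


# Two headers copied from the dump of process 3491 above.
sample = """\
Thread 2 (Thread 0x7ff9c97fa700 (LWP 3707)):
Thread 1 (Thread 0x7ffa165d28c0 (LWP 3491)):
"""

threads = lwp_by_thread(sample)
print(threads)
```

The regex searches anywhere in a line, so a header that follows a shell prompt or the `pstack <pid>` command on the same line is still picked up.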
Thanks @alexskynet for putting v2.0.6 into your grinder. Sorry, that it did nothing to improve.😢
Analyzing the backtraces now.
There was one change from v2.0.3 to newer ones that involved thread creation for the h2 workers. A quick check to see if that is causing problems would be to configure your server with a fixed number of h2 workers, like in:
H2MinWorkers 25
H2MaxWorkers 25
so all workers are created at startup and no dynamic creation/destruction of threads happens.
Could you give this a shot?
Unfortunately no change. Still 100% CPU with your settings and 2.0.6.
Thread 17 (Thread 0x7ff9e1ff3700 (LWP 8225)):
Thread 25 (Thread 0x7ff9e5ffb700 (LWP 8217)):
v2.0.7 released with a fix for `mpm_worker` setups that could (does?) result in busy loops.
Background: the v2.0.x line had improvements to return to mpm_event
connection monitoring asap for efficiency. Unfortunately, the mpm_worker
situation was not properly accounted for.
@nono303 I mentioned `worker`, but this should apply to Windows setups as well.
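For readers unfamiliar with the term, here is a generic illustration of what a busy loop is. It is not mod_http2 code and does not claim to reproduce the actual bug. It only shows, under the assumption of an idle socket, how a loop that polls without blocking spins through far more iterations (and CPU) than one that waits in the kernel. `count_polls` is a hypothetical helper:

```python
import select
import socket
import time


def count_polls(timeout, duration=0.05):
    """Poll an idle socket for `duration` seconds; return how many polls ran."""
    # Nothing is ever written to `writer`, so `reader` never becomes readable.
    reader, writer = socket.socketpair()
    polls = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            select.select([reader], [], [], timeout)
            polls += 1
    finally:
        reader.close()
        writer.close()
    return polls


busy = count_polls(timeout=0)         # select returns at once: the loop spins
blocking = count_polls(timeout=0.01)  # the thread sleeps between polls
print(busy, blocking)
```

The exact counts depend on the machine, but the non-blocking variant should complete many more iterations in the same wall-clock time, which is what shows up as 100% CPU in `top`.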
Time to compile and run and I'll let you know
running
Very first impression is good: no immediate lock so I cross my fingers
A giant step @icing! It is running with no apparent problems. The only thing I noticed is that httpd CPU usage rose from 6-7% to something around 10%, but this is really not a problem.
Thanks for your patience. Happy to hear that. Let's see what the day brings.
As to performance, I made some conservative changes to bring stability. When this version proves to be stable, I can dare to tighten the screws again somewhat.
You're doing a great job @icing
I'm happy to have been useful to help a little bit: that's called opensource!
Running OK so far. Starting a second test server. Thank you @icing, great job!
I just want to confirm that 2.0.7 works like a charm
CodeIt has already released the mod_http2-2.0.7 rpm with the fix so all the world should be happy now :-)
It has been running for several hours on two servers with absolutely no trouble
Problem solved: well done @icing ! :1st_place_medal:
Thanks again @alexskynet for the help!
Closing this as fixed in v2.0.7.
… I came after the battle (quite busy now), but many thanks @icing for your work and responsiveness. I compiled 2.0.7 and am running it on Windows, and everything works like a charm 😉 No more CPU usage from this stack:
libapr-1.dll!impl_pollset_poll+0xba
mod_http2.so!mplx_pollset_poll+0x154
mod_http2.so!h2_mplx_c1_poll+0x52
mod_http2.so!h2_session_process+0xca5
mod_http2.so!h2_c1_run+0xaf
mod_http2.so!h2_c1_hook_process_connection+0x4e2
libhttpd.dll!ap_run_process_connection+0x35
libhttpd.dll!worker_main+0x3a8
kernel32.dll!BaseThreadInitThunk+0xd
ntdll.dll!RtlUserThreadStart+0x1d
Maybe just a few more CPU cycles in mod_watchdog, but I'm not sure it is related to h2:
mod_watchdog.so!wd_worker+0x24f
libapr-1.dll!dummy_worker+0x43
ucrtbase.DLL!o__realloc_base+0x60
kernel32.dll!BaseThreadInitThunk+0xd
Sorry to reopen this, but today we had hangs on two different servers, both running 2.0.7. Reverting to 2.0.3 seems to fix it. We ran 2.0.7 without problems for nearly 4 days before the hang. At first glance, this seems to be the interesting part of the pstack:
Thread 7 (Thread 0x7fdcfb7fe700 (LWP 20589)):
Thread 6 (Thread 0x7fdcf97fa700 (LWP 20593)):
@alexskynet this stack trace does not look right. I assume a version of mod_http2 other than v2.0.7 is loaded. v2.0.7 never invokes `h2_fifo_remove()`. Assuming the APR lib symbols are correct, `apr_bucket_split` is also never called in v2.0.7 (but was in versions v2.0.4 and earlier).
Could you double check?
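One quick way to run that kind of check against a saved trace is to search it for the functions named above. A minimal sketch: `unexpected_symbols` is a hypothetical helper, the symbol list comes only from this comment, and the sample frames are invented purely to exercise the function:

```python
import re

# Functions that, per the comment above, v2.0.7 never calls.
NOT_IN_V2_0_7 = ("h2_fifo_remove", "apr_bucket_split")


def unexpected_symbols(trace_text, symbols=NOT_IN_V2_0_7):
    """Return the listed symbols that occur as whole words in the trace."""
    return [
        s for s in symbols
        if re.search(r"\b" + re.escape(s) + r"\b", trace_text)
    ]


# Hypothetical frames, not taken from any real trace in this thread.
sample = """\
#1 0x0000000000000000 in h2_fifo_remove () from /etc/httpd/modules/mod_http2.so
#2 0x0000000000000000 in h2_session_process () from /etc/httpd/modules/mod_http2.so
"""

print(unexpected_symbols(sample))
```

A non-empty result suggests the running process loaded a different build of the module than the one expected, which is worth confirming against the installed files before drawing conclusions.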
v2.0.8 released. The fixes are unrelated to this, but I added an assertion before the point where the 100% CPU loop seems to happen in reports by @alexskynet. It would be interesting to know if this triggers and, if so, what is logged (at level critical) in such a case.
The epic battle continues with v2.0.9:
Hi @icing and thank you for your hard work
Testing 2.0.9 with worker
Till now what I see in mod_status looks OK: it has been running for 20 minutes now.
I cross my fingers and wait ...
I'll let you know
You tricked me before! I remain sceptical...😉
12 hours still running fine ...
It looks promising
Hi @icing, I have been testing for three days now and it works fine. Five servers are running 2.0.9 and one is running master, with no anomalies. The battle is over and you won. Many compliments for the very good work.
Hi @alexskynet! This is excellent news. We won. Thank you very much for helping on this.
In the meantime, I have added more edge test cases and made some more improvements on reliability. Will release that in some days as a v2.0.10.
Thank you!
I'll start a test as soon as possible
On 30 September 2022 at 16:59:17 CEST, Stefan Eissing @.***> wrote:
The epic battle continues with v2.0.9:
- Fixed a bug where errors during response body handling did not lead to a proper RST_STREAM. Instead processing went into an infinite loop. Extended test cases to catch this condition.
With v2.0.10 just being released and Alessandro's extensive testing, I think we have solved the issues.
Many, many thanks to everyone.
Here is the pstack trace for one of the hung processes pegging the CPU at 100%. It looks like some kind of deadlock:
0 __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
1 0x00007f2d659f0f9e in _L_unlock_738 () from /lib64/libpthread.so.0
2 0x00007f2d659f0f10 in __pthread_mutex_unlock_usercnt (decr=1, mutex=0x7f2c7c5248f8) at pthread_mutex_unlock.c:55
3 __GI___pthread_mutex_unlock (mutex=0x7f2c7c5248f8) at pthread_mutex_unlock.c:330
4 0x00007f2d57cb140d in h2_beam_receive () from /etc/httpd/modules/mod_http2.so
5 0x00007f2d57cc8ee3 in buffer_output_receive () from /etc/httpd/modules/mod_http2.so
6 0x00007f2d57ccb1ec in stream_data_cb () from /etc/httpd/modules/mod_http2.so
7 0x00007f2d66cba171 in nghttp2_session_pack_data () from /lib64/libnghttp2.so.14
8 0x00007f2d66cbaedd in nghttp2_session_mem_send_internal () from /lib64/libnghttp2.so.14
9 0x00007f2d66cbbae9 in nghttp2_session_send () from /lib64/libnghttp2.so.14
10 0x00007f2d57cc7544 in h2_session_send () from /etc/httpd/modules/mod_http2.so
11 0x00007f2d57cc777a in h2_session_process () from /etc/httpd/modules/mod_http2.so
12 0x00007f2d57cb2149 in h2_c1_run () from /etc/httpd/modules/mod_http2.so
13 0x00007f2d57cb2569 in h2_c1_hook_process_connection () from /etc/httpd/modules/mod_http2.so
14 0x00005571f66c33c0 in ap_run_process_connection (c=c@entry=0x7f2d4006d5e0) at connection.c:42
15 0x00007f2d5a8ab40a in process_socket (thd=thd@entry=0x5571f72c4510, p=<optimized out>, sock=<optimized out>, cs=0x7f2d4006d530, my_child_num=my_child_num@entry=11, my_thread_num=my_thread_num@entry=12) at event.c:1086
16 0x00007f2d5a8ae6ae in worker_thread (thd=0x5571f72c4510, dummy=<optimized out>) at event.c:2179
17 0x00007f2d659edea5 in start_thread (arg=0x7f2d3a7f4700) at pthread_create.c:307
18 0x00007f2d65512b0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
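To see at a glance which part of the call path runs inside the module, the frames can be filtered by the library they were loaded from. A minimal sketch, assuming gdb-style frame lines like the ones above (with or without a leading `#`); `frames_from` is a hypothetical helper:

```python
import re

# frame number, optional address, function, arguments, "from <lib>" or "at <file>"
FRAME = re.compile(
    r"^#?(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \((.*)\) (from|at) (\S+)$"
)


def frames_from(trace_text, library_suffix):
    """Return (frame number, function) for frames loaded from the library."""
    found = []
    for line in trace_text.splitlines():
        m = FRAME.match(line.strip())
        if m and m.group(4) == "from" and m.group(5).endswith(library_suffix):
            found.append((int(m.group(1)), m.group(2)))
    return found


# Frames 3 to 7 copied from the trace above.
sample = """\
3 __GI___pthread_mutex_unlock (mutex=0x7f2c7c5248f8) at pthread_mutex_unlock.c:330
4 0x00007f2d57cb140d in h2_beam_receive () from /etc/httpd/modules/mod_http2.so
5 0x00007f2d57cc8ee3 in buffer_output_receive () from /etc/httpd/modules/mod_http2.so
6 0x00007f2d57ccb1ec in stream_data_cb () from /etc/httpd/modules/mod_http2.so
7 0x00007f2d66cba171 in nghttp2_session_pack_data () from /lib64/libnghttp2.so.14
"""

print(frames_from(sample, "mod_http2.so"))
```

Frames that gdb resolved to a source file (`at file:line`) are skipped here because they carry no library path; widening the filter to include them would need a different matching rule.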