Closed c0deright closed 7 months ago
More info from the ubuntu bug report:
Most of the lingering apache2 processes have incoming TCP connections in state CLOSE_WAIT. After some time (30 mins/1h) these connections disappear from netstat
output. Apache2 /server-status
still reports that these processes have 1-3 open connections which clearly isn't the case. The state of the apache2 processes in question is "Sending Reply" respectively
"Stopping".
% ps f -o user,pid,nice,%cpu,%mem,cputime,etime,thcount,command ax
USER PID NI %CPU %MEM TIME ELAPSED THCNT COMMAND
root 906764 0 0.0 0.1 00:00:00 30:30 1 /usr/sbin/apache2 -k start
www-data 906768 0 1.1 0.2 00:00:20 30:30 3 \_ /usr/sbin/apache2 -k start <-----
www-data 906771 0 1.5 0.2 00:00:27 30:30 3 \_ /usr/sbin/apache2 -k start <-----
www-data 906772 0 0.7 0.2 00:00:14 30:30 5 \_ /usr/sbin/apache2 -k start <-----
www-data 907681 0 0.0 0.0 00:00:00 15:47 1 \_ /usr/sbin/apache2 -k start
netto 907948 0 0.9 0.7 00:00:09 15:46 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907956 0 1.2 0.7 00:00:11 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907957 0 0.5 0.7 00:00:05 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907958 0 1.6 0.7 00:00:15 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907959 0 0.9 0.7 00:00:08 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907960 0 0.7 0.6 00:00:06 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907961 0 1.1 0.7 00:00:11 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907964 0 1.2 0.7 00:00:11 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907965 0 2.0 0.7 00:00:19 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907967 0 1.6 0.7 00:00:15 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907968 0 0.7 0.7 00:00:07 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907976 0 2.0 0.7 00:00:19 15:44 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907979 0 0.1 0.6 00:00:01 15:42 1 | \_ /usr/lib/cgi-bin/php7.4
netto 908228 0 0.1 0.6 00:00:00 08:33 1 | \_ /usr/lib/cgi-bin/php7.4
netto 908230 0 0.0 0.6 00:00:00 08:33 1 | \_ /usr/lib/cgi-bin/php7.4
www-data 907683 0 0.8 0.3 00:00:07 15:46 52 \_ /usr/sbin/apache2 -k start
www-data 907684 0 1.0 0.3 00:00:10 15:46 52 \_ /usr/sbin/apache2 -k start
www-data 907685 0 0.9 0.3 00:00:08 15:46 52 \_ /usr/sbin/apache2 -k start
www-data 907687 0 2.3 0.4 00:00:21 15:46 52 \_ /usr/sbin/apache2 -k start
www-data 907688 0 1.7 0.4 00:00:16 15:46 52 \_ /usr/sbin/apache2 -k start
Apache was restarted 30 minutes and 30 seconds ago and reloaded 15 minutes and 46 seconds ago. The php child processes and the apache2 child processes at the bottom of the process list have been restarted after the graceful command - but the <-----
marked apache2 processes are still running.
Server Version: Apache/2.4.58 (Ubuntu) mod_fcgid/2.3.9 OpenSSL/3.0.2 Server MPM: event Server Built: 2023-10-25T05:39:09
Current Time: Tuesday, 20-Feb-2024 10:44:55 CET Restart Time: Tuesday, 20-Feb-2024 10:10:15 CET Parent Server Config. Generation: 2 Parent Server MPM Generation: 1 Server uptime: 34 minutes 39 seconds Server load: 0.59 0.41 0.40 Total accesses: 48447 - Total Traffic: 895.7 MB - Total Duration: 835184 CPU Usage: u118.76 s27.27 cu128 cs24.35 - 14.4% CPU load 23.3 requests/sec - 441.2 kB/second - 18.9 kB/request - 17.2391 ms/request 45 requests currently being processed, 0 workers gracefully restarting, 205 idle workers
Slot PID Stopping Connections Threads Async connections total accept busy graceful idle writing keep-alive closing 0 906768 yes (old gen) 1 no 0 0 0 0 0 0 1 907683 no 15 yes 7 0 43 0 6 1 2 907684 no 18 yes 9 0 41 0 8 3 3 906771 yes (old gen) 1 no 0 0 0 0 0 0 4 906772 yes (old gen) 3 no 0 0 0 0 0 0 5 907685 no 15 yes 5 0 45 0 7 1 6 907687 no 36 yes 11 0 39 0 20 5 7 907688 no 28 yes 13 0 37 0 11 4 Sum 8 3 117 45 0 205 0 52 14
............................................W.....R__RR__R_____R
R________________R__________________________R________________R_R
_______RR_W___R____RR_........................W.................
.............W..............W.................W..........._R____
___R___R______________R_______________R________R____R__RRR____R_
___R_R__R_____R__________R___________R______________R___R__R_R__
RRRR___R__RR__R_................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
........................................
Scoreboard Key: "_" Waiting for Connection, "S" Starting up, "R" Reading Request, "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup, "C" Closing connection, "L" Logging, "G" Gracefully finishing, "I" Idle cleanup of worker, "." Open slot with no current process
Srv PID Acc M CPU SS Req Dur Conn Child Slot Client Protocol VHost Request
0-0 906768 1/146/146 W 20.52 1181 0 1569 0.0 2.37 2.37 77.23.XXX.XX http/1.1 www.nettolohn.de:443 POST / HTTP/1.1
3-0 906771 1/180/180 W 27.46 1195 0 2123 0.0 3.53 3.53 176.4.XXX.XXX http/1.1 www.nettolohn.de:443 POST / HTTP/1.1
4-0 906772 1/103/103 W 14.30 1188 0 1163 0.0 1.78 1.78 95.91.XXX.XXX http/1.1 www.nettolohn.de:443 GET / HTTP/1.1
4-0 906772 1/101/101 W 13.71 1194 0 1161 0.0 1.80 1.80 91.65.XXX.XXX http/1.1 www.nettolohn.de:443 GET /rechner/firmenwagenrechner.html HTTP/1.1
4-0 906772 1/98/98 W 14.33 1192 0 3356 0.0 2.06 2.06 156.67.XXX.XXX http/1.1 www.nettolohn.de:443 POST / HTTP/1.1
Now let's see if the lingering apache2 processes have really TCP connections open:
% netstat -antpe | grep -P '906768|906771|906772'
tcp6 1 0 10.0.X.Y:443 91.65.XXX.XXX:58430 CLOSE_WAIT 33 115570910 906772/apache2
tcp6 1 0 10.0.X.Y:443 77.23.XXX.XX:48341 CLOSE_WAIT 33 115570504 906768/apache2
tcp6 1 0 10.0.X.Y:443 156.67.XXX.XXX:51031 CLOSE_WAIT 33 115571313 906772/apache2
tcp6 1 0 10.0.X.Y:443 95.91.XXX.XXX:4912 CLOSE_WAIT 33 115571178 906772/apache2
tcp6 1 0 10.0.X.Y:443 176.4.XXX.XXX:39367 CLOSE_WAIT 33 115570723 906771/apache2
lsof
still showing several threads in these processes having open file handles on logfiles:
% lsof -n | grep -P '(906768|906771|906772).*(TCP|log)'
apache2 906768 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906768 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906768 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906768 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906768 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906768 www-data 44u IPv6 115570504 0t0 TCP 10.0.XX.XXX:https->77.23.XXX.XXX:48341 (CLOSE_WAIT)
apache2 906768 907058 apache2 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906768 907058 apache2 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906768 907058 apache2 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906768 907058 apache2 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906768 907058 apache2 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906768 907058 apache2 www-data 44u IPv6 115570504 0t0 TCP 10.0.XX.XXX:https->77.23.XXX.XXX:48341 (CLOSE_WAIT)
apache2 906768 907069 apache2 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906768 907069 apache2 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906768 907069 apache2 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906768 907069 apache2 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906768 907069 apache2 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906768 907069 apache2 www-data 44u IPv6 115570504 0t0 TCP 10.0.XX.XXX:https->77.23.XXX.XXX:48341 (CLOSE_WAIT)
apache2 906771 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906771 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906771 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906771 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906771 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906771 www-data 63u IPv6 115570723 0t0 TCP 10.0.XX.XXX:https->176.4.XXX.XXX:39367 (CLOSE_WAIT)
apache2 906771 906944 apache2 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906771 906944 apache2 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906771 906944 apache2 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906771 906944 apache2 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906771 906944 apache2 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906771 906944 apache2 www-data 63u IPv6 115570723 0t0 TCP 10.0.XX.XXX:https->176.4.XXX.XXX:39367 (CLOSE_WAIT)
apache2 906771 906970 apache2 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906771 906970 apache2 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906771 906970 apache2 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906771 906970 apache2 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906771 906970 apache2 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906771 906970 apache2 www-data 63u IPv6 115570723 0t0 TCP 10.0.XX.XXX:https->176.4.XXX.XXX:39367 (CLOSE_WAIT)
apache2 906772 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906772 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906772 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906772 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906772 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906772 www-data 27u IPv6 115571178 0t0 TCP 10.0.XX.XXX:https->95.91.XXX.XXX:4912 (CLOSE_WAIT)
apache2 906772 www-data 39u IPv6 115571313 0t0 TCP 10.0.XX.XXX:https->156.67.XXX.XXX:51031 (CLOSE_WAIT)
apache2 906772 www-data 47u IPv6 115570910 0t0 TCP 10.0.XX.XXX:https->91.65.XXX.XXX:58430 (CLOSE_WAIT)
apache2 906772 906813 apache2 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906772 906813 apache2 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906772 906813 apache2 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906772 906813 apache2 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906772 906813 apache2 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906772 906813 apache2 www-data 27u IPv6 115571178 0t0 TCP 10.0.XX.XXX:https->95.91.XXX.XXX:4912 (CLOSE_WAIT)
apache2 906772 906813 apache2 www-data 39u IPv6 115571313 0t0 TCP 10.0.XX.XXX:https->156.67.XXX.XXX:51031 (CLOSE_WAIT)
apache2 906772 906813 apache2 www-data 47u IPv6 115570910 0t0 TCP 10.0.XX.XXX:https->91.65.XXX.XXX:58430 (CLOSE_WAIT)
apache2 906772 906842 apache2 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906772 906842 apache2 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906772 906842 apache2 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906772 906842 apache2 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906772 906842 apache2 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906772 906842 apache2 www-data 27u IPv6 115571178 0t0 TCP 10.0.XX.XXX:https->95.91.XXX.XXX:4912 (CLOSE_WAIT)
apache2 906772 906842 apache2 www-data 39u IPv6 115571313 0t0 TCP 10.0.XX.XXX:https->156.67.XXX.XXX:51031 (CLOSE_WAIT)
apache2 906772 906842 apache2 www-data 47u IPv6 115570910 0t0 TCP 10.0.XX.XXX:https->91.65.XXX.XXX:58430 (CLOSE_WAIT)
apache2 906772 906877 apache2 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906772 906877 apache2 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906772 906877 apache2 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906772 906877 apache2 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906772 906877 apache2 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906772 906877 apache2 www-data 27u IPv6 115571178 0t0 TCP 10.0.XX.XXX:https->95.91.XXX.XXX:4912 (CLOSE_WAIT)
apache2 906772 906877 apache2 www-data 39u IPv6 115571313 0t0 TCP 10.0.XX.XXX:https->156.67.XXX.XXX:51031 (CLOSE_WAIT)
apache2 906772 906877 apache2 www-data 47u IPv6 115570910 0t0 TCP 10.0.XX.XXX:https->91.65.XXX.XXX:58430 (CLOSE_WAIT)
apache2 906772 906889 apache2 www-data 2w REG 259,3 13551 263809 /var/log/apache2/error.log
apache2 906772 906889 apache2 www-data 3u sock 0,8 0t0 115491219 protocol: TCP
apache2 906772 906889 apache2 www-data 5u sock 0,8 0t0 115491223 protocol: TCP
apache2 906772 906889 apache2 www-data 10w REG 259,2 26081587 110129 /home/netto/www.nettolohn.de/www_log/error_log
apache2 906772 906889 apache2 www-data 11w REG 259,3 108695678 263712 /var/log/apache2/access.log
apache2 906772 906889 apache2 www-data 27u IPv6 115571178 0t0 TCP 10.0.XX.XXX:https->95.91.XXX.XXX:4912 (CLOSE_WAIT)
apache2 906772 906889 apache2 www-data 39u IPv6 115571313 0t0 TCP 10.0.XX.XXX:https->156.67.XXX.XXX:51031 (CLOSE_WAIT)
apache2 906772 906889 apache2 www-data 47u IPv6 115570910 0t0 TCP 10.0.XX.XXX:https->91.65.XXX.XXX:58430 (CLOSE_WAIT)
Eventually the TCP connections in CLOSE_WAIT will disappear from netstat
and lsof
but the open file handles will remain with the lingering apache2 processes causing a denial-of-service after the server the reaches the configured max process slots or the log volume fills up because of open file handles.
It's confirmed to be even an issue with non of your packages installed on the system.
Will close this issue.
Frequently asked questions
Describe the bug
We're running stock Ubuntu LTS 22.04 jammy with apache2 version
2.4.52-1ubuntu4.7
and PHP 7.4 packages from your repo.We experience an unusual issue which is described in detail in this bug report: Ubuntu Bug #2054301: [worker/event] Some child processes aren't killed on reload (graceful), being stuck in state "Sending Reply", leaving old logfiles open, disk fills up, server stops responding after some days, DoS.
TL;DR
What should happen
Apache2 runs php 7.4-cgi via mod_fcgid and when apache2 reloads it's configuration and closes and reopens all logfiles (eighter via
apache2ctl graceful
orservice apache2 reload
or via sending SIGUSR1 to the father apache2 process or viasystemctl reload apache2
) all but the father apache2 process should be exiting and new apache2 child processes are started by the father apache2 process.What happens instead
Some of the apache2 child processes (not php-cgi) keep running, leaving file handles open on log files and don't accept incoming connections. Over time, at least every 24 hours - when logrotate runs and calls
apache2ctl graceful
- more and more lingering apache2 processes keep piling up, which will cause a Denial of Service after some time:/var/log
volume is filling up since deleted log files are still being held open via file handles (and thus still consume disk space since they aren't released yet). The tooldf
(disk free) will show gigabytes being used on /var/log volume despitedu
(disk usage) will only show some megabytes being used by files in/var/log
.The old apache2 child processes will eventually consume all process slots until they reach the apache2 configured maximum, leaving no apache2 process accepting connections, eventually causing a Denial-of-Service.
Ubuntu developers are indicating that this might be caused by your PHP 7.4 packages because it seems the bug report is the first of this kind. We have projects running on other jammy systems with your PHP 5.6 projects with no issues and servers on jammy systems with your PHP 7.4 packages which had this issue only with MPM worker - after switching to MPM event it disappeared. But all these other systems experience only a fraction of traffic the webserver gets where we have the issue describled in this issue - with MPM worker AND MPM event.
What steps did you take to resolve issue yourself before reporting it here
2.4.58-1+ubuntu22.04.1+deb.sury.org+1
instead of2.4.52-1ubuntu4.7
). No difference.Expected behavior No apache2 child processes running forever, not accepting connections, keeping logfiles open after logrotate runs.
Distribution
deb https://ppa.launchpadcontent.net/ondrej/php/ubuntu/ jammy main
Package(s)