Closed frobware closed 5 months ago
@frobware: This pull request references Jira Issue OCPBUGS-32369, which is valid. The bug has been moved to the POST state.
Requesting review from QA contact: /cc @lihongan
The bug has been updated to refer to the pull request using the external bug tracker.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please ask for approval from frobware. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Patch for https://pkgs.devel.redhat.com/cgit/rpms/haproxy/
From 14e2063cc2516c2482765588c684434a50edf68a Mon Sep 17 00:00:00 2001
From: Andrew McDermott <amcdermo@redhat.com>
Date: Thu, 16 May 2024 09:45:27 +0100
Subject: OCPBUGS-32369: HAProxy consuming high cpu usage
- Resolve https://issues.redhat.com/browse/OCPBUGS-32369 (HAProxy consuming high cpu usage)
- Carry fix for https://github.com/haproxy/haproxy/issues/2537
- Fix for issue 2537 picked from https://git.haproxy.org/?p=haproxy.git;a=commit;h=4a9e3e102e192b9efd17e3241a6cc659afb7e7dc
---
...y-only-tid-0-must-not-sleep-if-got-s.patch | 44 +++++++++++++++++++
haproxy.spec | 9 +++-
2 files changed, 52 insertions(+), 1 deletion(-)
create mode 100644 0001-BUG-MINOR-haproxy-only-tid-0-must-not-sleep-if-got-s.patch
diff --git a/0001-BUG-MINOR-haproxy-only-tid-0-must-not-sleep-if-got-s.patch b/0001-BUG-MINOR-haproxy-only-tid-0-must-not-sleep-if-got-s.patch
new file mode 100644
index 0000000..69be68e
--- /dev/null
+++ b/0001-BUG-MINOR-haproxy-only-tid-0-must-not-sleep-if-got-s.patch
@@ -0,0 +1,44 @@
+From 4d895b9e4bee9cdab6630203f935f98afd35ec06 Mon Sep 17 00:00:00 2001
+From: Valentine Krasnobaeva <vkrasnobaeva@haproxy.com>
+Date: Mon, 6 May 2024 14:24:41 +0200
+Subject: BUG/MINOR: haproxy: only tid 0 must not sleep if got signal
+
+This patch fixes the commit eea152ee68
+("BUG/MINOR: signals/poller: ensure wakeup from signals").
+
+There is some probability that run_poll_loop() becomes inifinite, if
+TH_FL_SLEEPING is withdrawn from all threads in the second signal_queue_len
+check, when a signal has received just after the first one.
+
+In such particular case, the 'wake' variable, which is used to terminate
+thread's poll loop is never reset to 0. So, we never enter to the "stopping"
+part of the run_poll_loop() and threads, except the one with id 0 (tid 0
+handles signals), will continue to call _do_poll() eternally and will never
+sleep, as its TH_FL_SLEEPING flag was unset.
+
+This flag needs to be removed only for the tid 0, as it was done in the first
+signal_queue_len check.
+
+This fixes an issue #2537 "infinite loop when shutting down".
+
+This fix must be backported in every stable version.
+---
+ src/haproxy.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/src/haproxy.c b/src/haproxy.c
+index f7a91b9d3..4c7754939 100644
+--- a/src/haproxy.c
++++ b/src/haproxy.c
+@@ -2983,7 +2983,7 @@ void run_poll_loop()
+ if (thread_has_tasks()) {
+ activity[tid].wake_tasks++;
+ _HA_ATOMIC_AND(&th_ctx->flags, ~TH_FL_SLEEPING);
+- } else if (signal_queue_len) {
++ } else if (signal_queue_len && tid == 0) {
+ /* this check is required after setting TH_FL_SLEEPING to avoid
+ * a race with wakeup on signals using wake_threads() */
+ _HA_ATOMIC_AND(&th_ctx->flags, ~TH_FL_SLEEPING);
+--
+2.44.0
+
diff --git a/haproxy.spec b/haproxy.spec
index baac234..320877f 100644
--- a/haproxy.spec
+++ b/haproxy.spec
@@ -10,7 +10,7 @@
Name: haproxy
Version: 2.8.5
-Release: 1.rhaos4.16%{?dist}
+Release: 2.rhaos4.16%{?dist}
Summary: Do not ship, install or use this, use %{real_name} subpackage instead
License: GPLv2+
@@ -19,6 +19,7 @@ URL: http://www.haproxy.org/
Source0: http://www.haproxy.org/download/2.8/src/haproxy-%{version}.tar.gz
Patch0: 0001-BUG-MAJOR-ssl_sock-Always-clear-retry-flags-in-read-.patch
+Patch1: 0001-BUG-MINOR-haproxy-only-tid-0-must-not-sleep-if-got-s.patch
BuildRequires: gcc
BuildRequires: make
@@ -53,6 +54,7 @@ availability environments. Indeed, it can:
%prep
%setup -q
%patch0 -p1
+%patch1 -p1
%build
regparm_opts=
@@ -94,6 +96,11 @@ fi
%{_sbindir}/%{name}
%changelog
+* Thu May 16 2024 Andrew McDermott <amcdermo@redhat.com> - 2.8.5-2.rhaos4.16
+- Resolve https://issues.redhat.com/browse/OCPBUGS-32369 (HAProxy consuming high cpu usage)
+- Carry fix for https://github.com/haproxy/haproxy/issues/2537
+- Fix for issue 2537 picked from https://git.haproxy.org/?p=haproxy.git;a=commit;h=4a9e3e102e192b9efd17e3241a6cc659afb7e7dc
+
* Fri Jan 05 2024 Andrew McDermott <amcdermo@redhat.com> - 2.8.5-1.rhaos4.16
- Bump to 2.8.5
- Delete existing carry patches related to 2.6
--
2.42.0
@frobware: The following tests failed, say /retest
to rerun all failed tests or /retest-required
to rerun all mandatory failed tests:
Test name | Commit | Details | Required | Rerun command |
---|---|---|---|---|
ci/prow/e2e-metal-ipi-ovn-ipv6 | 884d53d9caa8054b54831191e27fbfa2beea8691 | link | false | /test e2e-metal-ipi-ovn-ipv6 |
ci/prow/e2e-upgrade | 884d53d9caa8054b54831191e27fbfa2beea8691 | link | true | /test e2e-upgrade |
Full PR test history. Your PR dashboard.
Patch LGTM
My testing:
# Created ~/patches/OCPBUGS-32369-HAProxy-consuming-high-cpu-usage.patch with https://github.com/openshift/router/pull/596#issuecomment-2115101301
$ cd /home/gspence/src/redhat.com/haproxy
$ git checkout rhaos-4.16-rhel-9
$ git pull
[...]
$ git am ~/patches/OCPBUGS-32369-HAProxy-consuming-high-cpu-usage.patch
Applying: OCPBUGS-32369: HAProxy consuming high cpu usage
.git/rebase-apply/patch:46: space before tab in indent.
if (thread_has_tasks()) {
.git/rebase-apply/patch:47: space before tab in indent.
activity[tid].wake_tasks++;
.git/rebase-apply/patch:48: space before tab in indent.
_HA_ATOMIC_AND(&th_ctx->flags, ~TH_FL_SLEEPING);
.git/rebase-apply/patch:51: space before tab in indent.
/* this check is required after setting TH_FL_SLEEPING to avoid
.git/rebase-apply/patch:52: space before tab in indent.
* a race with wakeup on signals using wake_threads() */
warning: squelched 3 whitespace errors
warning: 8 lines add whitespace errors.
$ git log -n 1
commit 20764adcdf44665605e8b6ca93ff5855c6fce10a (HEAD -> rhaos-4.16-rhel-9)
Author: Andrew McDermott <amcdermo@redhat.com>
Date: Thu May 16 09:45:27 2024 +0100
OCPBUGS-32369: HAProxy consuming high cpu usage
- Resolve https://issues.redhat.com/browse/OCPBUGS-32369 (HAProxy consuming high cpu usage)
- Carry fix for https://github.com/haproxy/haproxy/issues/2537
- Fix for issue 2537 picked from https://git.haproxy.org/?p=haproxy.git;a=commit;h=4a9e3e102e192b9efd17e3241a6cc659afb7e7dc
# reviewed diff
$ git diff HEAD~ HEAD
[...]
Change committed:
% brew latest-build rhaos-4.16-rhel-9-candidate haproxy
Build Tag Built by
---------------------------------------- -------------------- ----------------
haproxy-2.8.5-2.rhaos4.16.el9 rhaos-4.16-rhel-9-candidate amcdermo
I'm closing this PR as we only use it as a sanity / smoke test for the new RPM.
@frobware: This pull request references Jira Issue OCPBUGS-32369. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.
/hold