openshift / router

Ingress controller for OpenShift
Apache License 2.0
72 stars 116 forks source link

OCPBUGS-32369: DO NOT MERGE: validate haproxy28-2.8.5-2.rhaos4.16.el9.x86_64.rpm #596

Closed frobware closed 5 months ago

frobware commented 5 months ago

/hold

openshift-ci-robot commented 5 months ago

@frobware: This pull request references Jira Issue OCPBUGS-32369, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug * bug is open, matching expected state (open) * bug target version (4.16.0) matches configured target version for branch (4.16.0) * bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @lihongan

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/openshift/router/pull/596): >/hold > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Frouter). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
openshift-ci[bot] commented 5 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please ask for approval from frobware. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/openshift/router/blob/master/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
frobware commented 5 months ago

Patch for https://pkgs.devel.redhat.com/cgit/rpms/haproxy/

From 14e2063cc2516c2482765588c684434a50edf68a Mon Sep 17 00:00:00 2001
From: Andrew McDermott <amcdermo@redhat.com>
Date: Thu, 16 May 2024 09:45:27 +0100
Subject: OCPBUGS-32369: HAProxy consuming high cpu usage

- Resolve https://issues.redhat.com/browse/OCPBUGS-32369 (HAProxy consuming high cpu usage)
- Carry fix for https://github.com/haproxy/haproxy/issues/2537
- Fix for issue 2537 picked from https://git.haproxy.org/?p=haproxy.git;a=commit;h=4a9e3e102e192b9efd17e3241a6cc659afb7e7dc
---
 ...y-only-tid-0-must-not-sleep-if-got-s.patch | 44 +++++++++++++++++++
 haproxy.spec                                  |  9 +++-
 2 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 0001-BUG-MINOR-haproxy-only-tid-0-must-not-sleep-if-got-s.patch

diff --git a/0001-BUG-MINOR-haproxy-only-tid-0-must-not-sleep-if-got-s.patch b/0001-BUG-MINOR-haproxy-only-tid-0-must-not-sleep-if-got-s.patch
new file mode 100644
index 0000000..69be68e
--- /dev/null
+++ b/0001-BUG-MINOR-haproxy-only-tid-0-must-not-sleep-if-got-s.patch
@@ -0,0 +1,44 @@
+From 4d895b9e4bee9cdab6630203f935f98afd35ec06 Mon Sep 17 00:00:00 2001
+From: Valentine Krasnobaeva <vkrasnobaeva@haproxy.com>
+Date: Mon, 6 May 2024 14:24:41 +0200
+Subject: BUG/MINOR: haproxy: only tid 0 must not sleep if got signal
+
+This patch fixes the commit eea152ee68
+("BUG/MINOR: signals/poller: ensure wakeup from signals").
+
+There is some probability that run_poll_loop() becomes inifinite, if
+TH_FL_SLEEPING is withdrawn from all threads in the second signal_queue_len
+check, when a signal has received just after the first one.
+
+In such particular case, the 'wake' variable, which is used to terminate
+thread's poll loop is never reset to 0. So, we never enter to the "stopping"
+part of the run_poll_loop() and threads, except the one with id 0 (tid 0
+handles signals), will continue to call _do_poll() eternally and will never
+sleep, as its TH_FL_SLEEPING flag was unset.
+
+This flag needs to be removed only for the tid 0, as it was done in the first
+signal_queue_len check.
+
+This fixes an issue #2537 "infinite loop when shutting down".
+
+This fix must be backported in every stable version.
+---
+ src/haproxy.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/src/haproxy.c b/src/haproxy.c
+index f7a91b9d3..4c7754939 100644
+--- a/src/haproxy.c
++++ b/src/haproxy.c
+@@ -2983,7 +2983,7 @@ void run_poll_loop()
+           if (thread_has_tasks()) {
+               activity[tid].wake_tasks++;
+               _HA_ATOMIC_AND(&th_ctx->flags, ~TH_FL_SLEEPING);
+-          } else if (signal_queue_len) {
++          } else if (signal_queue_len && tid == 0) {
+               /* this check is required after setting TH_FL_SLEEPING to avoid
+                * a race with wakeup on signals using wake_threads() */
+               _HA_ATOMIC_AND(&th_ctx->flags, ~TH_FL_SLEEPING);
+-- 
+2.44.0
+
diff --git a/haproxy.spec b/haproxy.spec
index baac234..320877f 100644
--- a/haproxy.spec
+++ b/haproxy.spec
@@ -10,7 +10,7 @@

 Name:           haproxy
 Version:        2.8.5
-Release:        1.rhaos4.16%{?dist}
+Release:        2.rhaos4.16%{?dist}
 Summary:        Do not ship, install or use this, use %{real_name} subpackage instead

 License:        GPLv2+
@@ -19,6 +19,7 @@ URL:            http://www.haproxy.org/
 Source0:        http://www.haproxy.org/download/2.8/src/haproxy-%{version}.tar.gz

 Patch0:         0001-BUG-MAJOR-ssl_sock-Always-clear-retry-flags-in-read-.patch
+Patch1:         0001-BUG-MINOR-haproxy-only-tid-0-must-not-sleep-if-got-s.patch

 BuildRequires:  gcc
 BuildRequires:  make
@@ -53,6 +54,7 @@ availability environments. Indeed, it can:
 %prep
 %setup -q
 %patch0 -p1
+%patch1 -p1

 %build
 regparm_opts=
@@ -94,6 +96,11 @@ fi
 %{_sbindir}/%{name}

 %changelog
+* Thu May 16 2024 Andrew McDermott <amcdermo@redhat.com> - 2.8.5-2.rhaos4.16
+- Resolve https://issues.redhat.com/browse/OCPBUGS-32369 (HAProxy consuming high cpu usage)
+- Carry fix for https://github.com/haproxy/haproxy/issues/2537
+- Fix for issue 2537 picked from https://git.haproxy.org/?p=haproxy.git;a=commit;h=4a9e3e102e192b9efd17e3241a6cc659afb7e7dc
+
 * Fri Jan 05 2024 Andrew McDermott <amcdermo@redhat.com> - 2.8.5-1.rhaos4.16
 - Bump to 2.8.5
 - Delete existing carry patches related to 2.6
-- 
2.42.0
openshift-ci[bot] commented 5 months ago

@frobware: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-ovn-ipv6 884d53d9caa8054b54831191e27fbfa2beea8691 link false /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-upgrade 884d53d9caa8054b54831191e27fbfa2beea8691 link true /test e2e-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
gcs278 commented 5 months ago

Patch LGTM

My testing:

# Created ~/patches/OCPBUGS-32369-HAProxy-consuming-high-cpu-usage.patch with https://github.com/openshift/router/pull/596#issuecomment-2115101301

$ cd /home/gspence/src/redhat.com/haproxy
$ git checkout rhaos-4.16-rhel-9
$ git pull
[...]

$ git am ~/patches/OCPBUGS-32369-HAProxy-consuming-high-cpu-usage.patch
Applying: OCPBUGS-32369: HAProxy consuming high cpu usage
.git/rebase-apply/patch:46: space before tab in indent.
            if (thread_has_tasks()) {
.git/rebase-apply/patch:47: space before tab in indent.
                activity[tid].wake_tasks++;
.git/rebase-apply/patch:48: space before tab in indent.
                _HA_ATOMIC_AND(&th_ctx->flags, ~TH_FL_SLEEPING);
.git/rebase-apply/patch:51: space before tab in indent.
                /* this check is required after setting TH_FL_SLEEPING to avoid
.git/rebase-apply/patch:52: space before tab in indent.
                 * a race with wakeup on signals using wake_threads() */
warning: squelched 3 whitespace errors
warning: 8 lines add whitespace errors.

$ git log -n 1
commit 20764adcdf44665605e8b6ca93ff5855c6fce10a (HEAD -> rhaos-4.16-rhel-9)
Author: Andrew McDermott <amcdermo@redhat.com>
Date:   Thu May 16 09:45:27 2024 +0100

    OCPBUGS-32369: HAProxy consuming high cpu usage

    - Resolve https://issues.redhat.com/browse/OCPBUGS-32369 (HAProxy consuming high cpu usage)
    - Carry fix for https://github.com/haproxy/haproxy/issues/2537
    - Fix for issue 2537 picked from https://git.haproxy.org/?p=haproxy.git;a=commit;h=4a9e3e102e192b9efd17e3241a6cc659afb7e7dc

# reviewed diff
$ git diff HEAD~ HEAD
[...]
frobware commented 5 months ago

Change committed:

% brew latest-build rhaos-4.16-rhel-9-candidate haproxy
Build                                     Tag                   Built by
----------------------------------------  --------------------  ----------------
haproxy-2.8.5-2.rhaos4.16.el9             rhaos-4.16-rhel-9-candidate  amcdermo

I'm closing this PR as we only use it as a sanity / smoke test for the new RPM.

openshift-ci-robot commented 5 months ago

@frobware: This pull request references Jira Issue OCPBUGS-32369. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.

In response to [this](https://github.com/openshift/router/pull/596): >/hold > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Frouter). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.