Closed mkowalski closed 1 week ago
/cc @cybertron /cc @akrzos
/retest-required /test e2e-openstack
This can't possibly have broken most of those jobs. Let's try again.
Oh, and /lgtm
@mkowalski: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Test name | Commit | Details | Required | Rerun command |
---|---|---|---|---|
ci/prow/e2e-vsphere-ovn-zones | 9092db2001470cd13b2a5d429930182d436dce2f | link | false | /test e2e-vsphere-ovn-zones |
ci/prow/e2e-azure-ovn-upgrade-out-of-change | 9092db2001470cd13b2a5d429930182d436dce2f | link | false | /test e2e-azure-ovn-upgrade-out-of-change |
ci/prow/e2e-vsphere-ovn-upi | 9092db2001470cd13b2a5d429930182d436dce2f | link | false | /test e2e-vsphere-ovn-upi |
/retitle OCPBUGS-38490: Increase connection limit for cluster loadbalancer
@mkowalski: This pull request references Jira Issue OCPBUGS-38490, which is invalid:
Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.
The bug has been updated to refer to the pull request using the external bug tracker.
/jira refresh
@mkowalski: This pull request references Jira Issue OCPBUGS-38490, which is valid. The bug has been moved to the POST state.
Requesting review from QA contact: /cc @sergiordlr
Pre-merge verified: Built the image using clusterbot and deployed the cluster using the template private-templates/functionality-testing/aos-4_16/ipi-on-baremetal/versioned-installer-packet_libvirt-bootstrap_static-ci
$ oc get clusterversions
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.17.0-0.ci.test-2024-08-29-102444-ci-ln-bzpchy2-latest True False 109m Cluster version is 4.17.0-0.ci.test-2024-08-29-102444-ci-ln-bzpchy2-latest
$ oc -n openshift-kni-infra rsh haproxy-master-0
Defaulted container "haproxy" out of: haproxy, haproxy-monitor, verify-api-int-resolvable (init)
sh-5.1$ cat /etc/haproxy/haproxy.cfg
global
  stats socket /var/lib/haproxy/run/haproxy.sock mode 600 level admin expose-fd listeners
defaults
  maxconn 40000
  mode tcp
We can see that maxconn is 40k here.
Adding label qe-approved
/label qe-approved
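The manual check above could also be scripted. The following is a minimal sketch, not part of this PR: `defaults_maxconn` is a hypothetical helper name, and it assumes the config uses the usual HAProxy layout in which each section keyword (`global`, `defaults`, `frontend`, `backend`, `listen`) is the first word on its own line.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): read an haproxy.cfg on stdin and
# print the maxconn value configured in its "defaults" section.
defaults_maxconn() {
  awk '
    # A line whose first word is a section keyword starts a new section.
    $1 ~ /^(global|defaults|frontend|backend|listen)$/ { section = $1; next }
    # Print the first maxconn found while inside the defaults section.
    section == "defaults" && $1 == "maxconn" { print $2; exit }
  '
}

# Example usage against the pod from the transcript above (requires cluster access):
#   oc -n openshift-kni-infra exec haproxy-master-0 -c haproxy -- \
#     cat /etc/haproxy/haproxy.cfg | defaults_maxconn
```

Tracking the section matters because `maxconn` can also appear under `global` or in a frontend, and only the `defaults` value is the one verified here.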
@mkowalski: This pull request references Jira Issue OCPBUGS-38490, which is valid.
Requesting review from QA contact: /cc @sergiordlr
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cybertron, mkowalski, yuqi-zhang
/retest-required
Remaining retests: 0 against base HEAD b6d267419cac5c8a580836d7d3555b8006faeb40 and 2 for PR HEAD 9092db2001470cd13b2a5d429930182d436dce2f in total
@mkowalski: Jira Issue OCPBUGS-38490: All pull requests linked via external trackers have merged:
Jira Issue OCPBUGS-38490 has been moved to the MODIFIED state.
[ART PR BUILD NOTIFIER]
Distgit: ose-machine-config-operator This PR has been included in build ose-machine-config-operator-container-v4.18.0-202410230310.p0.g54144b3.assembly.stream.el9. All builds following this will include this PR.
It has been reported that in certain hub-spoke deployments (a single metal cluster with 3500+ managed clusters attached), the current limit of 20k connections to the loadbalancer is too low during upgrades, and clusters report connection timeouts.
Because the current 20k limit is an arbitrarily selected number, and tests show that increasing it to 40k helps in the scenario described above, we should increase the default limit.
This PR does not change the overall recommendation to use an external enterprise-grade loadbalancer for such a resource-consuming workload.
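As a rough illustration of why the old limit is tight in the reported scenario, the sketch below divides the connection limit by the number of managed clusters. It assumes exactly 3500 clusters and an even spread of connections across them, neither of which is stated in the report, and `per_cluster_budget` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: average number of loadbalancer connections available
# per managed cluster, given a connection limit and a cluster count.
# Assumes connections are spread evenly, which real traffic will not be.
per_cluster_budget() {
  awk -v limit="$1" -v clusters="$2" 'BEGIN { printf "%.1f\n", limit / clusters }'
}

per_cluster_budget 20000 3500   # old default
per_cluster_budget 40000 3500   # new default
```

Under these assumptions the old default leaves fewer than six connections per managed cluster on average and the new one about eleven. This is only an average and is not a sizing recommendation.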