Closed: gcs278 closed this pull request 6 months ago.
@gcs278: This pull request references Jira Issue OCPBUGS-29690, which is invalid:
Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.
The bug has been updated to refer to the pull request using the external bug tracker.
/assign @frobware
/assign
If you have a route that specifies spec.port and spec.alternateBackends with one service, and the alternate backend service has a port that doesn't match spec.port.to, then do we erroneously use roundrobin instead of random? Do we also need to modify createServiceAliasConfig to filter out service units without matching ports when calculating activeServiceUnits?
@Miciah Yes, just tried it out. You are correct indeed. Conversely, if you have a route and the spec.to service has a port that doesn't match spec.port.to, but the alternate backend has a valid port, it sets it to roundrobin erroneously as well. It's the same issue with createServiceAliasConfig. I can file a bug with a reproducer.
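A reproducer along these lines could look like the following sketch. The namespace, route name, and service names are illustrative, and it assumes service-primary exposes a port named http while service-alternate does not:
% cat <<'EOF' | oc apply -f -
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: port-mismatch
  namespace: default
spec:
  port:
    targetPort: http            # only service-primary has a port named "http"
  to:
    kind: Service
    name: service-primary
    weight: 100
  alternateBackends:
  - kind: Service
    name: service-alternate     # no port named "http", so it contributes no endpoints
    weight: 50
EOF
% oc -n openshift-ingress exec deploy/router-default -- grep -A10 'be_http:default:port-mismatch' /var/lib/haproxy/conf/haproxy.config | grep balance
Per the discussion above, the resulting backend shows balance roundrobin even though only one service is effectively active, where balance random would be expected.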
/retest-required
/lgtm
/jira refresh
@gcs278: This pull request references Jira Issue OCPBUGS-29690, which is valid. The bug has been moved to the POST state.
Requesting review from QA contact: /cc @ShudiLi
/approve
/hold I want to get some confirmation from the customer that is running the 4.12 proposed change
/retest
Do we need to resolve https://github.com/openshift/router/pull/576#issuecomment-2065372101 in this PR too?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: frobware
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Do we need to resolve #576 (comment) in this PR too?
I don't think so, it seems like a separate bug as I don't think they have a dependency on each other. In some situations where ports on a service don't match the route's spec.port.to, you'll get roundrobin in the HaProxy config file with only 1 server line, when we actually want random. With #576, we scale the weight of the 1 "active" endpoint down to weight 1. If you only have 1 server line, a weight > 1 doesn't matter, regardless of what the balance algorithm is (except that random uses a lot more memory...).
In other words, I don't see any harm in changing a single-server backend from weight 256 to weight 1 while using roundrobin; it's a no-op.
The weights were changed to 1.
1.
% oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.16.0-0.ci.test-2024-04-23-081430-ci-ln-fgiq082-latest True False 28m Cluster version is 4.16.0-0.ci.test-2024-04-23-081430-ci-ln-fgiq082-latest
2.
% oc get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
reencrypt1 reencrypt1-default.apps.ci-ln-fgiq082-72292.origin-ci-int-gce.dev.rhcloud.com service-secure https reencrypt None
service-unsecure service-unsecure-default.apps.ci-ln-fgiq082-72292.origin-ci-int-gce.dev.rhcloud.com service-unsecure http None
3.
% oc -n openshift-ingress rsh router-default-7dd68dd75f-jjtlb
sh-5.1$ grep -Rn weight haproxy.config
127: server fe_sni unix@/var/lib/haproxy/run/haproxy-sni.sock weight 1 send-proxy
169: server fe_no_sni unix@/var/lib/haproxy/run/haproxy-no-sni.sock weight 1 send-proxy
227: server pod:web-server-rc-nnbf8:service-secure:https:10.131.0.14:8443 10.131.0.14:8443 cookie 3be72d2f6b0917638bbddc5a4d310a6a weight 1 ssl verifyhost service-secure.default.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt
245: server pod:web-server-rc-nnbf8:service-unsecure:http:10.131.0.14:8080 10.131.0.14:8080 cookie bf59e111822d7c46847171b9fe117bd4 weight 1
253: server pod:oauth-openshift-6d759cf-ngtlk:oauth-openshift:https:10.128.0.60:6443 10.128.0.60:6443 weight 1 check inter 5000ms
254: server pod:oauth-openshift-6d759cf-tx6fw:oauth-openshift:https:10.129.0.62:6443 10.129.0.62:6443 weight 1 check inter 5000ms
255: server pod:oauth-openshift-6d759cf-tsxhm:oauth-openshift:https:10.130.0.76:6443 10.130.0.76:6443 weight 1 check inter 5000ms
274: server pod:console-6967998f67-p56d7:console:https:10.128.0.64:8443 10.128.0.64:8443 cookie 8679ace67e2fa40ee6c30f71b8d6788f weight 1 ssl verifyhost console.openshift-console.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt check inter 5000ms
275: server pod:console-6967998f67-mr7c9:console:https:10.129.0.68:8443 10.129.0.68:8443 cookie 370fef5aeed56e1697df5ffc07859b86 weight 1 ssl verifyhost console.openshift-console.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt check inter 5000ms
293: server pod:downloads-cdb6f8949-6mgjk:downloads:http:10.128.0.42:8080 10.128.0.42:8080 cookie 9b7c18f36c60c4656fb144de59d5b2a3 weight 1 check inter 5000ms
294: server pod:downloads-cdb6f8949-lhgck:downloads:http:10.130.0.62:8080 10.130.0.62:8080 cookie ac2f8ba61d894025df89059782bc1968 weight 1 check inter 5000ms
312: server pod:ingress-canary-spdv5:ingress-canary:8080-tcp:10.128.2.11:8080 10.128.2.11:8080 cookie a275d1f255efa0180a7ea1f40e79c48c weight 1 check inter 5000ms
313: server pod:ingress-canary-8wg9n:ingress-canary:8080-tcp:10.129.2.7:8080 10.129.2.7:8080 cookie 1dd0c54b43aa597e9c4a84dc786204ab weight 1 check inter 5000ms
314: server pod:ingress-canary-qjbdz:ingress-canary:8080-tcp:10.131.0.8:8080 10.131.0.8:8080 cookie 401cc917bfb3e9a7b212f1406777f231 weight 1 check inter 5000ms
332: server pod:alertmanager-main-0:alertmanager-main:web:10.128.2.22:9095 10.128.2.22:9095 cookie 359fec7c5c15177e0aec87fe172a0b9c weight 1 ssl verifyhost alertmanager-main.openshift-monitoring.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt check inter 5000ms
333: server pod:alertmanager-main-1:alertmanager-main:web:10.131.0.13:9095 10.131.0.13:9095 cookie 5dcb2e178c26b447649da3439ed7484b weight 1 ssl verifyhost alertmanager-main.openshift-monitoring.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt check inter 5000ms
351: server pod:prometheus-k8s-1:prometheus-k8s:web:10.128.2.21:9091 10.128.2.21:9091 cookie 9bcaae492092f3606e7ae8848a9fe3d7 weight 1 ssl verifyhost prometheus-k8s.openshift-monitoring.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt check inter 5000ms
352: server pod:prometheus-k8s-0:prometheus-k8s:web:10.129.2.12:9091 10.129.2.12:9091 cookie 9c5b8f6d0b9d1ff464d16c9a6f2f0a83 weight 1 ssl verifyhost prometheus-k8s.openshift-monitoring.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt check inter 5000ms
370: server pod:prometheus-k8s-1:prometheus-k8s:web:10.128.2.21:9091 10.128.2.21:9091 cookie 9bcaae492092f3606e7ae8848a9fe3d7 weight 1 ssl verifyhost prometheus-k8s.openshift-monitoring.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt check inter 5000ms
371: server pod:prometheus-k8s-0:prometheus-k8s:web:10.129.2.12:9091 10.129.2.12:9091 cookie 9c5b8f6d0b9d1ff464d16c9a6f2f0a83 weight 1 ssl verifyhost prometheus-k8s.openshift-monitoring.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt check inter 5000ms
389: server pod:thanos-querier-5c89b7859-b9dkj:thanos-querier:web:10.128.2.17:9091 10.128.2.17:9091 cookie 95776b0f70ae90779ad25efd8cc21ad5 weight 1 ssl verifyhost thanos-querier.openshift-monitoring.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt check inter 5000ms
390: server pod:thanos-querier-5c89b7859-dwsp5:thanos-querier:web:10.131.0.11:9091 10.131.0.11:9091 cookie 76f5757f419b4a722d6065673971b937 weight 1 ssl verifyhost thanos-querier.openshift-monitoring.svc verify required ca-file /var/run/configmaps/service-ca/service-ca.crt check inter 5000ms
/label qe-approved
thanks
@gcs278: This pull request references Jira Issue OCPBUGS-29690, which is valid.
Requesting review from QA contact: /cc @ShudiLi
The weight is 256, which is larger than the configured:
backend be_http:e2e-test-router-env-n6zjv:service-unsecure01
mode http
option redispatch
option forwardfor
balance roundrobin
timeout check 5000ms
http-request add-header X-Forwarded-Host %[req.hdr(host)]
http-request add-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
http-request add-header X-Forwarded-Proto https if { ssl_fc }
http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
cookie e393e394aa38b3591557660f1e226508 insert indirect nocache httponly
server pod:web-server-rc01-dll2p:service-unsecure01:http:10.131.0.27:8080 10.131.0.27:8080 cookie 0ea43df433f7c28f54614f5e49bbee26 weight 256 check inter 5000ms
server pod:web-server-rc02-pc96q:service-unsecure02:http:10.131.0.28:8080 10.131.0.28:8080 cookie 1edbf08e519de047bb6e49c7901b78aa weight 256 check inter 5000ms
server pod:web-server-rc03-xxjcg:service-unsecure03:http:10.131.0.29:8080 10.131.0.29:8080 cookie 2b78972471dc2752949b8fa0be506378 weight 256 check inter 5000ms
For roundrobin this is OK; no huge memory growth comes from the combination of roundrobin and weight 256.
/approve
/hold I want to get some confirmation from the customer that is running the 4.12 proposed change
We had confirmation that the 4.12 PR works for the customer. Removing the /hold.
/hold cancel
@gcs278: all tests passed!
Full PR test history. Your PR dashboard.
@gcs278: Jira Issue OCPBUGS-29690: Some pull requests linked via external trackers have merged:
The following pull requests linked via external trackers have not merged:
These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.
Jira Issue OCPBUGS-29690 has not been moved to the MODIFIED state.
/cherry-pick release-4.15
@frobware: new pull request created: #586
[ART PR BUILD NOTIFIER]
This PR has been included in build ose-haproxy-router-base-container-v4.17.0-202404251439.p0.g8762aa4.assembly.stream.el9 for distgit ose-haproxy-router-base. All builds following this will include this PR.
/jira refresh
@Miciah: Jira Issue OCPBUGS-29690: All pull requests linked via external trackers have merged:
Jira Issue OCPBUGS-29690 has been moved to the MODIFIED state.
The previous logic, which reduced the weight to 1 when there was only one service, didn't filter out inactive services. If there's only one active service, there's no need for a weight greater than 1 because traffic is directed only to active services.
Additionally, the template logic correctly accounted for active services when setting the algorithm, resulting in "random" being set in situations where our logic didn't reduce the active service's weight to 1.
While this isn't inherently problematic, using the "random" algorithm with higher weights significantly increases memory usage on HaProxy startup. This can lead to excessive memory consumption in scenarios where a user has many inactive services or route backends using weight 0.
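For illustration, consider a route whose alternate backends are all set to weight 0, so only one service is active; the route name, namespace, and service names below are hypothetical, and the check assumes the default load-balancing algorithm:
% cat <<'EOF' | oc apply -f -
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: single-active
  namespace: default
spec:
  to:
    kind: Service
    name: service-active        # the only backend carrying traffic
    weight: 100
  alternateBackends:
  - kind: Service
    name: service-standby       # weight 0, so inactive
    weight: 0
EOF
% oc -n openshift-ingress exec deploy/router-default -- grep -A30 'be_http:default:single-active' /var/lib/haproxy/conf/haproxy.config | grep -E 'balance|weight'
Before this change, the single server line kept weight 256 while the backend used balance random, the combination that triggers the memory growth described above; with this change the active server is written with weight 1.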