Closed: pbertin closed this issue 1 year ago
I did find one of our load balancers was not set for persistent sessions, but that was the one in Southeast Asia. I just fixed that setting. All other load balancers are set for persistent sessions keyed on the client IP.
If you can share the public IP of your system, I can check which actual nodes it was hitting.
I have an instance currently experiencing the issue in the "northeurope" Azure region. Its public IP is 13.94.94.43.
The replica inconsistency can be seen, for example, when looping over:
curl -Iv http://olcentgbl.trafficmanager.net/centos/7/updates/x86_64/repodata/filelists.sqlite.bz2
This hostname resolves alternately to two different IPs, which return different versions of the file:
Connected to olcentgbl.trafficmanager.net (168.63.67.169) port 80 (#0)
Last-Modified: Tue, 24 Jan 2023 07:56:39 GMT
Content-Length: 11114858

Connected to olcentgbl.trafficmanager.net (20.54.32.85) port 80 (#0)
Last-Modified: Tue, 24 Jan 2023 08:03:29 GMT
Content-Length: 11115537
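The check above can also be scripted. The following is a minimal sketch, not something used in this thread: it assumes plain HTTP on port 80, resolves the hostname several times, sends a HEAD request to each IP it sees, and groups the IPs by the `Last-Modified` and `Content-Length` they report. The function names (`probe`, `group_by_version`, `replicas_disagree`) and the `rounds` parameter are hypothetical.

```python
import http.client
import socket
from collections import defaultdict


def resolve_ipv4(host, port=80):
    """Return the IPv4 addresses the resolver currently gives for host."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def head_version(ip, host, path, timeout=10):
    """HEAD one specific IP and return (Last-Modified, Content-Length)."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        # Connect to the IP directly, but keep the original Host header.
        conn.request("HEAD", path, headers={"Host": host})
        resp = conn.getresponse()
        return resp.getheader("Last-Modified"), resp.getheader("Content-Length")
    finally:
        conn.close()


def group_by_version(observations):
    """Map each (Last-Modified, Content-Length) pair to the IPs serving it."""
    groups = defaultdict(list)
    for ip, version in observations.items():
        groups[version].append(ip)
    return dict(groups)


def replicas_disagree(observations):
    """True if at least two IPs reported different versions of the file."""
    return len(group_by_version(observations)) > 1


def probe(host, path, rounds=10):
    """Resolve host `rounds` times and record the version each IP serves.

    Repeating the lookup matters because the DNS answer may alternate
    between endpoints, as described above.
    """
    observations = {}
    for _ in range(rounds):
        for ip in resolve_ipv4(host):
            if ip not in observations:
                observations[ip] = head_version(ip, host, path)
    return observations
```

Calling `probe("olcentgbl.trafficmanager.net", "/centos/7/updates/x86_64/repodata/filelists.sqlite.bz2")` and passing the result to `replicas_disagree` would flag the situation shown above, where the two IPs report different timestamps and sizes. Local DNS caching may hide the alternation, so a clean result is not proof that the replicas agree.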
Let me know if I can be of further help.
Thank you @pbertin !
Using this info, I was able to determine that your system was hitting one of our older nodes and one of our newer nodes.
As a little background, our load balancers are nested into two tiers: global and regional.
We've been adding new nodes to our repos, and the first (global) tier was not maintaining client IP persistence between the regional load balancer for the old nodes and the regional load balancer for the new nodes. Since both old and new nodes were in the same region, the first tier returned them in round-robin fashion.
This should be fixed now and the global tier load balancer should no longer return differing repo nodes upon subsequent connections.
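To illustrate the difference between the two behaviours described above, here is a toy model of the global tier. It is only a sketch under stated assumptions: the pool names are hypothetical, and hashing the client IP is one common way to get persistence, not necessarily what the actual load balancer does.

```python
import hashlib
from itertools import cycle

# Hypothetical names for the two regional load balancers in one region.
REGIONAL_POOLS = ["regional-lb-old", "regional-lb-new"]


class RoundRobinTier:
    """Global tier without persistence: alternates pools on every connection."""

    def __init__(self, pools):
        self._next = cycle(pools)

    def pick(self, client_ip):
        # The client IP is ignored, so one client bounces between pools.
        return next(self._next)


class ClientIpTier:
    """Global tier with client IP persistence: same IP, same pool."""

    def __init__(self, pools):
        self._pools = list(pools)

    def pick(self, client_ip):
        # A stable hash of the IP selects the pool deterministically.
        digest = hashlib.sha256(client_ip.encode("ascii")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._pools)
        return self._pools[index]


def pools_seen(tier, client_ip, connections=4):
    """Return the pool chosen for each of several connections from one client."""
    return [tier.pick(client_ip) for _ in range(connections)]
```

With `RoundRobinTier`, `pools_seen(tier, "13.94.94.43")` alternates between the old and new pools, which is how one client could see two versions of the same repodata file within a single yum run. With `ClientIpTier`, every connection from that IP lands on the same pool.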
For the last couple of days, we have been observing update failures on OpenLogic:CentOS:7_9-gen2 instances, which apparently come from inconsistencies between repository replicas.
"yum update" or "yum makecache" fails most of the time with error messages similar to:
Indeed, fetching those files in a loop shows different versions being served alternately from the load balancer endpoint:
This has been observed at least in the "northeurope" and "francecentral" Azure regions.