Closed: tmsdce closed this issue 2 months ago.
bump...
This is causing some clusters to hang due to high CPU usage and memory leaks. We have to regularly roll out the Calico daemonset to free up resources, but this is not a suitable workaround. Embedding fixed versions of Calico should be enough to resolve the issue.
Maybe @jiaqiluo or @kinarashah can have a look at this?
Hi @tmsdce, thank you for reporting the issue. Our team will follow up.
When is this going to be released? This is a production-critical bug.
Hi @wzrdtales, please follow the linked issues in the rancher/rancher repo for milestones and progress.
Thanks. I just thought SUSE would be a little more sensitive to issues that hit their users' production systems and cause downtime.
The worst part is that, as far as I can see, there is no easy way to upgrade these Calico versions without RKE integrating them.
I strongly recommend using the support channel to emphasize the urgency of fixing the bug. This will help ensure that the product management team prioritizes the issue.
@wzrdtales As a workaround, you can change your container images to 3.28.1 in the deployment and daemonset.
Validated the Calico version bump as part of KDM August patch testing, covering both fresh installs and upgrade checks.
v2.8 KDM August patch testing also passed successfully.
Regarding the workaround of changing the container images to 3.28.1 in the deployment and daemonset: the operator resets the image tag again after the change (at least with RKE2).
@wzrdtales You can override the system images used by RKE in your RKE config file: https://rke.docs.rancher.com/config-options/system-images
If you adapt the Calico-related images, you should be able to work around the issue while waiting for a new RKE release.
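For RKE1, the override goes under `system_images` in cluster.yml. Below is a minimal sketch that renders such a fragment for the Calico images only. The key names (`calico_node`, `calico_cni`, `calico_controllers`, `calico_ctl`, `calico_flexvol`) are our reading of the RKE system-images docs linked above. The image references in the example are hypothetical placeholders, so check the exact repository and tag in your registry before using them. The sketch also assumes that RKE keeps its default images for any keys you leave out.

```python
# Sketch: render an RKE1 cluster.yml `system_images` override for Calico images.
# ASSUMPTION: key names follow the RKE system-images docs; image references
# passed in are placeholders that must be verified against your registry.

CALICO_KEYS = (
    "calico_node",
    "calico_cni",
    "calico_controllers",
    "calico_ctl",
    "calico_flexvol",
)


def render_calico_override(images: dict[str, str]) -> str:
    """Return a YAML fragment overriding only the Calico-related system images."""
    unknown = set(images) - set(CALICO_KEYS)
    if unknown:
        raise ValueError(f"not a Calico system image key: {sorted(unknown)}")
    lines = ["system_images:"]
    # Emit keys in a fixed order so the fragment diffs cleanly between runs.
    for key in CALICO_KEYS:
        if key in images:
            lines.append(f"  {key}: {images[key]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # HYPOTHETICAL image references: confirm the repository and the tag
    # (3.28.1, or 3.28.2 if a Kubernetes endpoint is configured) before use.
    print(
        render_calico_override(
            {
                "calico_node": "rancher/mirrored-calico-node:v3.28.1",
                "calico_controllers": "rancher/mirrored-calico-kube-controllers:v3.28.1",
            }
        ),
        end="",
    )
```

Restricting the keys to the Calico entries keeps a typo from silently adding an unrelated override. The printed fragment would then be merged into the existing cluster.yml before running `rke up`.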
These are the docs for RKE1, not RKE2.
The bug was opened for RKE1, so I thought you were using RKE1. I understand the issue also affects RKE2. My mistake.
Hi @wzrdtales, I did a quick search and found the following info:
1/ Calico 3.28.1, which contains the fix for this issue, is used in the following RKE2 versions:
2/ Those RKE2 versions will be available in the Rancher v2.8.x and v2.9.x release lines once 2.8.8 and 2.9.2 are released. Please check the links for more details:
2.8 and 2.9 validations have been done as part of the associated issues. https://github.com/rancher/rancher/issues/47024#issuecomment-2352694271 https://github.com/rancher/rancher/issues/47046#issuecomment-2348781406 Closing this issue.
Hi @jiaqiluo
I see the issue was resolved with references only to Rancher/RKE2 fixes. Is there an ETA for an RKE1 fix?
@tmsdce It will be released with 2.8.8 and 2.9.2, as you can see from the milestones of the associated tickets. The RKE1 bump validation was completed with the fix, which is why this ticket is closed. https://github.com/rancher/rke/issues/3648#issuecomment-2354765374
Ok, thanks for your quick reply @mitulshah-suse
@jiaqiluo @mitulshah-suse 3.28.1 is also broken if a Kubernetes endpoint is configured. There is a fix for that in 3.28.2.
This issue is related to this Calico issue: https://github.com/projectcalico/calico/issues/8856. Everything is explained in that thread, but here is a quick view of the logs we are seeing in Calico, which cause the high CPU usage.
Kubernetes versions 1.29.X and 1.30.X deployed by RKE use Calico 3.27.3 and 3.28.0 respectively, which are affected by the above issue. The bug was fixed in versions 3.27.4 and 3.28.1. Can you cut a new RKE release that includes these fixed versions of Calico?
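To tell whether a given Calico tag already carries the fix, the version thresholds reported in this thread can be encoded in a small check. This is a minimal sketch, and the function names are hypothetical. It uses only the data points mentioned here: 3.27.3 and 3.28.0 are affected, and fixes landed in 3.27.4 and 3.28.1. The maintainers' comment adds that 3.28.1 still misbehaves when a Kubernetes endpoint is configured, which 3.28.2 fixes. For any combination the thread does not cover, the sketch returns `None` instead of guessing.

```python
# Sketch: check a Calico version against the fix thresholds reported in this thread.
# Only the 3.27.x and 3.28.x lines are covered; anything else returns None.

from typing import Optional

# (major, minor) -> first patch release reported as fixed.
FIXED_IN = {(3, 27): 4, (3, 28): 1}
# Reported in the thread: with a Kubernetes endpoint configured, 3.28.1 is still
# broken and 3.28.2 fixes it. The thread gives no data for 3.27.x in this case.
FIXED_IN_K8S_ENDPOINT = {(3, 28): 2}


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'v3.28.1' or '3.28.1'; a suffix such as '-rancher1' is dropped."""
    core = version.lstrip("v").split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def has_fix(version: str, k8s_endpoint: bool = False) -> Optional[bool]:
    """True/False when the thread covers this release line, otherwise None."""
    major, minor, patch = parse_version(version)
    table = FIXED_IN_K8S_ENDPOINT if k8s_endpoint else FIXED_IN
    line = (major, minor)
    if line not in table:
        return None  # no data in the thread for this combination
    return patch >= table[line]
```

For example, `has_fix("v3.28.0")` is `False` (the version RKE ships for 1.30.X). `has_fix("v3.28.1")` is `True`, but `has_fix("v3.28.1", k8s_endpoint=True)` is `False`.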
RKE version:
Docker version: (docker version, docker info preferred)
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)