Closed gprossliner closed 8 months ago
Log (removed some noise not relevant here)
1: Loaded plugins: fastestmirror, product-id, search-disabled-repos, subscription-
2: : manager
3: Loading mirror speeds from cached hostfile
4:
5: docker-buildx-plugin.x86_64 0.12.1-1.el7 docker-ce-stable
6: docker-ce.x86_64 3:25.0.1-1.el7 docker-ce-stable
7: docker-ce-cli.x86_64 1:25.0.1-1.el7 docker-ce-stable
8: docker-ce-rootless-extras.x86_64 25.0.1-1.el7 docker-ce-stable
9: docker-compose-plugin.x86_64 2.24.2-1.el7 docker-ce-stable
10: kernel.x86_64 3.10.0-1160.108.1.el7 WD_CentOS_7_Linux_x86_64_CentOS_7_Updates_x86_64
11: kernel-tools.x86_64 3.10.0-1160.108.1.el7 WD_CentOS_7_Linux_x86_64_CentOS_7_Updates_x86_64
12: kernel-tools-libs.x86_64 3.10.0-1160.108.1.el7 WD_CentOS_7_Linux_x86_64_CentOS_7_Updates_x86_64
...
37: SUSS: synchronize
38: % Total % Received % Xferd Average Speed Time Time Time Current
39: Dload Upload Total Spent Left Speed
40:
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 % Total % Received % Xferd Average Speed Time Time Time Current
41: Dload Upload Total Spent Left Speed
42:
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
43: SUSS: teardown
44: % Total % Received % Xferd Average Speed Time Time Time Current
45: Dload Upload Total Spent Left Speed
46:
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
47: Loaded plugins: fastestmirror, product-id, search-disabled-repos, subscription-
48: : manager
49: Loading mirror speeds from cached hostfile
50:
100 137 0 137 0 0 116 0 --:--:-- 0:00:01 --:--:-- 116Resolving Dependencies
51: --> Running transaction check
52: ---> Package docker-buildx-plugin.x86_64 0:0.11.2-1.el7 will be updated
53: ---> Package docker-buildx-plugin.x86_64 0:0.12.1-1.el7 will be an update
54: ---> Package docker-ce.x86_64 3:24.0.7-1.el7 will be updated
...
121:
122: ================================================================================
123: Package Arch Version Repository Size
124: ================================================================================
125: Installing: 126: kernel x86_64 3.10.0-1160.108.1.el7 WD_CentOS_7_Linux_x86_64_CentOS_7_Updates_x86_64
127: 52 M
128: Updating: 129: docker-buildx-plugin x86_64 0.12.1-1.el7 docker-ce-stable 13 M
...
186: Removing: 187: kernel x86_64 3.10.0-1160.92.1.el7 @WD_CentOS_7_Linux_x86_64_CentOS_7_Updates_x86_64
188: 66 M
189:
190: Transaction Summary
191: ================================================================================
192: Install 1 Package
193: Upgrade 31 Packages
194: Remove 1 Package
195:
196: Total download size: 158 M
197: Downloading packages: 198: Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
...
210: Total 14 MB/s | 158 MB 00:11
211: Running transaction check
212:
100 137 0 137 0 0 9 0 --:--:-- 0:00:15 --:--:-- 0Running transaction test
213:
100 137 0 137 0 0 8 0 --:--:-- 0:00:16 --:--:-- 0Transaction test succeeded
214: Running transaction
215: Updating : libsss_certmap-1.16.5-10.el7_9.16.x86_64 1/64
216: Updating : libsss_idmap-1.16.5-10.el7_9.16.x86_64 2/64
217: Updating : 1:net-snmp-libs-5.7.2-49.el7_9.4.x86_64 3/64
100 137 0 137 0 0 7 0 --:--:-- 0:00:17 --:--:-- 0
218: Updating : libipa_hbac-1.16.5-10.el7_9.16.x86_64 4/64
219: Updating : 1:net-snmp-agent-libs-5.7.2-49.el7_9.4.x86_64 5/64
220: Updating : libsss_autofs-1.16.5-10.el7_9.16.x86_64 6/64
221: Updating : kernel-tools-libs-3.10.0-1160.108.1.el7.x86_64 7/64
222: Updating : python-sssdconfig-1.16.5-10.el7_9.16.noarch 8/64
...
253: Installing : kernel-3.10.0-1160.108.1.el7.x86_64 32/64
...
259: Cleanup : sssd-1.16.5-10.el7_9.15.x86_64 33/64
260: Cleanup : sssd-ipa-1.16.5-10.el7_9.15.x86_64 34/64
...
269: curl: (18) transfer closed with outstanding read data remaining
270: lease successfully aquired by k8s-dev-master1
271: node k8s-dev-master1 cordoned
272: check if critical pods have exited
273: critical pods have exited
274:
...
297: Loaded plugins: fastestmirror, product-id, subscription-manager
298: Verifying : sssd-ldap-1.16.5-10.el7_9.16.x86_64 1/64
...
361: Verifying : libsss_nss_idmap-1.16.5-10.el7_9.15.x86_64 64/64
362:
363: Removed: 364: kernel.x86_64 0:3.10.0-1160.92.1.el7
365:
366: Installed: 367: kernel.x86_64 0:3.10.0-1160.108.1.el7
368:
369: Updated: 370: docker-buildx-plugin.x86_64 0:0.12.1-1.el7
371: docker-ce.x86_64 3:25.0.1-1.el7
372: docker-ce-cli.x86_64 1:25.0.1-1.el7
373: docker-ce-rootless-extras.x86_64 0:25.0.1-1.el7
374: docker-compose-plugin.x86_64 0:2.24.2-1.el7
375: kernel-tools.x86_64 0:3.10.0-1160.108.1.el7
376: kernel-tools-libs.x86_64 0:3.10.0-1160.108.1.el7
377: libipa_hbac.x86_64 0:1.16.5-10.el7_9.16
378: libsss_autofs.x86_64 0:1.16.5-10.el7_9.16
379: libsss_certmap.x86_64 0:1.16.5-10.el7_9.16
380: libsss_idmap.x86_64 0:1.16.5-10.el7_9.16
381: libsss_nss_idmap.x86_64 0:1.16.5-10.el7_9.16
382: libsss_sudo.x86_64 0:1.16.5-10.el7_9.16
383: net-snmp.x86_64 1:5.7.2-49.el7_9.4
384: net-snmp-agent-libs.x86_64 1:5.7.2-49.el7_9.4
385: net-snmp-libs.x86_64 1:5.7.2-49.el7_9.4
386: net-snmp-utils.x86_64 1:5.7.2-49.el7_9.4
387: python-libipa_hbac.x86_64 0:1.16.5-10.el7_9.16
388: python-perf.x86_64 0:3.10.0-1160.108.1.el7
389: python-sss-murmur.x86_64 0:1.16.5-10.el7_9.16
390: python-sssdconfig.noarch 0:1.16.5-10.el7_9.16
391: sssd.x86_64 0:1.16.5-10.el7_9.16
392: sssd-ad.x86_64 0:1.16.5-10.el7_9.16
393: sssd-client.x86_64 0:1.16.5-10.el7_9.16
394: sssd-common.x86_64 0:1.16.5-10.el7_9.16
395: sssd-common-pac.x86_64 0:1.16.5-10.el7_9.16
396: sssd-ipa.x86_64 0:1.16.5-10.el7_9.16
397: sssd-krb5.x86_64 0:1.16.5-10.el7_9.16
398: sssd-krb5-common.x86_64 0:1.16.5-10.el7_9.16
399: sssd-ldap.x86_64 0:1.16.5-10.el7_9.16
400: sssd-proxy.x86_64 0:1.16.5-10.el7_9.16
401:
402: Complete!
403: Core libraries or services have been updated: 404: kernel -> 3.10.0-1160.108.1.el7
405:
406: Reboot is required to ensure that your system benefits from these updates.
407:
408: More information: 409: https://access.redhat.com/solutions/27943
410: SUSS: releasedelayed
411: % Total % Received % Xferd Average Speed Time Time Time Current
412: Dload Upload Total Spent Left Speed
413:
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
414: Shutdown scheduled for Mon 2024-01-29 14:16:16 CET, use 'shutdown -c' to cancel.
415: stopping logstream
416: /var/tmp/foreman-ssh-cmd-8abf4999-8825-4dfe-9dca-b289987599d7/script: line 1: kill: (13730) - No such process
417: Exit status: 0
Log Analysis:
In Line 410 the script was obviously able to contact suss. So this log do not show the described behaviour. In last weeks execution the script was not able to complete successfully, and the node owning the lock remained in Cordoned state, so assuming that suss was not restarted (in time). Because of log retention periods, those logs are no longer available.
We will make some more tests if the suss container will successfully restart in this case.
We will test if suss will be restarted on a node that has not been updated yet:
[root@k8s-dev-master2 ~]# yum check-update --disablerepo='*' --enablerepo=docker-ce-stable
Failed to set locale, defaulting to C
Loaded plugins: fastestmirror, product-id, search-disabled-repos, subscription-manager
Loading mirror speeds from cached hostfile
docker-buildx-plugin.x86_64 0.12.1-1.el7 docker-ce-stable
docker-ce.x86_64 3:25.0.1-1.el7 docker-ce-stable
docker-ce-cli.x86_64 1:25.0.1-1.el7 docker-ce-stable
docker-ce-rootless-extras.x86_64 25.0.1-1.el7 docker-ce-stable
docker-compose-plugin.x86_64 2.24.2-1.el7 docker-ce-stable
On the k8s-dev-master2
node, suss in running in pod named suss-l7t9v
without a restart. Let's run the update:
yum update --disablerepo='*' --enablerepo=docker-ce-stable
After confirmation and command completion, suss has been restarted, which is also annotated by kubelet:
[root@k8s-dev-master2 ~]# docker inspect d73 |grep restart
"annotation.io.kubernetes.container.restartCount": "1",
We also see the restart in rancher, and are able to inspect the logs.
Tasks:
On testing by using the ./update-example..sh script we notices a bug / feature gap that causes the lock not to be released if the container running suss is terminated by the system update e.g. because it updates the configured CRI tools (docker in this case).
Why it happens:
/synchronize
and acquired the lock/teardown
is executed, but not relevant here/releasedelayed
but this fails because suss in not runningSome notes:
curl suss
causes script exit there may be not enough time. Kubelet may have restarted the pod after successful docker update, but failed before.Possible resolutions: