rancher / rke2

After rke2 v1.27.4 arm64 installed, cni pod init error #4737

Closed liyang516 closed 1 year ago

liyang516 commented 1 year ago

Environmental Info:

RKE2 Version:

# rke2 -v
rke2 version v1.25.9+rke2r1 (842d05e64bcbf78552f1db0b32700b8faea403a0)
go version go1.19.8 X:boringcrypto

OS info:

# cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (AltArch)

# arch
aarch64

k8s Version:

# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4+rke2r1", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T20:19:45Z", GoVersion:"go1.20.6 X:boringcrypto", Compiler:"gc", Platform:"linux/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4+rke2r1", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T20:19:45Z", GoVersion:"go1.20.6 X:boringcrypto", Compiler:"gc", Platform:"linux/arm64"}

Install result:

# kubectl get nodes
NAME                                STATUS     ROLES                       AGE   VERSION
centos78v.arm.bjat.qianxin-inc.cn   NotReady   control-plane,etcd,master   17h   v1.27.4+rke2r1

# kubectl get pods -A
NAMESPACE         NAME                                                         READY   STATUS             RESTARTS          AGE
kube-system       cloud-controller-manager-centos78v.arm.bjat.qianxin-inc.cn   1/1     Running            0                 17h
kube-system       etcd-centos78v.arm.bjat.qianxin-inc.cn                       1/1     Running            0                 17h
kube-system       helm-install-rke2-ingress-nginx-fqg66                        0/1     Pending            0                 17h
kube-system       helm-install-rke2-metrics-server-bgnv7                       0/1     Pending            0                 17h
kube-system       helm-install-rke2-snapshot-controller-crd-mzvf6              0/1     Pending            0                 17h
kube-system       helm-install-rke2-snapshot-controller-xtsv6                  0/1     Pending            0                 17h
kube-system       helm-install-rke2-snapshot-validation-webhook-9vczh          0/1     Pending            0                 17h
kube-system       kube-apiserver-centos78v.arm.bjat.qianxin-inc.cn             1/1     Running            0                 17h
kube-system       kube-controller-manager-centos78v.arm.bjat.qianxin-inc.cn    1/1     Running            0                 17h
kube-system       kube-proxy-centos78v.arm.bjat.qianxin-inc.cn                 0/1     CrashLoopBackOff   205 (2m36s ago)   17h
kube-system       kube-scheduler-centos78v.arm.bjat.qianxin-inc.cn             1/1     Running            0                 17h
kube-system       rke2-coredns-rke2-coredns-5f5d6b54c7-ptmcj                   0/1     Pending            0                 17h
kube-system       rke2-coredns-rke2-coredns-autoscaler-6bf8f59fd5-zq4q5        0/1     Pending            0                 17h
tigera-operator   tigera-operator-569cff7b5b-4h4kl                             0/1     CrashLoopBackOff   179 (98s ago)     17h

Pod log:

# kubectl logs -n kube-system kube-proxy-centos78v.arm.bjat.qianxin-inc.cn 
I0908 03:36:38.603113       1 server.go:226] "Warning, all flags other than --config, --write-config-to, and --cleanup are deprecated, please begin using a config file ASAP"
I0908 03:36:38.613296       1 node.go:141] Successfully retrieved node IP: 10.57.128.200
I0908 03:36:38.613336       1 server_others.go:110] "Detected node IP" address="10.57.128.200"
I0908 03:36:38.638758       1 iptables.go:221] "Error checking iptables version, assuming version at least" version="1.4.11" err="signal: segmentation fault"
I0908 03:36:38.639653       1 iptables.go:221] "Error checking iptables version, assuming version at least" version="1.4.11" err="signal: segmentation fault"
E0908 03:36:38.640432       1 server.go:494] "Error running ProxyServer" err="iptables is not supported for primary IP family \"IPv4\""
E0908 03:36:38.640459       1 run.go:74] "command failed" err="iptables is not supported for primary IP family \"IPv4\""

# kubectl exec -it -n kube-system kube-controller-manager-centos78v.arm.bjat.qianxin-inc.cn bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.

# iptables --version
Segmentation fault

# kubectl logs -n tigera-operator tigera-operator-569cff7b5b-4h4kl 
2023/09/08 03:37:34 [INFO] Version: v1.30.4
2023/09/08 03:37:34 [INFO] Go Version: go1.20.4
2023/09/08 03:37:34 [INFO] Go OS/Arch: linux/arm64
2023/09/08 03:38:04 [ERROR] Get "https://100.64.0.1:443/api?timeout=32s": dial tcp 100.64.0.1:443: i/o timeout

# kubectl logs -n kube-system rke2-canal-nwpl2 -c install-cni
......
2023-09-08 03:56:42.454 [INFO][1] cni-installer/<nil> <nil>: CNI plugin version: v3.25.1

2023-09-08 03:56:42.454 [INFO][1] cni-installer/<nil> <nil>: /host/secondary-bin-dir is not writeable, skipping
W0908 03:56:42.454671       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2023-09-08 03:56:42.457 [ERROR][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://10.43.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/canal/token": dial tcp 10.43.0.1:443: connect: connection refused
2023-09-08 03:56:42.457 [FATAL][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://10.43.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/canal/token": dial tcp 10.43.0.1:443: connect: connection refused

Describe the problem:

  1. The CNI pod cannot connect to the Kubernetes API service (both Canal and Calico fail with the same error).
  2. kube-proxy reports that iptables fails with a segmentation fault.
  3. The Kubernetes node is NotReady.
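
The three symptoms are probably one failure seen from different angles: if kube-proxy crashes because its iptables binary segfaults, nothing programs the service rules, so the CNI installer's request to the cluster service IP fails and the node stays NotReady. Below is a minimal sketch for confirming the first link in that chain from kube-proxy log text. The function name `has_iptables_segfault` is made up for this example, and the sample line is copied from the log above.

```shell
#!/bin/sh
# Sketch: scan kube-proxy log text on stdin for the failure signature quoted
# in this report. The function name is illustrative, not part of any tool.
has_iptables_segfault() {
    # Match the version-check line only when it carries the segfault error.
    grep -q 'Error checking iptables version.*segmentation fault'
}

# Demo on a line copied from the kube-proxy log in this report.
sample='I0908 03:36:38.638758       1 iptables.go:221] "Error checking iptables version, assuming version at least" version="1.4.11" err="signal: segmentation fault"'
if printf '%s\n' "$sample" | has_iptables_segfault; then
    echo "kube-proxy: bundled iptables segfaults"
else
    echo "kube-proxy: no iptables segfault found"
fi
```

Against a live cluster, the same function could be fed from `kubectl logs -n kube-system <kube-proxy pod>`.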

Steps To Reproduce:

systemctl stop firewalld.service
systemctl stop NetworkManager
INSTALL_RKE2_ARTIFACT_PATH=/root/rke2-artifacts sh install.sh
systemctl enable rke2-server.service
systemctl start rke2-server.service

Expected behavior: The server node becomes Ready.

Other attempts: The same install steps work on Ubuntu 22.04 arm64.

brandond commented 1 year ago

The install failed, but you can run rke2 -v - so it's obviously installed. Where's the failure? I don't see even so much as an error message here.

liyang516 commented 1 year ago

Do you know what the problem is? @brandond

brandond commented 1 year ago

I'm not sure why iptables would segfault on your hardware; I suspect perhaps your processor model lacks something the binary expects. What is the output of cat /proc/cpuinfo?

liyang516 commented 1 year ago

I am using an arm64 virtual machine. Here is the CPU info:

# lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          2
Model:                 0
BogoMIPS:              200.00
NUMA node0 CPU(s):     0,1
NUMA node1 CPU(s):     2,3
Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

# cat /proc/cpuinfo 
processor   : 0
BogoMIPS    : 200.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
CPU implementer : 0x48
CPU architecture: 8
CPU variant : 0x1
CPU part    : 0xd01
CPU revision    : 0

processor   : 1
BogoMIPS    : 200.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
CPU implementer : 0x48
CPU architecture: 8
CPU variant : 0x1
CPU part    : 0xd01
CPU revision    : 0

processor   : 2
BogoMIPS    : 200.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
CPU implementer : 0x48
CPU architecture: 8
CPU variant : 0x1
CPU part    : 0xd01
CPU revision    : 0

processor   : 3
BogoMIPS    : 200.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
CPU implementer : 0x48
CPU architecture: 8
CPU variant : 0x1
CPU part    : 0xd01
CPU revision    : 0

brandond commented 1 year ago

Which VM platform are you running it in? Can you provide steps to reproduce? This works for me on multiple physical arm64 platforms.

liyang516 commented 1 year ago

My VM runs on OpenStack. The physical compute node runs CentOS 7.8 with a HUAWEI Kunpeng 920 5220 CPU:

# Physical node info
# arch
aarch64

# cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (AltArch)

# lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    1
Core(s) per socket:    32
Socket(s):             2
NUMA node(s):          2
Model:                 0
CPU max MHz:           2600.0000
CPU min MHz:           200.0000
BogoMIPS:              200.00
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              32768K
NUMA node0 CPU(s):     0-31
NUMA node1 CPU(s):     32-63
Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

# cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 200.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
CPU implementer : 0x48
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd01
CPU revision    : 0
...

# dmidecode -t processor
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x001B, DMI type 4, 48 bytes
Processor Information
    Socket Designation: CPU01
    Type: Central Processor
    Family: ARM
    Manufacturer: HiSilicon
    ID: 10 D0 1F 48 00 00 00 00
    Signature: Implementor 0x48, Variant 0x1, Architecture 15, Part 0xd01, Revision 0
    Version: HUAWEI Kunpeng 920 5220
    Voltage: 0.9 V
    External Clock: 100 MHz
    Max Speed: 2600 MHz
    Current Speed: 2600 MHz
    Status: Populated, Enabled
    Upgrade: Unknown
    L1 Cache Handle: 0x0018
    L2 Cache Handle: 0x0019
    L3 Cache Handle: 0x001A
    Serial Number: 6B73215401A03324
    Asset Tag: To be filled by O.E.M.
    Part Number: To be filled by O.E.M.
    Core Count: 32
    Core Enabled: 32
    Thread Count: 32
    Characteristics:
        64-bit capable
        Multi-Core
        Execute Protection
        Enhanced Virtualization
        Power/Performance Control

liyang516 commented 1 year ago

OpenStack is the Train release. The libvirt-related nova configuration:

# cat /etc/nova/nova.conf
[libvirt]
connection_uri = qemu:///system
cpu_mode = host-passthrough
virt_type = kvm

The following is the XML of the virtual machine:

# virsh list
 Id   Name                State    
-----------------------------------
 30   ubuntu-arm          running  
 38   instance-00001ce6   running  
 43   instance-00001ce5   running  
 52   instance-00001d33   running  
 54   instance-00001d4a   running  

# virsh dumpxml 54
<domain type='kvm' id='54'>
  <name>instance-00001d4a</name>
  <uuid>5db3a62c-bd34-4dce-b642-290e3df6db1f</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="0.0.0-1.el7"/>
      <nova:name>centos78v.arm.bjat.qianxin-inc.cn</nova:name>
      <nova:creationTime>2023-09-07 06:38:29</nova:creationTime>
      <nova:flavor name="kc1.large.2">
        <nova:memory>8192</nova:memory>
        <nova:disk>0</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>4</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="be803e337dbb423097ab049b5af4df95">admin</nova:user>
        <nova:project uuid="e93293733175465bbc00ccdf40a6f7b0">polaris-dev</nova:project>
      </nova:owner>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='30'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='41'/>
    <vcpupin vcpu='3' cpuset='43'/>
    <emulatorpin cpuset='5,30,41,43'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0-1'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>RDO</entry>
      <entry name='product'>OpenStack Compute</entry>
      <entry name='version'>0.0.0-1.el7</entry>
      <entry name='serial'>5db3a62c-bd34-4dce-b642-290e3df6db1f</entry>
      <entry name='uuid'>5db3a62c-bd34-4dce-b642-290e3df6db1f</entry>
      <entry name='family'>Virtual Machine</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='aarch64' machine='virt-rhel7.6.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/instance-00001d4a_VARS.fd</nvram>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <gic version='3'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='2' cores='2' threads='1'/>
    <numa>
      <cell id='0' cpus='0-1' memory='4194304' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='4194304' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <auth username='cinder'>
        <secret type='ceph' uuid='fa197221-4a80-4976-a7c1-156b5fb7076e'/>
      </auth>
      <source protocol='rbd' name='cinder.volumes.hdd/volume-0806b304-dd14-44fc-a333-fec13a2e0826'>
        <host name='10.57.37.52' port='6789'/>
        <host name='10.57.37.53' port='6789'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <iotune>
        <total_bytes_sec>60000000</total_bytes_sec>
        <total_iops_sec>500</total_iops_sec>
      </iotune>
      <serial>0806b304-dd14-44fc-a333-fec13a2e0826</serial>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='qemu-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <interface type='ethernet'>
      <mac address='fa:16:3c:24:e3:7b'/>
      <target dev='tap047da446-bd'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/6'/>
      <log file='/var/lib/nova/instances/5db3a62c-bd34-4dce-b642-290e3df6db1f/console.log' append='off'/>
      <target type='system-serial' port='0'>
        <model name='pl011'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/6'>
      <source path='/dev/pts/6'/>
      <log file='/var/lib/nova/instances/5db3a62c-bd34-4dce-b642-290e3df6db1f/console.log' append='off'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='keyboard' bus='usb'>
      <alias name='input1'/>
      <address type='usb' bus='0' port='2'/>
    </input>
    <graphics type='vnc' port='5904' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='virtio' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <stats period='10'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+0</label>
    <imagelabel>+0:+0</imagelabel>
  </seclabel>
</domain>

zhenlohuang commented 1 year ago

I faced the same issue in version v1.28.1+rke2r1 on arm64, but it works fine on an x86 machine.

liyang516 commented 1 year ago

Which operating system is used?

rbrtbnfgl commented 1 year ago

The issue seems related to the iptables installed on the machine. Could you check if the iptables binary is built for arm64?
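
One way to answer that question without the `file` utility is to read the `e_machine` field of the ELF header directly. The sketch below assumes a POSIX `od` is available; the function name `elf_machine` is made up for this example, and only three machine codes are mapped (183 for AArch64, 62 for x86-64, 40 for 32-bit ARM).

```shell
#!/bin/sh
# Sketch: print the architecture an ELF file targets, based on the 2-byte
# e_machine field at offset 18 of the ELF header. Illustrative helper only.
elf_machine() {
    file=$1
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$file" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # EI_DATA at offset 5: 1 = little-endian, 2 = big-endian.
    data=$(od -An -tu1 -j5 -N1 "$file" | tr -d ' \n')
    # Read the two e_machine bytes as unsigned decimals into $1 and $2.
    set -- $(od -An -tu1 -j18 -N2 "$file")
    if [ "$data" = "2" ]; then
        machine=$(( $1 * 256 + $2 ))
    else
        machine=$(( $2 * 256 + $1 ))
    fi
    case $machine in
        183) echo "aarch64" ;;
        62)  echo "x86_64" ;;
        40)  echo "arm" ;;
        *)   echo "unknown($machine)" ;;
    esac
}

# Inspect the file given as the first argument, or /bin/sh by default.
elf_machine "${1:-/bin/sh}"
```

A result other than `aarch64` on this platform would point to a wrong-architecture binary. A result of `aarch64` means the architecture matches and the crash has another cause.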

zhenlohuang commented 1 year ago

"Which operating system is used?"

NAME="CentOS Linux"
VERSION="7 (AltArch)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (AltArch)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

The version of iptables: iptables v1.4.21

zhenlohuang commented 1 year ago

I also reproduced this issue without Kubernetes:

[root@my-vm ~]# docker run -it --entrypoint=bash rancher/hardened-kubernetes:v1.28.1-rke2r1-build20230825
c5cfe3404ad8:/ # iptables --version
/usr/sbin/iptables: line 65: awk: command not found
Segmentation fault
c5cfe3404ad8:/ # iptables --version
Segmentation fault

brandond commented 1 year ago

@rbrtbnfgl our hardened-kubernetes image actually includes iptables binaries from k3s-root: https://github.com/rancher/image-build-kubernetes/blob/master/Dockerfile#L58-L61

I suspect we need to bump this to v0.12.2 or newer for the 64k page size fix.
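
To see whether a given host falls into the 64k page size case, the kernel page size can be read with `getconf PAGESIZE`. A minimal sketch follows; the function name `describe_page_size` is made up for this example, and it only labels the value without testing the binary itself.

```shell
#!/bin/sh
# Sketch: label a kernel page size given in bytes. 65536 is the 64k case
# named in the comment above; the function name is illustrative.
describe_page_size() {
    bytes=$1
    kib=$(( bytes / 1024 ))
    if [ "$bytes" -eq 65536 ]; then
        echo "${kib}k pages: the case the k3s-root fix targets"
    else
        echo "${kib}k pages"
    fi
}

# getconf PAGESIZE prints the page size of the running kernel in bytes.
size=$(getconf PAGESIZE 2>/dev/null) && describe_page_size "$size"
```

Running this on both the failing CentOS 7 guest and the working Ubuntu 22.04 guest would show whether the two kernels differ in page size.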

rancher-max commented 1 year ago

We closed this in k3s after community validation, and the fix is the same here, so I am going to close it out here with the same reasoning per https://github.com/k3s-io/k3s/issues/7335#issuecomment-1529916982. If this does not resolve the issue, please let me know and we can work towards a better fix and getting an environment where we can reproduce it. Thank you!

haoxiaoci commented 1 year ago

@rancher-max rke2 version v1.28.2+rke2r1 still gets the "iptables Segmentation fault" error. However, I verified the k3s-tools-arm tar package, and it works fine on CentOS 7. It seems that the ARCH parameter in line 57 of the Dockerfile should be changed to a variable.

brandond commented 1 year ago

"It seems that the ARCH parameter in line 57 of the Dockerfile should be changed to a variable."

It's a Docker ARG, which is a variable passed in to the Dockerfile at build time... that is how Docker args work. https://github.com/rancher/image-build-kubernetes/blob/master/Makefile#L36

Can you confirm which iptables binary specifically is segfaulting? I suspect there may be another binary embedded somewhere that is not usable on your platform.

haoxiaoci commented 1 year ago

@brandond Here is a detailed binary comparison:

docker run -it -d -v /root/k3s-root-v0.12.1:/root/k3s-root-v0.12.1  -v /root/k3s-root-v0.12.2:/root/k3s-root-v0.12.2 -v /root/k3s-root-v0.13.0:/root/k3s-root-v0.13.0 --entrypoint=bash rancher/hardened-kubernetes:v1.28.2-rke2r1-build20230913
[root@test ~]# docker exec -it aff16dc54a22 /bin/bash
aff16dc54a22:~ # ls -al /usr/sbin/iptables
lrwxrwxrwx 1 root root 17 Oct 16 04:06 /usr/sbin/iptables -> xtables-nft-multi
aff16dc54a22:~ # md5sum /usr/sbin/xtables-nft-multi 
ea4d47cd148cd0d0bab7586f28636cb4  /usr/sbin/xtables-nft-multi
aff16dc54a22:~ # md5sum /root/k3s-root-v0.12.1/bin/aux/xtables-nft-multi
ea4d47cd148cd0d0bab7586f28636cb4  /root/k3s-root-v0.12.1/bin/aux/xtables-nft-multi
aff16dc54a22:~ # md5sum /root/k3s-root-v0.12.2/bin/aux/xtables-nft-multi
fa36e7fb616aa85b7298481493e58a66  /root/k3s-root-v0.12.2/bin/aux/xtables-nft-multi
aff16dc54a22:~ # md5sum /root/k3s-root-v0.13.0/bin/aux/xtables-nft-multi
966a67cb630421221887c448256b57ad  /root/k3s-root-v0.13.0/bin/aux/xtables-nft-multi
aff16dc54a22:~ # /usr/sbin/xtables-nft-multi iptables --version
Segmentation fault
aff16dc54a22:~ #  /root/k3s-root-v0.12.1/bin/aux/xtables-nft-multi iptables --version
Segmentation fault
aff16dc54a22:~ # /root/k3s-root-v0.12.2/bin/aux/xtables-nft-multi iptables --version
bash: /root/k3s-root-v0.12.2/bin/aux/xtables-nft-multi: cannot execute binary file: Exec format error
aff16dc54a22:~ # 
aff16dc54a22:~ # /root/k3s-root-v0.13.0/bin/aux/xtables-nft-multi iptables --version
iptables v1.8.8 (nf_tables)

It turns out that the xtables-nft-multi binary in k3s-root-arm v0.12.1 has the same md5sum as the one in the rke2 v1.28.2+rke2r1 image, and both segfault. The binary from k3s-root-arm v0.13.0 runs fine on my arm platform, so a working build exists for this hardware. It looks like rke2 v1.28.2+rke2r1 ships a mismatched k3s-root-arm version. Can you check why rke2 v1.28.2+rke2r1 is not using k3s-root-arm v0.13.0, or give rke2 a matching version of k3s-root-arm?
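
The manual comparison above can be repeated over any set of candidate binaries with a small loop. This is a sketch only: the function names are made up for this example, and the status mapping assumes the usual shell convention that 139 means the process died from SIGSEGV (128 + 11) and that 126 is what bash reports for a file it cannot execute.

```shell
#!/bin/sh
# Sketch: run each candidate xtables multi-call binary with
# "iptables --version" and report its checksum and how it exited.
# Function names are illustrative.

# Map an exit status to a short label.
classify_status() {
    case $1 in
        0)   echo "ok" ;;
        139) echo "segfault" ;;
        126) echo "cannot-execute" ;;
        *)   echo "failed($1)" ;;
    esac
}

# Print "<md5>  <label>  <path>" for one binary.
probe_binary() {
    bin=$1
    sum=$(md5sum "$bin" | cut -d' ' -f1)
    status=0
    "$bin" iptables --version >/dev/null 2>&1 || status=$?
    echo "$sum  $(classify_status "$status")  $bin"
}

# Probe every path passed on the command line.
for bin in "$@"; do
    probe_binary "$bin"
done
```

Passing the image's `/usr/sbin/xtables-nft-multi` together with the extracted k3s-root copies should produce one line per binary, so identical checksums and failing builds stand out at a glance.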

brandond commented 1 year ago

Hmm. https://github.com/rancher/image-build-kubernetes/releases/tag/v1.28.2-rke2r1-build20230913 shows that it was built against https://github.com/rancher/image-build-kubernetes/commit/c29ac4f85b77e19ace2fb4771c3a091b3bb14afa which has the updated version... I'll have to see if that is perhaps also set elsewhere.

brandond commented 1 year ago

oh, derp - we also define it here... and this one takes precedence https://github.com/rancher/image-build-kubernetes/blob/master/Makefile#L18
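
When a build value is set in more than one file, as described above, listing every assignment in the build tree makes the overriding definition easy to spot. A minimal sketch, assuming a `grep` that supports `-r` (GNU and BSD do): the variable name `K3S_ROOT_VERSION` is an assumption about what the build files call it, and `find_pins` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: list every place a build variable is assigned under a directory,
# to spot a second definition that overrides the one that was edited.
# K3S_ROOT_VERSION below is an assumed name; substitute the real one.
find_pins() {
    name=$1
    dir=$2
    # Matches "NAME=", "NAME =", "NAME ?=" and "NAME :=" at the start of a
    # line or after whitespace, so "${NAME}" references are not reported.
    grep -rnE "(^|[[:space:]])${name}[[:space:]]*[?:]?=" "$dir"
}

# Search the directory given as the first argument, if any.
if [ $# -gt 0 ]; then
    find_pins K3S_ROOT_VERSION "$1" || echo "no assignments found"
fi
```

More than one hit (for example one in a Dockerfile and one in a Makefile) means the effective value depends on which definition the build passes through last.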

brandond commented 1 year ago

Will need to be tested once we have hardened-kubernetes images tagged for 1.28.3

est-suse commented 1 year ago

Validated on master branch with RC

v1.28.3-rc2+rke2r1
NAME="CentOS Linux"
VERSION="7 (AltArch)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (AltArch)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7:server"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Config.yaml:

token: secret
write-kubeconfig-mode: "0644"
profile: "cis"

Cluster Configuration:

1 server

Testing Steps

  1. Copy config.yaml

$ sudo mkdir -p /etc/rancher/rke2 && sudo cp config.yaml /etc/rancher/rke2

  2. Install RKE2

$ curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_VERSION=v1.28.3-rc2+rke2r1 INSTALL_RKE2_TYPE='server' INSTALL_RKE2_METHOD=tar sh -

$ kubectl get pods -A
NAMESPACE     NAME                                                                   READY   STATUS      RESTARTS   AGE
kube-system   cloud-controller-manager-ip-172-31-41-105.us-east-2.compute.internal   1/1     Running     0          14m
kube-system   etcd-ip-172-31-41-105.us-east-2.compute.internal                       1/1     Running     0          14m
kube-system   helm-install-rke2-canal-66hpq                                          0/1     Completed   0          13m
kube-system   helm-install-rke2-coredns-v7xlf                                        0/1     Completed   0          13m
kube-system   helm-install-rke2-ingress-nginx-26k6g                                  0/1     Completed   0          13m
kube-system   helm-install-rke2-metrics-server-n7vmb                                 0/1     Completed   0          13m
kube-system   helm-install-rke2-snapshot-controller-crd-vxfmw                        0/1     Completed   0          13m
kube-system   helm-install-rke2-snapshot-controller-np944                            0/1     Completed   1          13m
kube-system   helm-install-rke2-snapshot-validation-webhook-vdjk5                    0/1     Completed   0          13m
kube-system   kube-apiserver-ip-172-31-41-105.us-east-2.compute.internal             1/1     Running     0          14m
kube-system   kube-controller-manager-ip-172-31-41-105.us-east-2.compute.internal    1/1     Running     0          14m
kube-system   kube-proxy-ip-172-31-41-105.us-east-2.compute.internal                 1/1     Running     0          14m
kube-system   kube-scheduler-ip-172-31-41-105.us-east-2.compute.internal             1/1     Running     0          14m
kube-system   rke2-canal-nfnl4                                                       2/2     Running     0          13m
kube-system   rke2-coredns-rke2-coredns-6b795db654-f7k9q                             1/1     Running     0          13m
kube-system   rke2-coredns-rke2-coredns-autoscaler-945fbd459-pdbj5                   1/1     Running     0          13m
kube-system   rke2-ingress-nginx-controller-t7w2x                                    1/1     Running     0          12m
kube-system   rke2-metrics-server-544c8c66fc-fqcjm                                   1/1     Running     0          12m
kube-system   rke2-snapshot-controller-59cc9cd8f4-mtvfr                              1/1     Running     0          12m
kube-system   rke2-snapshot-validation-webhook-54c5989b65-qsm4r                      1/1     Running     0          12m

kubectl get nodes
NAME                                          STATUS   ROLES                       AGE   VERSION
ip-172-31-41-105.us-east-2.compute.internal   Ready    control-plane,etcd,master   14m   v1.28.3+rke2r1
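
The final check in a validation like this can be scripted by parsing the `kubectl get nodes` table. The sketch below reads that table on stdin and fails if any node's STATUS does not start with `Ready`; the function name `all_nodes_ready` is made up for this example, and the demo row is copied from the output above.

```shell
#!/bin/sh
# Sketch: read `kubectl get nodes` output on stdin and fail if any node is
# not Ready. The function name is illustrative.
all_nodes_ready() {
    # Skip the header row; column 2 is STATUS. "NotReady" does not start
    # with "Ready", while "Ready,SchedulingDisabled" is accepted.
    awk 'BEGIN { bad = 0 }
         NR > 1 && $2 !~ /^Ready/ { bad = 1; print "not ready: " $1 }
         END { exit bad }'
}

# Demo on the node row from the validation run above.
printf '%s\n' \
    'NAME                                          STATUS   ROLES                       AGE   VERSION' \
    'ip-172-31-41-105.us-east-2.compute.internal   Ready    control-plane,etcd,master   14m   v1.28.3+rke2r1' \
    | all_nodes_ready && echo "all nodes ready"
```

On a live cluster this would be used as `kubectl get nodes | all_nodes_ready`.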