rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

[v0.19.11-k3s1r0] "Cannot determine CPU /sys/bus/cpu/devices/cpuN online state, skipping" #743

Closed SteffenBlake closed 2 years ago

SteffenBlake commented 2 years ago

Version (k3OS / kernel)
k3os --version: v0.19.11-k3s1r0
uname --kernel-release --kernel-version: 5.10.60-v8+ #1449 SMP PREEMPT Wed Aug 25 15:01:33 BST 2021

Architecture
uname --machine: aarch64

CPU
lscpu output:

Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
Vendor ID:                       ARM
Model:                           3
Model name:                      Cortex-A72
Stepping:                        r0p3
CPU max MHz:                     1500.0000
CPU min MHz:                     600.0000
BogoMIPS:                        108.00
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Vulnerable
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm crc32 cpuid

Describe the bug
Every few minutes, the following messages show up in /var/log/k3s-service.log:

W0908 16:33:13.342689    2431 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu0 online state, skipping
W0908 16:33:13.342936    2431 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu1 online state, skipping
W0908 16:33:13.343073    2431 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu2 online state, skipping
W0908 16:33:13.343192    2431 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu3 online state, skipping
E0908 16:33:13.343245    2431 machine.go:72] Cannot read number of physical cores correctly, number of cores set to 0
W0908 16:33:13.343682    2431 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu0 online state, skipping
W0908 16:33:13.343803    2431 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu1 online state, skipping
W0908 16:33:13.343918    2431 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu2 online state, skipping
W0908 16:33:13.344033    2431 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu3 online state, skipping
E0908 16:33:13.344083    2431 machine.go:86] Cannot read number of sockets correctly, number of sockets set to 0
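To see what the warning is complaining about on a given node, one could inspect the sysfs entries directly. The following is only a minimal sketch: it assumes the warning refers to an `online` file inside each `/sys/bus/cpu/devices/cpuN` directory (the log names only the directory), and the function name `check_cpu_online` is made up for illustration.

```shell
#!/bin/sh
# Report, for each cpuN entry under a sysfs CPU device root, whether an
# "online" file is present and readable.
# Assumption: the warning is about <root>/cpuN/online; the log message
# itself only names the cpuN directory.
check_cpu_online() {
    root="${1:-/sys/bus/cpu/devices}"
    for dir in "$root"/cpu[0-9]*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        if [ -r "$dir/online" ]; then
            printf '%s online=%s\n' "$name" "$(cat "$dir/online")"
        else
            printf '%s online file missing or unreadable\n' "$name"
        fi
    done
}
```

Running `check_cpu_online` with no argument on the Raspberry Pi and on one of the unaffected machines should show whether the two differ in this respect.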

To Reproduce
Build the image using my fork of picl-k3os-image-generator: https://github.com/SteffenBlake/picl-k3os-image-generator

You will also need the MAC address of your Raspberry Pi, as per the instructions, and a working YAML cloud-config file.

sudo mkdir /temp
cd /temp
sudo git clone https://github.com/SteffenBlake/picl-k3os-image-generator
cd picl-k3os-image-generator
export K3OS_VERSION=v0.19.11-k3s1r0
sudo nano config/<YourRaspberryPisMacAddress>.yaml

Copy your cloud config to said yaml file, then:

sudo -E ./build-image.sh raspberrypi

Then copy or burn the image, which will now be located at /temp/picl-k3os-image-generator/picl-k3os-v0.19.11-k3s1r0-raspberrypi.img

Shortly after boot, the log messages should show up on the Raspberry Pi.

Expected behavior
I'm not sure how much these log messages matter in themselves, but the non-Raspberry Pi devices in my cluster don't have this issue. I have also noticed that all deployments, pods, statefulsets, etc., if given the option, always choose the non-Raspberry Pi machines over the Raspberry Pis, no matter how much is already running on them.

As a result, even with several deployments, my Raspberry Pis still run only daemonsets, since those are forced onto every node.
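One way to put numbers on this observation is to count pods per node. The helper below is a rough sketch, not something from the report: `pods_per_node` is a hypothetical name, and it simply tallies node names read from standard input.

```shell
#!/bin/sh
# Count pods per node from a list of node names, one per line on stdin.
# Empty lines and "<none>" (pods with no node assigned yet) are ignored.
# Output: one "<node> <count>" line per node, sorted by node name.
pods_per_node() {
    grep -v -e '^$' -e '^<none>$' | sort | uniq -c | awk '{ print $2, $1 }'
}
```

It could be fed from `kubectl get pods --all-namespaces --no-headers -o custom-columns=NODE:.spec.nodeName | pods_per_node`. Whether the skew is actually caused by the warnings above is not established here; the tally only confirms how the pods are distributed.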

Additional context
Current config YAML for all of my agent devices that are Raspberry Pis:

ssh_authorized_keys:
  - ssh-rsa AAAAB3NzaC1yc2*******************************== rsa-key-20210705

hostname: *****

run_cmd:
- "touch /etc/localtime"
- "cp /usr/share/zoneinfo/Canada/Mountain /etc/localtime"
- "touch /etc/timezone"
- "echo Canada/Mountain > /etc/timezone"

k3os:
  data_sources:
    - aws
    - cdrom
  modules:
    - kvm
    - nvme
  sysctl:
    kernel.printk: "4 4 1 7"
    kernel.kptr_restrict: "1"
  ntp_servers:
    - 0.us.pool.ntp.org
    - 1.us.pool.ntp.org
  k3s_args:
    - agent
  server_url: https://192.168.0.06:6443
  environment:
    K3S_CLUSTER_SECRET: *******

dweomer commented 2 years ago

Looks like this is an upstream issue: https://github.com/kubernetes/kubernetes/issues/95039

Perhaps fixed in 1.21.

@SteffenBlake can you try https://github.com/rancher/k3os/releases/tag/v0.21.1-k3s1r0 and see if the issue persists?
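To check whether the issue persists after upgrading, it may be enough to count how often the warning appears in the service log. A minimal sketch, assuming the log path and message text quoted in the report; the function name `count_cpu_warnings` is made up for illustration.

```shell
#!/bin/sh
# Print the number of log lines containing the CPU online-state warning.
# The default path and the message pattern are taken from the report above.
count_cpu_warnings() {
    log="${1:-/var/log/k3s-service.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -c 'Cannot determine CPU .* online state, skipping' "$log" || true
}
```

Note that a count on an old log file includes entries from before the upgrade, so comparing the count before and after a few minutes of uptime on the new version is more telling than a single reading.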

SteffenBlake commented 2 years ago

@dweomer The issue seems to be resolved. A new, seemingly unrelated error message now shows up in the log, but I'll open a new issue for that one if you prefer.

dweomer commented 2 years ago

> @dweomer issue seems to be resolved, though a new log message error shows up (seems unrelated), however Ill open a new issue for that one if you prefer.

Please do.