siderolabs / extensions

Talos Linux System Extensions
Mozilla Public License 2.0

Extension fails to load after node restart #137

Open MysticalMount opened 1 year ago

MysticalMount commented 1 year ago

I'm using the iSCSI extension in my machine install section:

install:
  disk: /dev/sda
  image: ghcr.io/siderolabs/installer:v1.3.5
  bootloader: true
  wipe: true
  extensions:

This might actually be an issue specific to the iscsi-tools extension rather than Talos, but I'm reporting it nonetheless.

For a new node this works great and as expected, but as soon as I restart the node I get this error:

spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system

I was expecting a node restart to behave the same way. I suspect that something done during the initial install of the iSCSI tools isn't recreated on a restart after installation. It's very frustrating to have to re-roll nodes whenever they need a reboot.
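The error names two paths: the directory the runtime was asked to create, and the parent where the mkdir actually failed. A minimal Python sketch that splits the message apart; the function name `parse_mkdir_error` is hypothetical, and the regex assumes the message keeps exactly the shape quoted above:

```python
import re

# Matches the message shape seen in this issue; other runtimes or versions
# may word the error differently.
MKDIR_ERROR = re.compile(
    r'failed to mkdir "(?P<target>[^"]+)": mkdir (?P<failed_at>\S+): (?P<reason>.+)$'
)


def parse_mkdir_error(message):
    """Return (target, failed_at, reason), or None if the message does not match."""
    match = MKDIR_ERROR.search(message)
    if match is None:
        return None
    return match.group("target"), match.group("failed_at"), match.group("reason")


error = (
    'spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": '
    "mkdir /usr/local/etc: read-only file system"
)
print(parse_mkdir_error(error))
```

Here the component that fails is the parent `/usr/local/etc`, not the `iscsi` directory itself, so the parent is the path worth inspecting on the rebooted node.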

Environment

smira commented 1 year ago

This seems to be an issue with the iscsi extension, so I'm moving it to the extensions repo.

frezbo commented 1 year ago

I cannot reproduce this issue. Are you using some extra mount under /var?

Also, this error is not from iscsi-tools:

spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system

It must come from some workload that's running on the cluster.

MysticalMount commented 1 year ago

Hi @frezbo again! Correct, I'm using the democratic-csi plugin (apologies if it's related to that). The specific container that seems to fail is driver-registrar:

  Warning  Unhealthy  9m2s (x1438 over 24h)  kubelet  (combined from similar events): Liveness probe failed: F0327 20:29:28.221817  190876 main.go:159] Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/org.democratic-csi.iscsi-synology/registration doesn't exist.
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
  /workspace/vendor/k8s.io/klog/v2/klog.go:1038 +0x8a
k8s.io/klog/v2.(*loggingT).output(0xf86600, 0x3, 0x0, 0xc00028ec40, 0x0, {0xc47a41, 0x1}, 0xc00032ad90, 0x0)
  /workspace/vendor/k8s.io/klog/v2/klog.go:987 +0x5fd
k8s.io/klog/v2.(*loggingT).printf(0xa63799, 0x4, 0x0, {0x0, 0x0}, {0xa8ac8d, 0x48}, {0xc00032ad90, 0x1, 0x1})
  /workspace/vendor/k8s.io/klog/v2/klog.go:753 +0x1c5
k8s.io/klog/v2.Fatalf(...)
  /workspace/vendor/k8s.io/klog/v2/klog.go:1532
main.main()
  /workspace/cmd/csi-node-driver-registrar/main.go:159 +0x48e

For me this happens consistently on a restarted node (normally when moving a VM around, which isn't that frequent and could be avoided, but it's still important that it works as before):

csi                             iscsi-controller-cd574668d-j52rf                    6/6     Running                0               24h
csi                             iscsi-node-59h7v                                    4/4     Running                0               24h
csi                             iscsi-node-87kp7                                    1/4     CreateContainerError   733 (11s ago)   30h

I only have two nodes currently. The one that's running is identical in configuration (to my knowledge) and is working; the bottom one has been rebooted. If I reboot the other one, I predict from what I've seen so far that it would suffer the same problem (I'm reluctant to do that as it's running some stuff like HA/Omada etc.). I've spun up fresh nodes using talos wipe --> send new configuration file, and it works every time; after a restart, this issue appears.

The node configuration file is exactly the same; only the IP/host, and therefore the node name, changes.

Apologies if it's my lack of understanding in the area or something silly I'm doing. Your help is appreciated in all cases!

If this should be raised with democratic-csi then that's fine (and sorry for the time), but the extension on the node led me to believe it might be the Talos extension.

frezbo commented 1 year ago

This seems likely to be an issue with the CSI driver. From the looks of the error, maybe it wasn't patched to run its commands in the iscsi-tools extension's PID namespace and is instead trying to create files on the host, which is read-only.
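One way to check this hypothesis is to look at which host paths the CSI node pods mount. The sketch below walks a pod object (as a Python dict, for example loaded from `kubectl get pod <name> -o json`) and lists its hostPath volumes. The sample spec is hypothetical: it only mirrors the path from the error and is not the actual democratic-csi manifest.

```python
def host_path_volumes(pod):
    """Return (volume name, host path, hostPath type) for each hostPath volume in a pod dict."""
    found = []
    for volume in pod.get("spec", {}).get("volumes", []):
        host_path = volume.get("hostPath")
        if host_path is not None:
            # "type" is optional in the Kubernetes API; an empty string means no check is done.
            found.append((volume["name"], host_path["path"], host_path.get("type", "")))
    return found


# Hypothetical pod spec, for illustration only.
sample_pod = {
    "spec": {
        "volumes": [
            {"name": "iscsi-dir", "hostPath": {"path": "/usr/local/etc/iscsi", "type": "Directory"}},
            {"name": "registration-dir", "hostPath": {"path": "/var/lib/kubelet/plugins_registry"}},
            {"name": "config", "secret": {"secretName": "example"}},
        ]
    }
}

for name, path, kind in host_path_volumes(sample_pod):
    print(f"{name}: {path} (type={kind or 'unset'})")
```

Any volume whose path sits under a read-only part of the host filesystem is a candidate for the mkdir failure, assuming the error does come from a hostPath mount.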

MysticalMount commented 1 year ago

I've updated to Talos v1.4.6 and Kubernetes v1.27.1, and this is still occurring. I'm not sure where the issue is, but here is some extra information: I had a power cut, which forced all nodes to restart.

What is new is that, even though all three had very similar configurations (I would say identical for all intents and purposes), one of them has worked after a restart:

aya369@ayadev:~/repos/AyaKube/ayats$ talosctl -n node2 --talosconfig ./talosconfig list /usr/local/etc
NODE   NAME
1 error occurred:
 rpc error: code = Unknown desc = lstat /usr/local/etc: no such file or directory
aya369@ayadev:~/repos/AyaKube/ayats$ talosctl -n node1 --talosconfig ./talosconfig list /usr/local/etc
NODE   NAME
1 error occurred:
 rpc error: code = Unknown desc = lstat /usr/local/etc: no such file or directory
aya369@ayadev:~/repos/AyaKube/ayats$ talosctl -n node3 --talosconfig ./talosconfig list /usr/local/etc
NODE          NAME
node3   .
node3   containers
node3   iscsi
node3   passwd
node3   udev
aya369@ayadev:~/repos/AyaKube/ayats$ 

Therefore it is able to create such files in the usual case, which still makes me think it's related to the extension or Talos.
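The per-node comparison above can be scripted. A minimal sketch, assuming the output of each `talosctl ... list /usr/local/etc` call has been captured as text; `directory_state` is a hypothetical helper and only recognises the two output shapes shown above:

```python
def directory_state(output):
    """Classify captured `talosctl list` output as 'missing', 'present', or 'unknown'."""
    if "no such file or directory" in output:
        return "missing"
    # Data rows follow the "NODE   NAME" header; each has a node column and a name column.
    rows = [line.split() for line in output.splitlines()[1:] if line.strip()]
    if rows and all(len(row) == 2 for row in rows):
        return "present"
    return "unknown"


# Output copied from the listing above for two of the nodes.
captured = {
    "node1": (
        "NODE   NAME\n"
        "1 error occurred:\n"
        " rpc error: code = Unknown desc = lstat /usr/local/etc: no such file or directory\n"
    ),
    "node3": (
        "NODE          NAME\n"
        "node3   .\n"
        "node3   containers\n"
        "node3   iscsi\n"
        "node3   passwd\n"
        "node3   udev\n"
    ),
}

for node, output in captured.items():
    print(node, directory_state(output))
```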

It's still giving the same error on the nodes that failed to set up iSCSI:

Failed node 1 describe pod:

  Normal   Scheduled  14m                  default-scheduler  Successfully assigned csi/iscsi-node-mkzgc to ayakm2
  Normal   Created    14m                  kubelet            Created container driver-registrar
  Normal   Pulled     14m                  kubelet            Successfully pulled image "docker.io/democraticcsi/democratic-csi:latest" in 106.097117ms (106.153847ms including waiting)
  Warning  Failed     14m                  kubelet            Error: failed to generate container "05c5e54606248dbd9c546f0b8a844723ebb85a87c5210cb1f7d83401a3296596" spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system
  Normal   Pulled     14m                  kubelet            Container image "docker.io/democraticcsi/csi-grpc-proxy:v0.5.3" already present on machine
  Normal   Created    14m                  kubelet            Created container csi-proxy
  Normal   Started    14m                  kubelet            Started container csi-proxy
  Normal   Pulled     14m                  kubelet            Container image "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0" already present on machine
  Normal   Pulled     14m                  kubelet            Container image "docker.io/busybox:1.32.0" already present on machine
  Warning  Failed     14m                  kubelet            Error: failed to generate container "8bcdbff3bef6c1b83c4e5cfcb56036713badad149ff46c7a380ef8b46a5a3a6b" spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system
  Normal   Started    14m                  kubelet            Started container driver-registrar
  Normal   Created    14m                  kubelet            Created container cleanup
  Normal   Started    14m                  kubelet            Started container cleanup
  Normal   Pulled     14m                  kubelet            Successfully pulled image "docker.io/democraticcsi/democratic-csi:latest" in 123.065126ms (123.13965ms including waiting)
  Normal   Pulled     14m                  kubelet            Successfully pulled image "docker.io/democraticcsi/democratic-csi:latest" in 113.240991ms (113.31304ms including waiting)
  Warning  Failed     14m                  kubelet            Error: failed to generate container "261f06e17f51cfb86d7eb6f07b0a0718f7bc5a911225405ac9c99f21ad5c05d7" spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system
  Warning  Unhealthy  14m                  kubelet            Liveness probe failed: F0729 20:26:23.320759    6265 main.go:160] Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/org.democratic-csi.iscsi-synology/registration doesn't exist.
  Normal   Pulled     13m                  kubelet            Successfully pulled image "docker.io/democraticcsi/democratic-csi:latest" in 101.052426ms (101.102542ms including waiting)
  Warning  Failed     13m                  kubelet            Error: failed to generate container "d0a498e360d35af931ac37a4c97edef2a95216a9812b96286cc298c695f91d34" spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system
  Warning  Unhealthy  13m                  kubelet            Liveness probe failed: F0729 20:26:33.313889    6334 main.go:160] Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/org.democratic-csi.iscsi-synology/registration doesn't exist.
  Warning  Unhealthy  13m                  kubelet            Liveness probe failed: F0729 20:26:43.313944    6430 main.go:160] Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/org.democratic-csi.iscsi-synology/registration doesn't exist.
  Normal   Killing    13m                  kubelet            Container driver-registrar failed liveness probe, will be restarted
  Normal   Pulling    4m7s (x70 over 14m)  kubelet            Pulling image "docker.io/democraticcsi/democratic-csi:latest"

Failed node 2 describe pod [post-restart]:

  Normal   Scheduled  16m                 default-scheduler  Successfully assigned csi/iscsi-node-4x8fc to ayakm1
  Normal   Pulled     16m                 kubelet            Container image "docker.io/busybox:1.32.0" already present on machine
  Warning  Failed     16m                 kubelet            Error: failed to generate container "68ca165e4d442bd0566b3a2e3aec49469d54ace3aa52b0f70ab98bccc74bda07" spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system
  Normal   Started    16m                 kubelet            Started container cleanup
  Normal   Pulled     16m                 kubelet            Container image "docker.io/democraticcsi/csi-grpc-proxy:v0.5.3" already present on machine
  Normal   Created    16m                 kubelet            Created container csi-proxy
  Normal   Started    16m                 kubelet            Started container csi-proxy
  Normal   Pulled     16m                 kubelet            Container image "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0" already present on machine
  Normal   Created    16m                 kubelet            Created container driver-registrar
  Normal   Pulled     16m                 kubelet            Successfully pulled image "docker.io/democraticcsi/democratic-csi:latest" in 73.760519ms (73.773808ms including waiting)
  Normal   Created    16m                 kubelet            Created container cleanup
  Warning  Failed     16m                 kubelet            Error: failed to generate container "d374bdfd74c5645c0714c13536b6f7d7728dd04700a830ead5e350e79fd3ac1a" spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system
  Normal   Pulled     16m                 kubelet            Successfully pulled image "docker.io/democraticcsi/democratic-csi:latest" in 59.513269ms (59.523948ms including waiting)
  Normal   Started    16m                 kubelet            Started container driver-registrar
  Warning  Unhealthy  16m                 kubelet            Liveness probe failed: F0729 20:25:18.202403    9530 main.go:160] Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/org.democratic-csi.iscsi-synology/registration doesn't exist.
  Normal   Pulled     16m                 kubelet            Successfully pulled image "docker.io/democraticcsi/democratic-csi:latest" in 61.776155ms (61.787374ms including waiting)
  Warning  Failed     16m                 kubelet            Error: failed to generate container "53072ecd8cfaeb920813cb5251f2084207197e56a9661e2ad371965018b2fd57" spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system
  Warning  Unhealthy  16m                 kubelet            Liveness probe failed: F0729 20:25:28.218422    9598 main.go:160] Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/org.democratic-csi.iscsi-synology/registration doesn't exist.
  Normal   Pulled     16m                 kubelet            Successfully pulled image "docker.io/democraticcsi/democratic-csi:latest" in 59.099917ms (59.114417ms including waiting)
  Warning  Failed     16m                 kubelet            Error: failed to generate container "c3ba76889ed01f8321fdf8bd13d24bd32f76a4074455ab2db4b9988da96a73ad" spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system
  Normal   Killing    15m                 kubelet            Container driver-registrar failed liveness probe, will be restarted
  Warning  Unhealthy  15m                 kubelet            Liveness probe failed: F0729 20:25:38.190982    9668 main.go:160] Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/org.democratic-csi.iscsi-synology/registration doesn't exist.
  Normal   Pulling    87s (x98 over 16m)  kubelet            Pulling image "docker.io/democraticcsi/democratic-csi:latest"

Working node [post-restart]:

  Normal   SandboxChanged  23m                kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Created         23m                kubelet  Created container csi-proxy
  Normal   Pulled          23m                kubelet  Successfully pulled image "docker.io/democraticcsi/democratic-csi:latest" in 215.900235ms (215.972748ms including waiting)
  Normal   Created         23m                kubelet  Created container csi-driver
  Normal   Started         23m                kubelet  Started container csi-driver
  Normal   Pulled          23m                kubelet  Container image "docker.io/democraticcsi/csi-grpc-proxy:v0.5.3" already present on machine
  Normal   Pulling         23m                kubelet  Pulling image "docker.io/democraticcsi/democratic-csi:latest"
  Normal   Pulled          23m                kubelet  Container image "docker.io/busybox:1.32.0" already present on machine
  Normal   Started         23m                kubelet  Started container csi-proxy
  Normal   Created         23m                kubelet  Created container cleanup
  Normal   Started         23m                kubelet  Started container cleanup
  Warning  BackOff         22m (x3 over 22m)  kubelet  Back-off restarting failed container driver-registrar in pod iscsi-node-bffrr_csi(75b485db-b948-48c1-8ca0-65c6d07afdaa)
  Normal   Pulled          22m (x2 over 23m)  kubelet  Container image "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0" already present on machine
  Normal   Created         22m (x2 over 23m)  kubelet  Created container driver-registrar
  Normal   Started         22m (x2 over 23m)  kubelet  Started container driver-registrar
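To compare the three event listings above, it can help to count events by type and reason. A minimal sketch, assuming the Events table from `kubectl describe pod` has been pasted in as text. The regex relies on columns being separated by at least two spaces, which holds for the listings here but is an assumption in general; `summarize_events` is a hypothetical helper name.

```python
import re
from collections import Counter

# Columns: Type, Reason, Age, From, Message.
# Age may contain single spaces, e.g. "4m7s (x70 over 14m)".
EVENT_LINE = re.compile(
    r"^\s*(?P<type>Normal|Warning)\s+(?P<reason>\S+)\s+(?P<age>.+?)"
    r"\s{2,}(?P<source>\S+)\s{2,}(?P<message>.*)$"
)


def summarize_events(text):
    """Count events per (type, reason) pair in pasted `kubectl describe pod` output."""
    counts = Counter()
    for line in text.splitlines():
        match = EVENT_LINE.match(line)
        if match:
            counts[(match.group("type"), match.group("reason"))] += 1
    return counts


# Four lines copied from the first failed node's listing.
sample = """\
  Normal   Created    14m                  kubelet            Created container driver-registrar
  Warning  Failed     14m                  kubelet            Error: failed to generate container "05c5e54606248dbd9c546f0b8a844723ebb85a87c5210cb1f7d83401a3296596" spec: failed to generate spec: failed to mkdir "/usr/local/etc/iscsi": mkdir /usr/local/etc: read-only file system
  Normal   Killing    13m                  kubelet            Container driver-registrar failed liveness probe, will be restarted
  Normal   Pulling    4m7s (x70 over 14m)  kubelet            Pulling image "docker.io/democraticcsi/democratic-csi:latest"
"""

for (kind, reason), count in sorted(summarize_events(sample).items()):
    print(kind, reason, count)
```

In the listings above, the two failed nodes show `Warning Failed` events with the mkdir error, while the working node shows none.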

At the moment the only fix I'm aware of is to re-roll the node with the same config, which is very frustrating. I'm considering switching back to NFS or any other recommended storage solution (I am running Ceph). I'm not sure whether this would be a problem with any of the CSI drivers or just iSCSI.

Any help appreciated