siderolabs / extensions

Talos Linux System Extensions
Mozilla Public License 2.0

DRBD extension not working on arm / rock64 #251

Open Ulrar opened 10 months ago

Ulrar commented 10 months ago

Hi,

Apologies, I don't really know how to debug this, but on my rock64 the drbd module seems to be missing when I use the DRBD extension:

talosctl --talosconfig talosconfig -n mynode list /lib/modules/6.1.58-talos/extras
NODE   NAME
1 error occurred:
 rpc error: code = Unknown desc = lstat /lib/modules/6.1.58-talos/extras: no such file or directory

The same config deployed on x86 machines does have that directory populated with the .ko files, as expected. To be sure, I tried using the tag and also the specific arm hash from here, but had no luck when "upgrading" to the same version to rebuild the initramfs.

I can't access the display for that node, and I'm not sure which service log might explain why this is failing. Thanks.

smira commented 10 months ago

There isn't enough information in the ticket. Is the drbd extension installed? Does it match Talos version?

Ulrar commented 10 months ago

> There isn't enough information in the ticket. Is the drbd extension installed? Does it match Talos version?

Since the directory isn't present on the host I assume it's not, but I don't know how else to check. Linstor definitely isn't finding the DRBD module in any case, so it's not just a path issue.

It is the same (latest) version, yes:

image: ghcr.io/siderolabs/drbd:9.2.4-v1.5.4

As stated, the exact same config works fine on two other x86 nodes; the issue is only on the rock64, which is arm64.

smira commented 10 months ago

You can run talosctl get extensions to see which extensions are installed.
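
For scripting that check, a minimal sketch follows. The helper name extension_listed is hypothetical, and it assumes only that the extension name appears as a whole word somewhere in the command output; it does not depend on the exact column layout.

```shell
# Hypothetical helper: reads the output of `talosctl get extensions` on stdin
# and succeeds if the given extension name appears as a whole word.
extension_listed() {
    grep -qw -- "$1"
}

# Usage against the node from this thread (needs a reachable Talos API):
#   talosctl --talosconfig talosconfig -n mynode get extensions \
#       | extension_listed drbd \
#       && echo "drbd extension is listed" \
#       || echo "drbd extension is NOT listed"
```

Matching on a whole word keeps drbd from matching longer names such as drbd_transport_tcp.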

smira commented 10 months ago

You can check for yourself that the extension does contain the files, so the problem is probably somewhere on your end:

$ crane export ghcr.io/siderolabs/drbd:9.2.4-v1.5.4@sha256:908a2e1129ae6434c5af887b9f3ba7fde039b635e471cef2be808e017d464275 - | tar tv
-rw-r--r-- 0/0             272 2022-01-20 22:35 manifest.yaml
drwxr-xr-x 0/0               0 2022-01-20 22:35 rootfs
drwxr-xr-x 0/0               0 2022-01-20 22:35 rootfs/lib
drwxr-xr-x 0/0               0 2022-01-20 22:35 rootfs/lib/modules
drwxr-xr-x 0/0               0 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos
drwxr-xr-x 0/0               0 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/extras
-rw-r--r-- 0/0         1141122 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/extras/drbd.ko
-rw-r--r-- 0/0           88162 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/extras/drbd_transport_rdma.ko
-rw-r--r-- 0/0           49410 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/extras/drbd_transport_tcp.ko
-rw-r--r-- 0/0              74 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.alias
-rw-r--r-- 0/0              48 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.alias.bin
-rw-r--r-- 0/0           58621 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.builtin
-rw-r--r-- 0/0           42432 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.builtin.alias.bin
-rw-r--r-- 0/0           64021 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.builtin.bin
-rw-r--r-- 0/0          362817 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.builtin.modinfo
-rw-r--r-- 0/0             107 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.dep
-rw-r--r-- 0/0             191 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.dep.bin
-rw-r--r-- 0/0               0 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.devname
-rw-r--r-- 0/0            2058 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.order
-rw-r--r-- 0/0              55 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.softdep
-rw-r--r-- 0/0             611 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.symbols
-rw-r--r-- 0/0             752 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.symbols.bin
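
The same inspection can be turned into a pass/fail check. This is only a sketch: listing_has_module is a hypothetical helper, and it assumes the rootfs/lib/modules/<kernel>/extras/ layout shown in the listing above.

```shell
# Hypothetical helper: reads a `tar tv` listing on stdin and succeeds if it
# contains rootfs/lib/modules/<kernel>/extras/<module>.ko.
#   $1 = kernel release (e.g. 6.1.58-talos), $2 = module name (e.g. drbd)
listing_has_module() {
    grep -qF -- "rootfs/lib/modules/$1/extras/$2.ko"
}

# Usage (needs crane and network access to the registry):
#   crane export ghcr.io/siderolabs/drbd:9.2.4-v1.5.4 - | tar tv \
#       | listing_has_module 6.1.58-talos drbd \
#       && echo "drbd.ko present" || echo "drbd.ko missing"
```

When the image is referenced by tag rather than by an architecture-specific digest, crane's global --platform flag (for example --platform linux/arm64) should select the arm64 variant, which is the one that matters for a rock64.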

Ulrar commented 10 months ago

Alright, after a lot of digging I think I figured it out. The issue is that the rock64 doesn't really have enough memory to schedule much, and certainly not the piraeus-operator. Even without that, the upgrade command just silently kills the node unless I use --stage, I'm guessing because there isn't enough memory to run the installer and the whole stack at the same time.

Using --stage I did manage to get drbd installed correctly, but that doesn't leave enough RAM to schedule the piraeus-operator (it brings the node up to 107% usage).

Never mind, I'll get rid of that node. Thanks for your help.
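
For anyone hitting the same problem on a low-memory board, the staged upgrade described above might look roughly like the sketch below. The wrapper name staged_upgrade and the DRY_RUN switch are hypothetical, and the installer image is an assumption inferred from the v1.5.4 suffix of the extension tag; it is not stated in the thread.

```shell
# Hypothetical wrapper: runs a staged upgrade, or only prints the command
# when DRY_RUN=1 is set.
#   $1 = node address, $2 = installer image (assumed, see above)
staged_upgrade() {
    node="$1"
    image="$2"
    # Build the talosctl invocation as the positional parameters.
    set -- talosctl --talosconfig talosconfig -n "$node" upgrade --image "$image" --stage
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the command without touching the node:
#   DRY_RUN=1 staged_upgrade mynode ghcr.io/siderolabs/installer:v1.5.4
```

With --stage the upgrade is deferred until the node reboots, which fits the observation above that the installer and the rest of the stack did not both fit in memory at the same time.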