scylladb / scylla-operator

The Kubernetes Operator for ScyllaDB
https://operator.docs.scylladb.com/
Apache License 2.0

Make sure xfs mount that won't mount because it needs repair is manifested on NodeConfig #1558

Open tnozicka opened 10 months ago

tnozicka commented 10 months ago

What should the feature do?

When a NodeConfig uses a mount whose systemd unit fails, it should set a degraded condition.
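As a rough illustration of the requested behaviour, here is a minimal Go sketch that turns the state of systemd mount units into a degraded condition. The `Condition` and `MountUnitStatus` types, the `Degraded` type name, and the `MountUnitFailed` / `AsExpected` reasons are assumptions made for this example, not the operator's actual API. The only systemd-specific part is the `KEY=VALUE` format printed by `systemctl show --property=ActiveState,Result <unit>`.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a hypothetical, trimmed-down stand-in for a Kubernetes
// status condition. It is not the operator's real type.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// MountUnitStatus holds the subset of systemd unit properties used here.
type MountUnitStatus struct {
	Name        string
	ActiveState string
	Result      string
}

// ParseUnitProperties parses KEY=VALUE lines such as those printed by
// `systemctl show --property=ActiveState,Result <unit>`.
func ParseUnitProperties(name, out string) MountUnitStatus {
	st := MountUnitStatus{Name: name}
	for _, line := range strings.Split(out, "\n") {
		key, value, ok := strings.Cut(strings.TrimSpace(line), "=")
		if !ok {
			continue
		}
		switch key {
		case "ActiveState":
			st.ActiveState = value
		case "Result":
			st.Result = value
		}
	}
	return st
}

// DegradedCondition reports Status "True" if any mount unit is in the
// "failed" state, and lists the failed units in the message.
func DegradedCondition(units []MountUnitStatus) Condition {
	var failed []string
	for _, u := range units {
		if u.ActiveState == "failed" {
			failed = append(failed, fmt.Sprintf("%s (result: %s)", u.Name, u.Result))
		}
	}
	if len(failed) == 0 {
		return Condition{Type: "Degraded", Status: "False", Reason: "AsExpected"}
	}
	return Condition{
		Type:    "Degraded",
		Status:  "True",
		Reason:  "MountUnitFailed", // hypothetical reason name
		Message: "failed mount units: " + strings.Join(failed, ", "),
	}
}

func main() {
	st := ParseUnitProperties(`mnt-persistent\x2dvolumes.mount`,
		"ActiveState=failed\nResult=exit-code\n")
	fmt.Printf("%+v\n", DegradedCondition([]MountUnitStatus{st}))
}
```

Keeping the parsing separate from the condition logic means the same function could be fed from D-Bus properties instead of `systemctl` output.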

What is the use case behind this feature?

xfs is notorious for breaking on unclean restarts and often needs manual action to recover (running xfs_repair). We should surface that in the API so it's observable, along with all other mount failures.
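One way to distinguish the "needs xfs_repair" case from other mount failures would be to look for the EUCLEAN error text ("Structure needs cleaning") in the mount output, which is what the journal for the failed unit in this issue contains. The sketch below is only an assumption about how such a classification could look. The function name and the reason strings are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// euCleanMessage is the strerror text for EUCLEAN on Linux, compared
// case-insensitively.
const euCleanMessage = "structure needs cleaning"

// MountFailureReason returns a hypothetical condition reason for a
// mount failure based on the lines logged by the mount process.
func MountFailureReason(logLines []string) string {
	for _, line := range logLines {
		if strings.Contains(strings.ToLower(line), euCleanMessage) {
			return "FilesystemNeedsRepair" // hypothetical reason name
		}
	}
	return "MountFailed" // hypothetical fallback reason
}

func main() {
	lines := []string{
		"mount: /mnt/persistent-volumes: mount(2) system call failed: Structure needs cleaning.",
	}
	fmt.Println(MountFailureReason(lines))
}
```

Matching on message text is fragile, so a real implementation would probably still report every failed mount and treat this classification as a hint only.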

● mnt-persistent\x2dvolumes.mount                                                                                                                        loaded failed failed    Managed mount by Scylla Operator
root@ubuntu-2204:~# systemctl status mnt-persistent\x2dvolumes.mount
Unit mnt-persistentx2dvolumes.mount could not be found.
root@ubuntu-2204:~# journal^C
root@ubuntu-2204:~# systemctl status 'mnt-persistent\x2dvolumes.mount'
× mnt-persistent\x2dvolumes.mount - Managed mount by Scylla Operator
     Loaded: loaded (/etc/systemd/system/mnt-persistent\x2dvolumes.mount; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2023-11-10 18:50:29 UTC; 29s ago
      Where: /mnt/persistent-volumes
       What: /dev/md127
        CPU: 5ms

Nov 10 18:50:29 ubuntu-2204 systemd[1]: Mounting Managed mount by Scylla Operator...
Nov 10 18:50:29 ubuntu-2204 systemd[1]: mnt-persistent\x2dvolumes.mount: Mount process exited, code=exited, status=32/n/a
Nov 10 18:50:30 ubuntu-2204 mount[3469]: mount: /mnt/persistent-volumes: mount(2) system call failed: Structure needs cleaning.
Nov 10 18:50:29 ubuntu-2204 systemd[1]: mnt-persistent\x2dvolumes.mount: Failed with result 'exit-code'.
Nov 10 18:50:29 ubuntu-2204 systemd[1]: Failed to mount Managed mount by Scylla Operator.
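The session above also shows why the unit name has to be quoted in a shell: the `\x2d` in `mnt-persistent\x2dvolumes.mount` is systemd's escape for the `-` in the mount path, and an unquoted backslash is consumed by the shell. The Go sketch below derives the unit name from the mount point, following the path escaping rules described in systemd.unit(5). It is a simplified approximation of `systemd-escape --path --suffix=mount`, does not handle `.` or `..` path components, and uses hypothetical function names.

```go
package main

import (
	"fmt"
	"strings"
)

func isAlnum(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
}

// EscapePath applies systemd-style path escaping: surrounding and
// duplicate slashes are dropped, "/" becomes "-", and every byte other
// than ASCII alphanumerics, ":", "_" and "." becomes \xNN. A leading
// "." is escaped as well. The root path maps to "-".
func EscapePath(path string) string {
	parts := strings.FieldsFunc(path, func(r rune) bool { return r == '/' })
	if len(parts) == 0 {
		return "-"
	}
	trimmed := strings.Join(parts, "/")

	var b strings.Builder
	for i := 0; i < len(trimmed); i++ {
		c := trimmed[i]
		switch {
		case c == '/':
			b.WriteByte('-')
		case c == '.' && i == 0:
			fmt.Fprintf(&b, `\x%02x`, c)
		case isAlnum(c) || c == ':' || c == '_' || c == '.':
			b.WriteByte(c)
		default:
			fmt.Fprintf(&b, `\x%02x`, c)
		}
	}
	return b.String()
}

// MountUnitName returns the mount unit name for a mount point.
func MountUnitName(mountPoint string) string {
	return EscapePath(mountPoint) + ".mount"
}

func main() {
	fmt.Println(MountUnitName("/mnt/persistent-volumes"))
}
```

When the name is passed to `systemctl` as a separate argument via `os/exec`, no shell is involved, so the backslash survives without extra quoting.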

Anything else we need to know?

No response

scylla-operator-bot[bot] commented 2 months ago

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

/lifecycle stale

rzetelskik commented 2 months ago

/remove-lifecycle stale
/triage accepted