rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

Failed to enable disk encryption in the storage on EKS anywhere bare metal nodes #14133

Closed ygao-armada closed 3 weeks ago

ygao-armada commented 3 weeks ago

Is this a bug report or feature request? Bug report

  1. git clone --single-branch --branch v1.13.6 https://github.com/rook/rook.git
  2. cd rook/deploy/examples
  3. in cluster.yaml, uncomment the line `encryptedDevice: "true"` (see the quick check sketched after these steps)
  4. kubectl create -f crds.yaml -f common.yaml -f operator.yaml
  5. kubectl create -f cluster.yaml
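
Before applying cluster.yaml in step 5, a quick sanity check that the encryption setting is really active (a sketch, assuming the stock v1.13.6 example file):

$ grep -n "encryptedDevice" cluster.yaml
# expected (uncommented):    encryptedDevice: "true"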

I failed in both of my attempts:

  1. with a partition created on each data disk
  2. with no partition on each data disk

With partitions (the former), the rook-ceph-osd-prepare-xxx pods stay Running and the rook-ceph-osd-prepare logs show:

2024-04-26 09:15:00.537540 D | exec: Running command: lsblk --noheadings --path --list --output NAME /dev/sda
2024-04-26 09:15:00.538671 I | inventory: skipping device "sda" because it has child, considering the child instead.
...
2024-04-26 09:15:00.602857 D | exec: Running command: ceph-volume inventory --format json /dev/sda1
2024-04-26 09:15:00.866116 I | cephosd: device "sda1" is available.
2024-04-26 09:15:00.866129 I | cephosd: partition "sda1" is not picked because encrypted OSD on partition is not allowed
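
Since encrypted OSDs are not allowed on partitions, the disk has to be handed to Rook as a bare device. One way to clear the partitions before retrying (a sketch based on Rook's documented disk-cleanup steps; /dev/sda is only an example device):

# wipe the partition table and filesystem signatures so Rook sees a raw disk
$ sudo sgdisk --zap-all /dev/sda
$ sudo wipefs --all /dev/sda
# optionally clear any leftover signatures at the start of the disk
$ sudo dd if=/dev/zero of=/dev/sda bs=1M count=100 oflag=direct,dsync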

Without partitions (the latter), the rook-ceph-osd-prepare-xxx pods go into CrashLoopBackOff and the rook-ceph-osd-prepare logs show:

2024-04-26 15:41:31.128026 I | cephosd: device "sda" is available.
2024-04-26 15:41:31.128043 I | cephosd: old lsblk can't detect bluestore signature, so try to detect here
...
2024-04-26 15:41:31.393242 I | cephclient: getting or creating ceph auth key "client.bootstrap-osd"
2024-04-26 15:41:31.393254 D | exec: Running command: ceph auth get-or-create-key client.bootstrap-osd mon allow profile bootstrap-osd --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json
2024-04-26 15:41:31.776134 D | cephosd: won't use raw mode since encryption is enabled
2024-04-26 15:41:31.776150 D | exec: Running command: nsenter --mount=/rootfs/proc/1/ns/mnt -- /usr/sbin/lvm --help
2024-04-26 15:41:31.776892 D | cephosd: failed to call nsenter. failed to execute nsenter. output: nsenter: failed to execute /usr/sbin/lvm: No such file or directory: exit status 127
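
That error suggests the lvm binary is simply not present on the host. A quick check from the node (a sketch; assumes a standard Ubuntu 20.04 layout with dpkg available):

# confirm whether the LVM tooling exists on the host
$ which lvm
$ ls -l /usr/sbin/lvm
$ dpkg -l lvm2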

I then tried copying the lvm binary to /usr/sbin/lvm, and got this traceback:

Traceback (most recent call last):
 File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
  return f(*a, **kw)
 File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
  terminal.dispatch(self.mapper, subcommand_args)
 File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
  instance.main()
 File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
  terminal.dispatch(self.mapper, self.argv)
 File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
  instance.main()
 File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
  return func(*a, **kw)
 File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 414, in main
  self._execute(plan)
 File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 429, in _execute
  p.safe_prepare(argparse.Namespace(**args))
 File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 200, in safe_prepare
  rollback_osd(self.args, self.osd_id)
 File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
  Zap(['--destroy', '--osd-id', osd_id]).main()
 File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 403, in main
  self.zap_osd()
 File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
  return func(*a, **kw)
 File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 301, in zap_osd
  devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
 File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 88, in find_associated_devices
  '%s' % osd_id or osd_fsid)
RuntimeError: Unable to find any LV for zapping OSD: 0
2024-04-26 16:23:19.076114 C | rookcmd: failed to configure devices: failed to initialize osd: failed ceph-volume: exit status 1

Deviation from expected behavior: no OSD is created.

Expected behavior: OSDs are created properly.

How to reproduce it (minimal and precise):

Just run the commands above on an EKS Anywhere bare metal cluster with Ubuntu 20.04 (to be honest, I'm afraid it is a general issue).

Logs to submit: mentioned above

ygao-armada commented 3 weeks ago

After installing lvm2 in the osImage, the OSD pods are created successfully:

$ kubectl -n rook-ceph get pod
...
rook-ceph-osd-0-556d6d75f9-l6pbz                                 2/2     Running     0          9m14s
rook-ceph-osd-1-59c4c76ccc-6wwpv                                 2/2     Running     0          8m45s
rook-ceph-osd-2-54dddf59bf-r69m8                                 2/2     Running     0          7m58s
rook-ceph-osd-3-696d9dd87b-5wh4w                                 2/2     Running     0          7m58s
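
For reference, getting lvm2 into the Ubuntu 20.04 osImage amounts to (a sketch, assuming apt access during the image build):

$ sudo apt-get update
$ sudo apt-get install -y lvm2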

On the node, we can see:

# lsblk -f
NAME                                FSTYPE LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
...
sdd                                 LVM2_m       QSU1CB-Vxkn-jXah-Rufx-MhiB-IWKu-6z8sN0
└─ceph--bad12e0b--fe26--44e9--897a--10cfe0ac0d50-osd--block--5268c673--8ffb--4a19--ac9a--c8a49e96a2e2
  └─4ttf0w-cSmK-XB06-Vgum-kWoE-MtTk-lsr1e9
sde                                 LVM2_m       gMrIC4-EL9a-ON47-UIFz-7uDd-RY8U-2bwbxX
└─ceph--3e3400bb--073b--43a6--9759--0735fb4bf8fd-osd--block--891a74d0--ba9c--4eb3--8a39--8eff473015ec
  └─ebxL39-JSUq-NZYY-NBEC-GuSM-Ai1e-V044mU
sdf                                 LVM2_m       GnUl5A-LI89-qgBB-t2sx-Pj3w-bIxb-UatsxQ
└─ceph--8ee5f47b--695c--4dda--9dab--1e0b16579f62-osd--block--dac48c85--c1f6--494d--b74b--fae0919810eb
  └─K9vETz-8Afu-rvuL-3YZd-1rd0-aQg5-MjeTwS
...
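
To double-check that the OSD data really sits on an encrypted device, the device-mapper target type can be inspected on the node (a sketch; the crypt children are the mapper names shown in the listing above):

# the OSD block LV should be backed by a "crypt" dm target
$ sudo dmsetup table | grep " crypt "
$ lsblk -o NAME,TYPE /dev/sdd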