rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

Rook still tries to re-add a removed OSD, making new OSD additions fail #7968

Closed: sfxworks closed this issue 2 years ago

sfxworks commented 3 years ago

Is this a bug report or feature request?

Deviation from expected behavior:

Rook continues to add a deployment for an OSD that was removed manually per the guide at https://github.com/rook/rook/blob/master/Documentation/ceph-osd-mgmt.md#purge-the-osd-manually

Expected behavior:

Rook does not remake an OSD deployment for a disk that was purged from the cluster while removeOSDsIfOutAndSafeToRemove is set to true.

How to reproduce it (minimal and precise):

1) Opt to remove an OSD manually and follow the instructions in the doc (a sketch of the purge flow follows this list).
2) Restart the node / mount the disk with a random filesystem.
3) Note that Rook remakes the OSD deployment.
4) Note that when adding a new node, the OSD count starts at the removed number, so the keys do not match. This currently blocks me from adding new OSDs to the cluster.
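For context, the kind of flow that guide describes comes down to a handful of toolbox and kubectl commands. A minimal sketch, assuming OSD ID 4 and the default rook-ceph namespace (both placeholders for illustration):

```sh
# Stop the OSD pod so Ceph marks the OSD down
kubectl -n rook-ceph scale deployment rook-ceph-osd-4 --replicas=0

# From the toolbox: take the OSD out and let data rebalance off it
ceph osd out osd.4

# Purge the OSD (removes it from the CRUSH map and deletes its auth entry)
ceph osd purge 4 --yes-i-really-mean-it

# Finally remove the now-orphaned deployment
kubectl -n rook-ceph delete deployment rook-ceph-osd-4
```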

File(s) to submit:

New prepare job for the new node: https://pastebin.com/yu3h4SZm (it prepares one disk and then errors because of the key issue). On the next run it just skips everything, leaving the node only partially prepared: https://pastebin.com/3U3xCKSK

[root@pfdc-store-2 ~]# lsblk
NAME                                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                                                                                                     8:0    0   7.3T  0 disk
sdb                                                                                                                     8:16   0   7.3T  0 disk
sdc                                                                                                                     8:32   0   7.3T  0 disk
└─ceph--block--66b2e915--19a2--4474--9534--d87bb199eeb2-osd--block--8597ddbf--e2a6--4fd7--85f0--7227515d10ac          254:0    0   7.3T  0 lvm
sdd                                                                                                                     8:48   0 232.9G  0 disk
├─sdd1                                                                                                                  8:49   0     1G  0 part /boot
├─sdd2                                                                                                                  8:50   0  32.1G  0 part
└─sdd3                                                                                                                  8:51   0 199.8G  0 part /
sde                                                                                                                     8:64   0 953.9G  0 disk
sdf                                                                                                                     8:80   0 232.9G  0 disk
nvme1n1                                                                                                               259:0    0 465.8G  0 disk
└─ceph--block--dbs--b1a6209c--59f4--4d74--b1a6--fe87e43e0b8c-osd--block--db--cdb083ae--97a5--4fb8--b797--262520dc3c03 254:1    0  93.1G  0 lvm
nvme0n1                                                                                                               259:1    0 465.8G  0 disk

Environment:

travisn commented 3 years ago

@sfxworks Since you have useAllDevices: true and useAllNodes: true, Rook will always scan all the devices and attempt to start OSDs on them.

Did you delete the data from the disk after the OSD was removed? If not, Rook will see that the disk was previously configured and attempt to start the same OSD again. It will remain in a failed state, however, since the OSD auth was removed. So it is recommended to clean or remove the disk after purging the OSD, or else update the device filter to exclude that disk.
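For reference, wiping a retired device so Rook sees it as a fresh disk usually looks something like the rough sketch below, assuming the purged OSD's data device is /dev/sdX and that it was a ceph-volume/LVM OSD (double-check the device name before running anything destructive):

```sh
DISK="/dev/sdX"   # placeholder for the device that backed the purged OSD

# Wipe partition tables and the start of the disk
sgdisk --zap-all "$DISK"
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync

# Clean up leftover ceph-volume LVM state on the host
ls /dev/mapper/ceph-* 2>/dev/null | xargs -I% -- dmsetup remove %
rm -rf /dev/ceph-*
```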

sfxworks commented 3 years ago

The disk was removed and reformatted long before this new node was added. I was also referencing that doc in the reproduction steps. The exception is that the disk is still in the node, but since Rook skips devices that are already formatted, I would expect it to skip this one even though it was previously an OSD.
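One quick way to confirm what the prepare job will see on the reformatted device is to check it for existing filesystem signatures on the node itself; a small sketch, with /dev/sdX standing in for the disk that used to back the OSD:

```sh
# Show any filesystem the device now carries (a foreign FS is what makes Rook skip it)
lsblk -f /dev/sdX

# List filesystem/RAID/LVM signatures without erasing anything
wipefs --no-act /dev/sdX
```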

travisn commented 3 years ago

@sfxworks After you purge the bad OSD, you don't see that OSD's auth from the toolbox with ceph auth ls, right?

From this message, there is certainly something left over from the prior OSD:

entity osd.4 exists but key does not match
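For reference, the leftover entity can be inspected and removed from the toolbox. A minimal sketch, assuming the stale ID is osd.4 as in the error above:

```sh
# List the auth entries Ceph still knows about and query the suspect one
ceph auth ls | grep -A3 'osd.4'
ceph auth get osd.4

# If the entry is stale, delete it so a fresh OSD can register under that ID
ceph auth del osd.4

# Confirm the OSD itself is gone from the OSD map / CRUSH tree
ceph osd tree
```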
travisn commented 3 years ago

I was not able to repro this in a test cluster. After purging an OSD, a new OSD could be created with the same ID. It is expected that Ceph re-uses OSD IDs after they are purged. If you see that everything was purged as expected, could you please try the latest Rook v1.6 and Ceph v16 releases to see if that helps?

sfxworks commented 3 years ago

I should also mention there was a metadata device associated with the OSD. Unfortunately I cannot test an upgrade, as I have switched storage providers. The intent was to report everything I could here before the migration.

sethjones commented 3 years ago

I am seeing a similar issue. @travisn directed me to this issue.

I have previously purged/removed osd.5, and it continues to fail to be reused with new disks.

Always giving me:

debug 2021-06-01T18:00:37.831+0000 7f68beecdf40 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-5/keyring: (2) No such file or directory

debug 2021-06-01T18:00:37.831+0000 7f68beecdf40 -1 AuthRegistry(0x5612b4582940) no keyring found at /var/lib/ceph/osd/ceph-5/keyring, disabling cephx

Prep steps:

1) Verify OSD removal in 'ceph auth' and 'ceph osd' via ceph-tools.
2) Remove the OSD deployment within Kubernetes (see the sketch below).
3) Disk detection and creation appear to work normally; however, the OSD deployment fails due to the above error.

Whether using a new disk, or a manually cleaned disk, my results are always the same for the disk trying to use the osd.5 id.
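For anyone retracing step 2 above, a small sketch of the Kubernetes side, assuming the default rook-ceph namespace and the conventional rook-ceph-osd-5 deployment name (adjust for your cluster):

```sh
# Inspect the failing OSD pod's logs for the keyring error shown above
kubectl -n rook-ceph logs deploy/rook-ceph-osd-5

# Remove the stale OSD deployment so the operator can recreate it cleanly
kubectl -n rook-ceph delete deployment rook-ceph-osd-5

# Optionally restart the operator to trigger a fresh OSD reconcile
kubectl -n rook-ceph rollout restart deploy/rook-ceph-operator
```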

sethjones commented 3 years ago

My issue has been resolved. Removing the resource constraints I had placed allowed disk creation to function normally.
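If others hit the same symptom, it may be worth checking whether the prepare or OSD pods are being throttled, evicted, or left unschedulable by tight resource limits. A rough sketch, assuming the default rook-ceph namespace and the usual app=rook-ceph-osd-prepare label:

```sh
# Look for OOMKilled containers or pods stuck Pending on resources
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
kubectl -n rook-ceph describe pods -l app=rook-ceph-osd-prepare | grep -iE 'oomkilled|insufficient|limit'

# Recent events often show evictions or scheduling failures
kubectl -n rook-ceph get events --sort-by=.lastTimestamp | tail -n 20
```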

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.