Let's enable the Ceph toolbox - https://access.redhat.com/articles/4628891
oc patch OCSInitialization ocsinit -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'
oc -n openshift-storage get pod -l "app=rook-ceph-tools"
NAME READY STATUS RESTARTS AGE
rook-ceph-tools-5bbc55fdf-cv7x2 1/1 Running 0 14s
oc rsh deployment/rook-ceph-tools
..
sh-5.1$ ceph health
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 137 pgs inactive; Degraded data redundancy: 169 pgs undersized
sh-5.1$ ceph status
  cluster:
    id:     7ec91a51-f0ee-40a6-8d93-5d5c30dc0d67
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            Reduced data availability: 137 pgs inactive
            Degraded data redundancy: 169 pgs undersized

  services:
    mon: 3 daemons, quorum a,b,c (age 38h)
    mgr: a(active, since 38h)
    mds: 1/1 daemons up, 1 standby
    osd: 3 osds: 3 up (since 38h), 3 in (since 38h); 32 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 285 objects, 816 MiB
    usage:   1.9 GiB used, 1.3 TiB / 1.3 TiB avail
    pgs:     81.065% pgs not active
             570/855 objects misplaced (66.667%)
             137 undersized+peered
             32  active+undersized+remapped

  io:
    client: 12 KiB/s wr, 0 op/s rd, 0 op/s wr

  progress:
    Global Recovery Event (0s)
      [............................]
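A quick sanity check on the numbers: 855 placements are 285 objects × 3 replicas, and 570 misplaced (66.667%) means exactly two of the three copies of every object have no valid home. From the toolbox, the pool replication settings and the stuck PGs can be inspected with (commands not part of the original session):
sh-5.1$ ceph osd pool ls detail        # pools typically show "replicated size 3 min_size 2"
sh-5.1$ ceph pg dump_stuck undersized  # lists the 169 undersized PGs counted above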
sh-5.1$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         1.30980  root default
-3         1.30980      host inf44
 0    ssd  0.43660          osd.0        up   1.00000  1.00000
 1    ssd  0.43660          osd.1        up   1.00000  1.00000
 2    ssd  0.43660          osd.2        up   1.00000  1.00000
sh-5.1$
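That explains the health warnings: all three OSDs sit under a single host bucket (inf44). A replicated CRUSH rule with a host failure domain places each copy of a PG on a different host, so with OSDs on only one host just one replica per PG can be placed. To verify the failure domain (rule names vary per pool, so list them first; also not part of the original session):
sh-5.1$ ceph osd crush rule ls
sh-5.1$ ceph osd crush rule dump <rule-name>   # look for "type": "host" in the chooseleaf step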
Why do I only have 3 OSD pods?
oc get pods -o wide | grep osd
rook-ceph-osd-0-7479486666-nzjqh 2/2 Running 0 38h 10.128.8.32 inf44 <none> <none>
rook-ceph-osd-1-7f7b6f4bb5-jlv6l 2/2 Running 0 38h 10.128.8.35 inf44 <none> <none>
rook-ceph-osd-2-5d4d48557-p2kjp 2/2 Running 0 38h 10.128.8.36 inf44 <none> <none>
rook-ceph-osd-prepare-03d5d9dea68b6f8184c6b5545ce68586-vjfww 0/1 Completed 0 38h 10.128.8.29 inf44 <none> <none>
rook-ceph-osd-prepare-095fc6277dd39c5d577393f1fe09f7ee-fwvcq 0/1 Completed 0 38h 10.128.16.31 inf7 <none> <none>
rook-ceph-osd-prepare-1113e506af934f35209a9ba2b63ec098-ffcdz 0/1 Completed 0 38h 10.131.8.30 inf8 <none> <none>
rook-ceph-osd-prepare-a607266483fda5b911a3dafbfef670e3-swh82 0/1 Completed 0 38h 10.128.8.30 inf44 <none> <none>
rook-ceph-osd-prepare-c03093c6d966d9c7f13e419da2a780e9-jdsqq 0/1 Completed 0 38h 10.131.8.28 inf8 <none> <none>
rook-ceph-osd-prepare-c736908f8715093a69af797a5b38e6ae-2n6zh 0/1 Completed 0 38h 10.128.8.31 inf44 <none> <none>
rook-ceph-osd-prepare-daf79088fdd9b1b15d2b2478c45155a7-rj9qr 0/1 Completed 0 38h 10.131.8.29 inf8 <none> <none>
rook-ceph-osd-prepare-e51b025211561bab8cb43e1af0f8111e-c6ll2 0/1 Completed 0 38h 10.128.16.30 inf7 <none> <none>
rook-ceph-osd-prepare-e834838f3706213e47a95a581231196f-nqhm8 0/1 Completed 0 38h 10.128.16.32 inf7 <none> <none>
oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-osd-0-7479486666-nzjqh 2/2 Running 0 38h 10.128.8.32 inf44 <none> <none>
rook-ceph-osd-1-7f7b6f4bb5-jlv6l 2/2 Running 0 38h 10.128.8.35 inf44 <none> <none>
rook-ceph-osd-2-5d4d48557-p2kjp 2/2 Running 0 38h 10.128.8.36 inf44 <none> <none>
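Note that the prepare jobs ran on all three nodes (inf44, inf7, inf8), yet only the inf44 devices became OSDs. The prepare pod logs usually state why a device was skipped; leftover filesystem, LVM, or partition signatures are the common cause. For example, for one of the inf7 pods listed above:
oc -n openshift-storage logs rook-ceph-osd-prepare-095fc6277dd39c5d577393f1fe09f7ee-fwvcq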
=> Let's reinstall ODF
Reinstalled several times.
Solution:
oc adm drain <node> --ignore-daemonsets --delete-emptydir-data (for each storage node)
dd if=/dev/zero of=/dev/sdX bs=1024 count=1024000 (on all storage nodes, for each OSD disk)
rm -rf /var/lib/rook/ (on all storage nodes)
rm -rf /mnt/local-storage (on all storage nodes)
ODF is running!
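Note: the dd above only zeroes the first ~1 GiB of each disk. The same signature cleanup can be done explicitly with wipefs/sgdisk; a sketch, assuming /dev/sdX stands in for each OSD disk, run on every storage node:
wipefs --all /dev/sdX        # remove filesystem/LVM/partition-table signatures
sgdisk --zap-all /dev/sdX    # zap GPT and MBR partition structures
lsblk -f /dev/sdX            # confirm no signatures remain
After the cleanup, uncordon each node again (oc adm uncordon <node>) so the OSD pods can schedule; ceph osd tree should then show one host bucket per storage node.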
Timeout in the pod, readinessProbe fails:
Working env.
ISAR Cluster