I have been digging into this for the past few hours and found some other issues related to resizing OSDs. I confirmed that the activate initContainer defined in the OSD pod manifest runs bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-X. This explains why Ceph recognized the change in size of the raw devices, but it doesn't explain why that space shows up as used rather than available. There must be something I'm missing.
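A minimal way to confirm which init containers an OSD pod runs (this assumes the standard app=rook-ceph-osd label that Rook puts on OSD pods):

$ kubectl -n rook-ceph get pod -l app=rook-ceph-osd \
    -o jsonpath='{range .items[0].spec.initContainers[*]}{.name}{"\n"}{end}'
# should list the activate and expand-bluefs init containers discussed above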
For added context, I wanted to mention the process we used to increase the size of the virtual disk Ceph uses for the OSD. We performed these steps on each node with an OSD.
I was able to free 50 GB by running mount | awk '/rbd/ {print $3}' | while read -r MOUNT; do sudo fstrim -v "$MOUNT"; done on each worker node, which is where the OSD disks are located, but after adding 400 GB to the total capacity it seems the available capacity should be higher. At least this resolves the health warnings for the time being, which were preventing some workloads from operating as expected.
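The same one-liner expanded for readability; it discards unused blocks on every mounted RBD filesystem and has to be run on each worker node:

# take the mount point (third field) of every mount line matching /rbd/
# and run fstrim verbosely against it
mount | awk '/rbd/ {print $3}' | while read -r MOUNT; do
    sudo fstrim -v "$MOUNT"
done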
Does ceph osd df in the toolbox show the expected stats, including available?
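For reference, one way to run it without looking up the toolbox pod name (assuming the standard rook-ceph-tools deployment from the Rook examples):

$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd df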
I don't know what you mean by expected. It reflects what I see in Grafana. My confusion is why the used capacity increased after resizing rather than the available capacity.
Looking back to before the first resize, here is the capacity from the cluster dashboard as well as the pools dashboard. You can see the pool usage has not really changed, although the cluster's used capacity increased. Most of the data is used by ceph-blockpool.
$ kubectl exec rook-ceph-tools-6595b6bb4-4t755 -n rook-ceph -- ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 800 GiB 178 GiB 622 GiB 622 GiB 77.79
TOTAL 800 GiB 178 GiB 622 GiB 622 GiB 77.79
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 449 KiB 2 1.3 MiB 0 37 GiB
ceph-blockpool 2 32 73 GiB 20.56k 219 GiB 66.46 37 GiB
ceph-objectstore.rgw.control 3 8 0 B 8 0 B 0 37 GiB
ceph-filesystem-metadata 4 16 206 MiB 82 618 MiB 0.54 37 GiB
ceph-objectstore.rgw.meta 5 8 3.1 KiB 9 77 KiB 0 37 GiB
ceph-filesystem-data0 6 32 158 B 2 12 KiB 0 37 GiB
ceph-objectstore.rgw.log 7 8 3.5 MiB 340 12 MiB 0.01 37 GiB
ceph-objectstore.rgw.buckets.index 8 8 0 B 0 0 B 0 37 GiB
ceph-objectstore.rgw.buckets.non-ec 9 8 0 B 0 0 B 0 37 GiB
ceph-objectstore.rgw.otp 10 8 0 B 0 0 B 0 37 GiB
.rgw.root 11 8 4.8 KiB 16 180 KiB 0 37 GiB
ceph-objectstore.rgw.buckets.data 12 32 0 B 0 0 B 0 74 GiB
OK, thanks for confirming that the Grafana view matches the stats from the ceph commands. @satoru-takeuchi Any ideas on why the available capacity is not updated?
@travisn I'm not sure. I'll try to reproduce this problem anyway.
@jameshearttech Could you show me the following information? lsblk and blkid on a node where at least one expanded OSD exists.

@satoru-takeuchi Here is our CephCluster resource, the node list, and the lsblk and blkid output from each worker node.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"ceph.rook.io/v1","kind":"CephCluster","metadata":{"annotations":{},"labels":{"argocd.argoproj.io/instance":"rook-ceph-cluster"},"name":"rook-ceph","namespace":"rook-ceph"},"spec":{"cephVersion":{"allowUnsupported":false,"image":"quay.io/ceph/ceph:v17.2.6"},"cleanupPolicy":{"allowUninstallWithVolumes":false,"confirmation":"","sanitizeDisks":{"dataSource":"zero","iteration":1,"method":"quick"}},"continueUpgradeAfterChecksEvenIfNotHealthy":false,"crashCollector":{"disable":false},"dashboard":{"enabled":true,"ssl":true},"dataDirHostPath":"/var/lib/rook","disruptionManagement":{"managePodBudgets":true,"osdMaintenanceTimeout":30,"pgHealthCheckTimeout":0},"healthCheck":{"daemonHealth":{"mon":{"disabled":false,"interval":"45s"},"osd":{"disabled":false,"interval":"60s"},"status":{"disabled":false,"interval":"60s"}},"livenessProbe":{"mgr":{"disabled":false},"mon":{"disabled":false},"osd":{"disabled":false}}},"labels":{"monitoring":{"release":"kube-prometheus-stack"}},"logCollector":{"enabled":true,"maxLogSize":"500M","periodicity":"daily"},"mgr":{"allowMultiplePerNode":false,"count":2,"modules":[{"enabled":true,"name":"pg_autoscaler"},{"enabled":true,"name":"rook"}]},"mon":{"allowMultiplePerNode":false,"count":3},"monitoring":{"enabled":true},"network":{"connections":{"compression":{"enabled":false},"encryption":{"enabled":false},"requireMsgr2":false}},"priorityClassNames":{"mgr":"system-cluster-critical","mon":"system-node-critical","osd":"system-node-critical"},"removeOSDsIfOutAndSafeToRemove":false,"resources":{"cleanup":{"limits":{"cpu":"500m","memory":"1Gi"},"requests":{"cpu":"500m","memory":"100Mi"}},"crashcollector":{"limits":{"cpu":"500m","memory":"60Mi"},"requests":{"cpu":"100m","memory":"60Mi"}},"logcollector":{"limits":{"cpu":"500m","memory":"1Gi"},"requests":{"cpu":"100m","memory":"100Mi"}},"mgr":{"limits":{"cpu":"1000m","memory":"1Gi"},"requests":{"cpu":"500m","memory":"512Mi"}},"mgr-sidecar":{"limits":{"cpu":"1000m","memory":"100Mi"},"requests":{"cpu":"100m","memory":"40Mi"}},"mon":{"limits":{"cpu":"2000m","memory":"2Gi"},"requests":{"cpu":"1000m","memory":"1Gi"}},"osd":{"limits":{"cpu":"2000m","memory":"4Gi"},"requests":{"cpu":"1000m","memory":"4Gi"}},"prepareosd":{"requests":{"cpu":"500m","memory":"50Mi"}}},"skipUpgradeChecks":false,"storage":{"nodes":[{"deviceFilter":"^sd[^a]","name":"dev-worker0"},{"deviceFilter":"^sd[^a]","name":"dev-worker1"},{"deviceFilter":"^sd[^a]","name":"dev-worker2"},{"deviceFilter":"^sd[^a]","name":"dev-worker3"}],"useAllDevices":false,"useAllNodes":false},"waitTimeoutForHealthyOSDInMinutes":10}}
creationTimestamp: "2023-01-16T02:48:04Z"
finalizers:
- cephcluster.ceph.rook.io
generation: 11
labels:
argocd.argoproj.io/instance: rook-ceph-cluster
name: rook-ceph
namespace: rook-ceph
resourceVersion: "88821838"
uid: 9cf3cb98-e774-4d6f-9c7b-ee9ad843b008
spec:
cephVersion:
allowUnsupported: false
image: quay.io/ceph/ceph:v17.2.6
cleanupPolicy:
allowUninstallWithVolumes: false
confirmation: ""
sanitizeDisks:
dataSource: zero
iteration: 1
method: quick
continueUpgradeAfterChecksEvenIfNotHealthy: false
crashCollector:
disable: false
dashboard:
enabled: true
ssl: true
dataDirHostPath: /var/lib/rook
disruptionManagement:
managePodBudgets: true
osdMaintenanceTimeout: 30
pgHealthCheckTimeout: 0
external: {}
healthCheck:
daemonHealth:
mon:
disabled: false
interval: 45s
osd:
disabled: false
interval: 60s
status:
disabled: false
interval: 60s
livenessProbe:
mgr:
disabled: false
mon:
disabled: false
osd:
disabled: false
labels:
monitoring:
release: kube-prometheus-stack
logCollector:
enabled: true
maxLogSize: 500M
periodicity: daily
mgr:
allowMultiplePerNode: false
count: 2
modules:
- enabled: true
name: pg_autoscaler
- enabled: true
name: rook
mon:
allowMultiplePerNode: false
count: 3
monitoring:
enabled: true
network:
connections:
compression:
enabled: false
encryption:
enabled: false
requireMsgr2: false
priorityClassNames:
mgr: system-cluster-critical
mon: system-node-critical
osd: system-node-critical
removeOSDsIfOutAndSafeToRemove: false
resources:
cleanup:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 500m
memory: 100Mi
crashcollector:
limits:
cpu: 500m
memory: 60Mi
requests:
cpu: 100m
memory: 60Mi
logcollector:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 100m
memory: 100Mi
mgr:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 500m
memory: 512Mi
mgr-sidecar:
limits:
cpu: 1000m
memory: 100Mi
requests:
cpu: 100m
memory: 40Mi
mon:
limits:
cpu: 2000m
memory: 2Gi
requests:
cpu: 1000m
memory: 1Gi
osd:
limits:
cpu: 2000m
memory: 4Gi
requests:
cpu: 1000m
memory: 4Gi
prepareosd:
requests:
cpu: 500m
memory: 50Mi
security:
kms: {}
skipUpgradeChecks: false
storage:
nodes:
- deviceFilter: ^sd[^a]
name: dev-worker0
- deviceFilter: ^sd[^a]
name: dev-worker1
- deviceFilter: ^sd[^a]
name: dev-worker2
- deviceFilter: ^sd[^a]
name: dev-worker3
useAllDevices: false
useAllNodes: false
waitTimeoutForHealthyOSDInMinutes: 10
status:
ceph:
capacity:
bytesAvailable: 180592615424
bytesTotal: 858993459200
bytesUsed: 678400843776
lastUpdated: "2023-07-19T03:48:17Z"
fsid: 2f898171-a628-45ac-b708-b0a199454e3d
health: HEALTH_OK
lastChanged: "2023-07-14T21:15:29Z"
lastChecked: "2023-07-19T03:48:17Z"
previousHealth: HEALTH_WARN
versions:
mds:
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 2
mgr:
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 2
mon:
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 3
osd:
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 4
overall:
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 12
rgw:
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 1
conditions:
- lastHeartbeatTime: "2023-07-19T03:48:17Z"
lastTransitionTime: "2023-03-09T06:38:18Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
message: Cluster created successfully
observedGeneration: 11
phase: Ready
state: Created
storage:
deviceClasses:
- name: ssd
version:
image: quay.io/ceph/ceph:v17.2.6
version: 17.2.6-0
$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
dev-master0 Ready control-plane 184d v1.26.4 10.69.2.30 <none> Ubuntu 22.04.2 LTS 5.15.0-70-generic containerd://1.6.20
dev-master1 Ready control-plane 184d v1.26.4 10.69.2.31 <none> Ubuntu 22.04.2 LTS 5.15.0-70-generic containerd://1.6.20
dev-master2 Ready control-plane 184d v1.26.4 10.69.2.32 <none> Ubuntu 22.04.2 LTS 5.15.0-70-generic containerd://1.6.20
dev-worker0 Ready <none> 184d v1.26.4 10.69.2.33 <none> Ubuntu 22.04.2 LTS 5.15.0-76-generic containerd://1.6.20
dev-worker1 Ready <none> 184d v1.26.4 10.69.2.34 <none> Ubuntu 22.04.2 LTS 5.15.0-76-generic containerd://1.6.20
dev-worker2 Ready <none> 184d v1.26.4 10.69.2.35 <none> Ubuntu 22.04.2 LTS 5.15.0-76-generic containerd://1.6.20
dev-worker3 Ready <none> 184d v1.26.4 10.69.2.36 <none> Ubuntu 22.04.2 LTS 5.15.0-76-generic containerd://1.6.20
$ ssh dev-worker0 lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 63.4M 1 loop /snap/core20/1950
loop1 7:1 0 63.4M 1 loop /snap/core20/1974
loop2 7:2 0 103M 1 loop /snap/lxd/23541
loop3 7:3 0 111.9M 1 loop /snap/lxd/24322
loop4 7:4 0 53.3M 1 loop /snap/snapd/19361
loop5 7:5 0 53.3M 1 loop /snap/snapd/19457
sda 8:0 0 150G 0 disk
├─sda1 8:1 0 1M 0 part
└─sda2 8:2 0 150G 0 part /var/lib/kubelet/pods/e0cdac41-0916-468a-bbe8-303b88edee86/volume-subpaths/tigera-ca-bundle/calico-node/1
/
sdb 8:16 0 200G 0 disk
sr0 11:0 1 1.4G 0 rom
nbd0 43:0 0 0B 0 disk
nbd1 43:32 0 0B 0 disk
nbd2 43:64 0 0B 0 disk
nbd3 43:96 0 0B 0 disk
nbd4 43:128 0 0B 0 disk
nbd5 43:160 0 0B 0 disk
nbd6 43:192 0 0B 0 disk
nbd7 43:224 0 0B 0 disk
rbd1 252:16 0 5G 0 disk /var/lib/kubelet/pods/43aa363b-f0a9-4014-b717-2832e78fb189/volumes/kubernetes.io~csi/pvc-bfef3b44-c640-43e0-8f01-d36106b6a2e3/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/5fdaa89a02e12d22bb0cf01bb83d9ebe3dd2a254b6d88617e5eab891008c4bfa/globalmount/0001-0009-rook-ceph-0000000000000002-08e3b4c3-ba6c-11ed-8dae-b60c4cfb5373
nbd8 43:256 0 0B 0 disk
nbd9 43:288 0 0B 0 disk
nbd10 43:320 0 0B 0 disk
nbd11 43:352 0 0B 0 disk
nbd12 43:384 0 0B 0 disk
nbd13 43:416 0 0B 0 disk
nbd14 43:448 0 0B 0 disk
nbd15 43:480 0 0B 0 disk
$ ssh dev-worker0 blkid
/dev/sdb: TYPE="ceph_bluestore"
/dev/sr0: BLOCK_SIZE="2048" UUID="2022-08-09-16-48-33-00" LABEL="Ubuntu-Server 22.04.1 LTS amd64" TYPE="iso9660" PTTYPE="PMBR"
/dev/sda2: UUID="f1379f67-ac29-4026-a5d2-a8531fe91729" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="9cc4ea75-0136-4ac5-bc85-bee1278f3702"
$ ssh dev-worker1 lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 63.4M 1 loop /snap/core20/1950
loop1 7:1 0 63.4M 1 loop /snap/core20/1974
loop2 7:2 0 103M 1 loop /snap/lxd/23541
loop3 7:3 0 111.9M 1 loop /snap/lxd/24322
loop4 7:4 0 53.3M 1 loop /snap/snapd/19361
loop5 7:5 0 53.3M 1 loop /snap/snapd/19457
sda 8:0 0 150G 0 disk
├─sda1 8:1 0 1M 0 part
└─sda2 8:2 0 150G 0 part /var/lib/kubelet/pods/38982b78-2ecf-40a7-9504-43323d5a2a41/volume-subpaths/registry-config/registryctl/2
/var/lib/kubelet/pods/38982b78-2ecf-40a7-9504-43323d5a2a41/volume-subpaths/registry-config/registryctl/1
/var/lib/kubelet/pods/38982b78-2ecf-40a7-9504-43323d5a2a41/volume-subpaths/registry-config/registry/2
/var/lib/kubelet/pods/754c5105-d550-4c44-9d7c-0091e68c78fb/volume-subpaths/jobservice-config/jobservice/0
/var/lib/kubelet/pods/b9498ea3-adba-4bee-a4f7-cc7e6ae720c9/volume-subpaths/portal-config/portal/0
/var/lib/kubelet/pods/2d760d81-2b93-40c6-b6ee-bd47f69a6006/volume-subpaths/config/core/0
/var/lib/kubelet/pods/96d46cfb-8c35-43f2-af81-b1c22284fc9e/volume-subpaths/sc-dashboard-provider/grafana/3
/var/lib/kubelet/pods/96d46cfb-8c35-43f2-af81-b1c22284fc9e/volume-subpaths/config/grafana/0
/var/lib/kubelet/pods/fd336091-7748-4056-aa14-6a49378c5350/volume-subpaths/tigera-ca-bundle/calico-node/1
/
sdb 8:16 0 200G 0 disk
sr0 11:0 1 1.4G 0 rom
nbd0 43:0 0 0B 0 disk
nbd1 43:32 0 0B 0 disk
nbd2 43:64 0 0B 0 disk
nbd3 43:96 0 0B 0 disk
nbd4 43:128 0 0B 0 disk
nbd5 43:160 0 0B 0 disk
nbd6 43:192 0 0B 0 disk
nbd7 43:224 0 0B 0 disk
rbd0 252:0 0 20G 0 disk /var/lib/kubelet/pods/38982b78-2ecf-40a7-9504-43323d5a2a41/volumes/kubernetes.io~csi/pvc-8186fef5-b35d-488b-8fe8-078c73b40c18/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/b290607416970b9aa16ac7514bec10ebb27bac14a2f7f22b41021fae8f79ec9b/globalmount/0001-0009-rook-ceph-0000000000000002-1608b85f-9ce2-11ed-921d-b2cc6687c5e5
rbd1 252:16 0 1G 0 disk /var/lib/kubelet/pods/754c5105-d550-4c44-9d7c-0091e68c78fb/volumes/kubernetes.io~csi/pvc-7d54ef6b-cdf4-4ed0-86cf-76b3cc7818d1/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/06d9d384e80dc91244784c2df3de97eec7dcce7314bae2a41c1e24f3439e7113/globalmount/0001-0009-rook-ceph-0000000000000002-16089f88-9ce2-11ed-921d-b2cc6687c5e5
rbd2 252:32 0 1G 0 disk /var/lib/kubelet/pods/41733a93-3a0c-4849-bc7c-5eb680f14f0d/volume-subpaths/pvc-835afc26-6f0e-49e3-afd0-dc490c339f46/alertmanager/3
/var/lib/kubelet/pods/41733a93-3a0c-4849-bc7c-5eb680f14f0d/volumes/kubernetes.io~csi/pvc-835afc26-6f0e-49e3-afd0-dc490c339f46/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/368173dc48f494675ada2c81a3e06045342f6f40c0cadf2616296bbbe8bc939e/globalmount/0001-0009-rook-ceph-0000000000000002-fe3ec405-aee2-11ed-b0f8-fe71e5eb2c72
rbd3 252:48 0 1G 0 disk /var/lib/kubelet/pods/e129b166-490c-46fb-aac0-28df0a994759/volumes/kubernetes.io~csi/pvc-b1e3b4b5-b94c-4cbf-b041-bca03df87a2f/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/9df84d35fee55d3fab847f439a380ca11ac12c0d871a88c38d853cbf5ca63038/globalmount/0001-0009-rook-ceph-0000000000000002-8dcedc14-590c-4395-9cb9-2f0bd6d9a90d
rbd4 252:64 0 1G 0 disk /var/lib/kubelet/pods/d177c4d0-48d3-428a-b53b-f16a93a2aadd/volumes/kubernetes.io~csi/pvc-6ab95971-82ad-45d1-97c1-09b8966953df/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/86b77a7dcffcd4c380ed29325d4c8a17bcc9efd9866aefc81b56e303465bb8c3/globalmount/0001-0009-rook-ceph-0000000000000002-16bf7737-9ce2-11ed-921d-b2cc6687c5e5
rbd5 252:80 0 5G 0 disk /var/lib/kubelet/pods/c031b293-4fa9-4ada-bd71-98778d4da717/volumes/kubernetes.io~csi/pvc-b58d060d-d655-4f94-be93-cdee4903ed4f/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/fd3bbe591a523cb7a9879ff7ecb24dbe29a68fae09a3c31bc9987f4ece04d542/globalmount/0001-0009-rook-ceph-0000000000000002-16a12cf7-9ce2-11ed-921d-b2cc6687c5e5
rbd6 252:96 0 100G 0 disk /var/lib/kubelet/pods/74aa4808-8557-4cc4-b876-7a3852b38483/volumes/kubernetes.io~csi/pvc-08c394ec-92f4-4489-84d5-f8c27894da49/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/c8ed10ef6b340453571615a4bc991aefaf8a14d58d869a221c223b6560887b84/globalmount/0001-0009-rook-ceph-0000000000000002-9197e34e-abdd-11ed-b0f8-fe71e5eb2c72
nbd8 43:256 0 0B 0 disk
nbd9 43:288 0 0B 0 disk
nbd10 43:320 0 0B 0 disk
nbd11 43:352 0 0B 0 disk
nbd12 43:384 0 0B 0 disk
nbd13 43:416 0 0B 0 disk
nbd14 43:448 0 0B 0 disk
nbd15 43:480 0 0B 0 disk
$ ssh dev-worker1 blkid
/dev/sdb: TYPE="ceph_bluestore"
/dev/sr0: BLOCK_SIZE="2048" UUID="2022-08-09-16-48-33-00" LABEL="Ubuntu-Server 22.04.1 LTS amd64" TYPE="iso9660" PTTYPE="PMBR"
/dev/sda2: UUID="04c81207-9e02-4111-a05f-cb29d33c1478" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="e6e10ade-1ff6-4f88-892d-764fde3e4712"
$ ssh dev-worker2 lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 63.4M 1 loop /snap/core20/1950
loop1 7:1 0 63.4M 1 loop /snap/core20/1974
loop2 7:2 0 103M 1 loop /snap/lxd/23541
loop3 7:3 0 111.9M 1 loop /snap/lxd/24322
loop4 7:4 0 53.3M 1 loop /snap/snapd/19361
loop5 7:5 0 53.3M 1 loop /snap/snapd/19457
sda 8:0 0 150G 0 disk
├─sda1 8:1 0 1M 0 part
└─sda2 8:2 0 150G 0 part /var/lib/kubelet/pods/f2c9842c-d630-42bb-b916-5f809ebbb4f7/volume-subpaths/tigera-ca-bundle/calico-node/1
/
sdb 8:16 0 200G 0 disk
sr0 11:0 1 1.4G 0 rom
nbd0 43:0 0 0B 0 disk
nbd1 43:32 0 0B 0 disk
nbd2 43:64 0 0B 0 disk
nbd3 43:96 0 0B 0 disk
nbd4 43:128 0 0B 0 disk
nbd5 43:160 0 0B 0 disk
nbd6 43:192 0 0B 0 disk
nbd7 43:224 0 0B 0 disk
rbd0 252:0 0 1G 0 disk /var/lib/kubelet/pods/632ec04a-c83d-4fd7-9937-16a8c8b93264/volumes/kubernetes.io~csi/pvc-263ba194-8b87-42dd-a025-03005dd236b5/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/0fc3040d635ec3ea39547c3f25b15327643bb57ec6bd45c8b1823beea7cbff97/globalmount/0001-0009-rook-ceph-0000000000000002-c80cb79a-11dd-485d-8991-8624d0247606
rbd2 252:32 0 1G 0 disk /var/lib/kubelet/pods/46943724-2f44-43aa-b8c5-a665fbdc0d4c/volumes/kubernetes.io~csi/pvc-52ebbd1f-e65e-4c35-8b0e-80eaddd7bf19/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4db5a64aa2e8ccc93a2bf41172e6826e4ca2752c8ee3d040f90e95cc25d40ecb/globalmount/0001-0009-rook-ceph-0000000000000002-16ddcd8e-9ce2-11ed-921d-b2cc6687c5e5
nbd8 43:256 0 0B 0 disk
nbd9 43:288 0 0B 0 disk
nbd10 43:320 0 0B 0 disk
nbd11 43:352 0 0B 0 disk
nbd12 43:384 0 0B 0 disk
nbd13 43:416 0 0B 0 disk
nbd14 43:448 0 0B 0 disk
nbd15 43:480 0 0B 0 disk
$ ssh dev-worker2 blkid
/dev/sdb: TYPE="ceph_bluestore"
/dev/sr0: BLOCK_SIZE="2048" UUID="2022-08-09-16-48-33-00" LABEL="Ubuntu-Server 22.04.1 LTS amd64" TYPE="iso9660" PTTYPE="PMBR"
/dev/sda2: UUID="3e59083a-3247-4db4-b892-7053e14b1fdd" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="356d7430-e148-462c-b8e3-1a1f4820474e"
$ ssh dev-worker3 lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 63.4M 1 loop /snap/core20/1950
loop1 7:1 0 63.4M 1 loop /snap/core20/1974
loop2 7:2 0 103M 1 loop /snap/lxd/23541
loop3 7:3 0 111.9M 1 loop /snap/lxd/24322
loop4 7:4 0 53.3M 1 loop /snap/snapd/19361
loop5 7:5 0 53.3M 1 loop /snap/snapd/19457
sda 8:0 0 150G 0 disk
├─sda1 8:1 0 1M 0 part
└─sda2 8:2 0 150G 0 part /var/lib/kubelet/pods/7a465c49-fb2c-49f4-88c4-1fcc276aa10e/volume-subpaths/awx-settings/awx-rsyslog/2
/var/lib/kubelet/pods/7a465c49-fb2c-49f4-88c4-1fcc276aa10e/volume-subpaths/awx-nginx-conf/awx-web/5
/var/lib/kubelet/pods/7a465c49-fb2c-49f4-88c4-1fcc276aa10e/volume-subpaths/awx-settings/awx-web/4
/var/lib/kubelet/pods/7a465c49-fb2c-49f4-88c4-1fcc276aa10e/volume-subpaths/awx-redis-config/redis/0
/var/lib/kubelet/pods/fb7f93f0-1434-4c25-b567-be6349729f63/volume-subpaths/awx-settings/awx-rsyslog/2
/var/lib/kubelet/pods/fb7f93f0-1434-4c25-b567-be6349729f63/volume-subpaths/awx-default-receptor-config/awx-ee/0
/var/lib/kubelet/pods/fb7f93f0-1434-4c25-b567-be6349729f63/volume-subpaths/awx-settings/awx-task/4
/var/lib/kubelet/pods/fb7f93f0-1434-4c25-b567-be6349729f63/volume-subpaths/awx-redis-config/redis/0
/var/lib/kubelet/pods/3a2fc86d-1c64-4397-b556-a4582ba00a07/volume-subpaths/tigera-ca-bundle/calico-node/1
/
sdb 8:16 0 200G 0 disk
sr0 11:0 1 1.4G 0 rom
nbd0 43:0 0 0B 0 disk
nbd1 43:32 0 0B 0 disk
nbd2 43:64 0 0B 0 disk
nbd3 43:96 0 0B 0 disk
nbd4 43:128 0 0B 0 disk
nbd5 43:160 0 0B 0 disk
nbd6 43:192 0 0B 0 disk
nbd7 43:224 0 0B 0 disk
rbd0 252:0 0 1G 0 disk /var/lib/kubelet/pods/c1152d8c-823f-4346-903e-ec23b53cc55a/volumes/kubernetes.io~csi/pvc-4f97cf55-ba5d-49bb-b1ce-c0e82d9f93de/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/7c08e4ea9853db9ced3da3dc40b5c3dcdc0faeb00118f1be25ea18b98fc713fb/globalmount/0001-0009-rook-ceph-0000000000000002-1c560875-ce50-4fa5-983a-3bd2f52b4c6c
rbd1 252:16 0 40G 0 disk /var/lib/kubelet/pods/68717bc6-a82c-4120-9866-351ad46bcd09/volume-subpaths/pvc-485f88c4-8c89-4659-937a-96c38daba58b/prometheus/2
/var/lib/kubelet/pods/68717bc6-a82c-4120-9866-351ad46bcd09/volumes/kubernetes.io~csi/pvc-485f88c4-8c89-4659-937a-96c38daba58b/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/7ad18f90d51fbacea8b94f4a495f7affa7cf7d3ad5140919ba96acbe024fe8c1/globalmount/0001-0009-rook-ceph-0000000000000002-a31252f9-a970-11ed-b0f8-fe71e5eb2c72
rbd2 252:32 0 8G 0 disk /var/lib/kubelet/pods/40ab7297-7e7f-474b-838e-9e6eb3f6a5ed/volume-subpaths/pvc-35c1d80d-137f-4e7b-a900-65d29bd403a0/postgres/0
/var/lib/kubelet/pods/40ab7297-7e7f-474b-838e-9e6eb3f6a5ed/volumes/kubernetes.io~csi/pvc-35c1d80d-137f-4e7b-a900-65d29bd403a0/mount
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/43a0d5336bdfeef2796c0ae8e80776fdfb42febdc211aaefe636a74445c3206e/globalmount/0001-0009-rook-ceph-0000000000000002-d071e15f-904a-4774-b990-094ca3ab560d
nbd8 43:256 0 0B 0 disk
nbd9 43:288 0 0B 0 disk
nbd10 43:320 0 0B 0 disk
nbd11 43:352 0 0B 0 disk
nbd12 43:384 0 0B 0 disk
nbd13 43:416 0 0B 0 disk
nbd14 43:448 0 0B 0 disk
nbd15 43:480 0 0B 0 disk
$ ssh dev-worker3 blkid
/dev/sdb: TYPE="ceph_bluestore"
/dev/sr0: BLOCK_SIZE="2048" UUID="2022-08-09-16-48-33-00" LABEL="Ubuntu-Server 22.04.1 LTS amd64" TYPE="iso9660" PTTYPE="PMBR"
/dev/sda2: UUID="fd9b6884-a806-4ffa-aa7a-0bb9197608f4" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="d2b5c6ed-a8ae-439d-94c7-1b042a0022d2"
Each node has 2 virtual disks. Disk 1 (/dev/sda) is used by the operating system. Disk 2 (/dev/sdb) is used by Ceph for the OSD. Disk 2 on each of these 4 worker nodes was resized as described in the issue. Not sure if this is what you were asking about. Please let me know if there is anything else I can provide. Thank you!
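One way to cross-check the expansion from the Ceph side (OSD id 0 is illustrative; the toolbox deployment name assumes the standard Rook example) is to compare the device size from lsblk with what BlueStore reports in the OSD metadata:

$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd metadata 0 \
    | grep -E '"hostname"|"devices"|"bluestore_bdev_size"'
# bluestore_bdev_size (in bytes) should reflect the expanded 200G size of /dev/sdb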
@travisn @satoru-takeuchi any ideas on how I can fix the cluster space? I can't do much until I get this resolved. What if I add another virtual disk (/dev/sdc) the same size as disk 2 then remove disk 2 (/dev/sdb)? Would that fix the available space? What is the correct process? Link to docs? Your help is greatly appreciated!
@jameshearttech I built a VM environment yesterday and I'm trying to reproduce this problem. Please wait a while.
@satoru-takeuchi okay, I will be patient. Thanks.
I couldn't reproduce this problem with Rook v1.11.4 and Ceph v17.2.6. More precisely, when I expanded a disk for an OSD from 10 GiB to 20 GiB, both the total capacity of my Ceph cluster and the free space increased. Since my environment isn't exactly the same as yours, I'll try to emulate your environment as closely as possible.
What if I add another virtual disk (/dev/sdc) the same size as disk 2 then remove disk 2 (/dev/sdb)? Would that fix the available space?
It would work. Please try it.
In addition, if you haven't rebooted the OSD pods since the expansion, please get the output of kubectl -n rook-ceph logs rook-ceph-osd-x-yyy expand-bluefs. I'd like to see the log of the bluefs-bdev-expand process called in the expand-bluefs container.
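To grab that log from every OSD pod at once, a loop along these lines should work (assuming the standard app=rook-ceph-osd pod label):

$ for pod in $(kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o name); do
      echo "--- ${pod} ---"
      kubectl -n rook-ceph logs "${pod}" -c expand-bluefs
  done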
@satoru-takeuchi this cluster has 30 days of log retention. The time range in my first image is from 2023-07-11 00:00:00 to 2023-07-13 06:00:00. I expanded the OSD on each of the worker nodes twice during that period, which is evident in the image. I queried Loki from Grafana.
{namespace="rook-ceph", pod=~"rook-ceph-osd-.*", container="expand-bluefs"} |= ``
Here is the log in text format.
@jameshearttech Thank you for the additional information.
I'll try to emulate your environment as closely as possible.
I changed some conditions, but no luck.
What if I add another virtual disk (/dev/sdc) the same size as disk 2 then remove disk 2 (/dev/sdb)? Would that fix the available space?
It would work. Please try it.
Any progress with this workaround?
As you said, expanding BlueFS worked fine, and the additional space was counted as used space from the beginning. IMO, this is not a Rook problem but a Ceph problem. If you need further investigation, please open an issue in the Ceph issue tracker.
@satoru-takeuchi @travisn
I was able to fix this by replacing the OSDs. Thanks for everything!
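For anyone hitting the same symptom, a rough sketch of the Ceph-side steps of that replacement, assuming the new /dev/sdc is already attached (it matches the ^sd[^a] deviceFilter, so Rook will prepare it as a new OSD) and that osd.0 is the one being retired; the Rook docs on OSD management cover the remaining operator/deployment cleanup:

# take the old OSD out so data rebalances onto the remaining OSDs,
# and wait until ceph status reports all PGs active+clean
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd out 0
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status

# once migration finishes, stop the OSD pod and remove the OSD from the cluster map
$ kubectl -n rook-ceph scale deployment rook-ceph-osd-0 --replicas=0
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd purge 0 --yes-i-really-mean-it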
Is this a bug report or feature request?
Deviation from expected behavior:
Expected behavior:
Details:
Screenshot:
Environment:
Kernel (uname -a): Linux dev-master0 5.15.0-70-generic #77-Ubuntu SMP Tue Mar 21 14:02:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Rook version (rook version inside of a Rook Pod): v1.11.4
Storage backend version (ceph -v): 17.2.6
Kubernetes version (kubectl version): v1.26.4
Storage backend status (ceph health in the Rook Ceph toolbox): HEALTH_WARN 1 nearfull osd(s); 12 pool(s) nearfull