jameshearttech commented 1 year ago

Is this a bug report or feature request?

Bug Report

Deviation from expected behavior:

Increasing OSD size (resize virtual disks) increases total capacity and used capacity rather than available capacity.

Expected behavior:

Increasing OSD size (resize virtual disks) increases total capacity and available capacity.

Details:

Increased total capacity on 7/11 by 200 GB (50 GB * 4 OSD) from ~400 GB to ~600 GB.
Increased total capacity on 7/13 by 200 GB (50 GB * 4 OSD) from ~600 GB to ~800 GB.

Screenshot:

Image from Grafana shows increases in capacity over the past few days. When total capacity increase you would think available capacity would increase too, but instead used capacity increases.

Environment:

OS (e.g. from /etc/os-release): Ubuntu 22.04.2 LTS
Kernel (e.g. uname -a): Linux dev-master0 5.15.0-70-generic #77-Ubuntu SMP Tue Mar 21 14:02:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Cloud provider or hardware configuration: On-premise, self-managed K8s running on VMWare.
Rook version (use rook version inside of a Rook Pod): v1.11.4
Storage backend version (e.g. for ceph do ceph -v): 17.2.6
Kubernetes version (use kubectl version): v1.26.4
Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Vanilla
Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_WARN 1 nearfull osd(s); 12 pool(s) nearfull

jameshearttech commented 1 year ago

I have been digging into this the past few hours. I found some other issues related to resizing OSDs.

10930 looks very similar to what I'm seeing. We are using VMWare rather than Proxmox. We are using VMs as K8s nodes with additional virtual disks (raw) used by Ceph as OSD in host-based cluster.

I confirmed the activate initContainer is defined in OSD pod manifest that runs bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-X. This explains why Ceph recognized the change in size of the raw devices, but it doesn't explain why that space is used rathern than available. Must be something I'm missing.

jameshearttech commented 1 year ago

For added context I wanted to mention the process we used to increase the size of the virtual disk Ceph uses for OSD. We performed these steps for each node with an OSD.

Drain node
Shutdown node
Resize virtual disk used by Ceph for OSD
Boot node
Uncordon node

jameshearttech commented 1 year ago

I was able to free 50 GB by running mount | awk '/rbd/ {print $3}' | while read -r MOUNT; do sudo fstrim -v "$MOUNT"; done on each worker node, which is where the OSD disks are located, but after adding 400 GB to the total capacity it seems the available capacity should be higher. At least this resolves the health warnings for the time being, which was preventing some workloads from operating as expected. (edited)

travisn commented 1 year ago

Does ceph osd df in the toolbox show the expected stats, including available?

jameshearttech commented 1 year ago

I don't know what you mean by expected. It reflects what I see in Grafana. My confusion is why the used capacity increased after resizing rather than the available capacity.

Looking back prior to the first resize. Here is the capacity from the cluster dashboard as well as the pools dashboard. You can see the pool usage has not really changed although the cluster used capacity increased. Most of the data is used by ceph-blockpool.

$ kubectl exec rook-ceph-tools-6595b6bb4-4t755 -n rook-ceph -- ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    800 GiB  178 GiB  622 GiB   622 GiB      77.79
TOTAL  800 GiB  178 GiB  622 GiB   622 GiB      77.79

--- POOLS ---
POOL                                 ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                                  1    1  449 KiB        2  1.3 MiB      0     37 GiB
ceph-blockpool                        2   32   73 GiB   20.56k  219 GiB  66.46     37 GiB
ceph-objectstore.rgw.control          3    8      0 B        8      0 B      0     37 GiB
ceph-filesystem-metadata              4   16  206 MiB       82  618 MiB   0.54     37 GiB
ceph-objectstore.rgw.meta             5    8  3.1 KiB        9   77 KiB      0     37 GiB
ceph-filesystem-data0                 6   32    158 B        2   12 KiB      0     37 GiB
ceph-objectstore.rgw.log              7    8  3.5 MiB      340   12 MiB   0.01     37 GiB
ceph-objectstore.rgw.buckets.index    8    8      0 B        0      0 B      0     37 GiB
ceph-objectstore.rgw.buckets.non-ec   9    8      0 B        0      0 B      0     37 GiB
ceph-objectstore.rgw.otp             10    8      0 B        0      0 B      0     37 GiB
.rgw.root                            11    8  4.8 KiB       16  180 KiB      0     37 GiB
ceph-objectstore.rgw.buckets.data    12   32      0 B        0      0 B      0     74 GiB

travisn commented 1 year ago

Ok, thanks for confirming that the Grafana view matches the stats from ceph commands. @satoru-takeuchi Any ideas on why the available is not updated?

satoru-takeuchi commented 1 year ago

@travisn I'm not sure. I'll try to reproduce this problem anyway.

@jameshearttech Could you show me the following information?

cluster.yaml
The result of lsblk and blkid in a node where at least one expanded OSD exists.
The names of block devices correspond to OSDs.

jameshearttech commented 1 year ago

@satoru-takeuchi

cluster.yaml

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"ceph.rook.io/v1","kind":"CephCluster","metadata":{"annotations":{},"labels":{"argocd.argoproj.io/instance":"rook-ceph-cluster"},"name":"rook-ceph","namespace":"rook-ceph"},"spec":{"cephVersion":{"allowUnsupported":false,"image":"quay.io/ceph/ceph:v17.2.6"},"cleanupPolicy":{"allowUninstallWithVolumes":false,"confirmation":"","sanitizeDisks":{"dataSource":"zero","iteration":1,"method":"quick"}},"continueUpgradeAfterChecksEvenIfNotHealthy":false,"crashCollector":{"disable":false},"dashboard":{"enabled":true,"ssl":true},"dataDirHostPath":"/var/lib/rook","disruptionManagement":{"managePodBudgets":true,"osdMaintenanceTimeout":30,"pgHealthCheckTimeout":0},"healthCheck":{"daemonHealth":{"mon":{"disabled":false,"interval":"45s"},"osd":{"disabled":false,"interval":"60s"},"status":{"disabled":false,"interval":"60s"}},"livenessProbe":{"mgr":{"disabled":false},"mon":{"disabled":false},"osd":{"disabled":false}}},"labels":{"monitoring":{"release":"kube-prometheus-stack"}},"logCollector":{"enabled":true,"maxLogSize":"500M","periodicity":"daily"},"mgr":{"allowMultiplePerNode":false,"count":2,"modules":[{"enabled":true,"name":"pg_autoscaler"},{"enabled":true,"name":"rook"}]},"mon":{"allowMultiplePerNode":false,"count":3},"monitoring":{"enabled":true},"network":{"connections":{"compression":{"enabled":false},"encryption":{"enabled":false},"requireMsgr2":false}},"priorityClassNames":{"mgr":"system-cluster-critical","mon":"system-node-critical","osd":"system-node-critical"},"removeOSDsIfOutAndSafeToRemove":false,"resources":{"cleanup":{"limits":{"cpu":"500m","memory":"1Gi"},"requests":{"cpu":"500m","memory":"100Mi"}},"crashcollector":{"limits":{"cpu":"500m","memory":"60Mi"},"requests":{"cpu":"100m","memory":"60Mi"}},"logcollector":{"limits":{"cpu":"500m","memory":"1Gi"},"requests":{"cpu":"100m","memory":"100Mi"}},"mgr":{"limits":{"cpu":"1000m","memory":"1Gi"},"requests":{"cpu":"500m","memory":"512Mi"}},"mgr-sidecar":{"limits":{"cpu":"1000m","memory":"100Mi"},"requests":{"cpu":"100m","memory":"40Mi"}},"mon":{"limits":{"cpu":"2000m","memory":"2Gi"},"requests":{"cpu":"1000m","memory":"1Gi"}},"osd":{"limits":{"cpu":"2000m","memory":"4Gi"},"requests":{"cpu":"1000m","memory":"4Gi"}},"prepareosd":{"requests":{"cpu":"500m","memory":"50Mi"}}},"skipUpgradeChecks":false,"storage":{"nodes":[{"deviceFilter":"^sd[^a]","name":"dev-worker0"},{"deviceFilter":"^sd[^a]","name":"dev-worker1"},{"deviceFilter":"^sd[^a]","name":"dev-worker2"},{"deviceFilter":"^sd[^a]","name":"dev-worker3"}],"useAllDevices":false,"useAllNodes":false},"waitTimeoutForHealthyOSDInMinutes":10}}
  creationTimestamp: "2023-01-16T02:48:04Z"
  finalizers:
  - cephcluster.ceph.rook.io
  generation: 11
  labels:
    argocd.argoproj.io/instance: rook-ceph-cluster
  name: rook-ceph
  namespace: rook-ceph
  resourceVersion: "88821838"
  uid: 9cf3cb98-e774-4d6f-9c7b-ee9ad843b008
spec:
  cephVersion:
    allowUnsupported: false
    image: quay.io/ceph/ceph:v17.2.6
  cleanupPolicy:
    allowUninstallWithVolumes: false
    confirmation: ""
    sanitizeDisks:
      dataSource: zero
      iteration: 1
      method: quick
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  crashCollector:
    disable: false
  dashboard:
    enabled: true
    ssl: true
  dataDirHostPath: /var/lib/rook
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0
  external: {}
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    livenessProbe:
      mgr:
        disabled: false
      mon:
        disabled: false
      osd:
        disabled: false
  labels:
    monitoring:
      release: kube-prometheus-stack
  logCollector:
    enabled: true
    maxLogSize: 500M
    periodicity: daily
  mgr:
    allowMultiplePerNode: false
    count: 2
    modules:
    - enabled: true
      name: pg_autoscaler
    - enabled: true
      name: rook
  mon:
    allowMultiplePerNode: false
    count: 3
  monitoring:
    enabled: true
  network:
    connections:
      compression:
        enabled: false
      encryption:
        enabled: false
      requireMsgr2: false
  priorityClassNames:
    mgr: system-cluster-critical
    mon: system-node-critical
    osd: system-node-critical
  removeOSDsIfOutAndSafeToRemove: false
  resources:
    cleanup:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 100Mi
    crashcollector:
      limits:
        cpu: 500m
        memory: 60Mi
      requests:
        cpu: 100m
        memory: 60Mi
    logcollector:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 100Mi
    mgr:
      limits:
        cpu: 1000m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 512Mi
    mgr-sidecar:
      limits:
        cpu: 1000m
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 40Mi
    mon:
      limits:
        cpu: 2000m
        memory: 2Gi
      requests:
        cpu: 1000m
        memory: 1Gi
    osd:
      limits:
        cpu: 2000m
        memory: 4Gi
      requests:
        cpu: 1000m
        memory: 4Gi
    prepareosd:
      requests:
        cpu: 500m
        memory: 50Mi
  security:
    kms: {}
  skipUpgradeChecks: false
  storage:
    nodes:
    - deviceFilter: ^sd[^a]
      name: dev-worker0
    - deviceFilter: ^sd[^a]
      name: dev-worker1
    - deviceFilter: ^sd[^a]
      name: dev-worker2
    - deviceFilter: ^sd[^a]
      name: dev-worker3
    useAllDevices: false
    useAllNodes: false
  waitTimeoutForHealthyOSDInMinutes: 10
status:
  ceph:
    capacity:
      bytesAvailable: 180592615424
      bytesTotal: 858993459200
      bytesUsed: 678400843776
      lastUpdated: "2023-07-19T03:48:17Z"
    fsid: 2f898171-a628-45ac-b708-b0a199454e3d
    health: HEALTH_OK
    lastChanged: "2023-07-14T21:15:29Z"
    lastChecked: "2023-07-19T03:48:17Z"
    previousHealth: HEALTH_WARN
    versions:
      mds:
        ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 2
      mgr:
        ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 2
      mon:
        ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 3
      osd:
        ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 4
      overall:
        ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 12
      rgw:
        ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable): 1
  conditions:
  - lastHeartbeatTime: "2023-07-19T03:48:17Z"
    lastTransitionTime: "2023-03-09T06:38:18Z"
    message: Cluster created successfully
    reason: ClusterCreated
    status: "True"
    type: Ready
  message: Cluster created successfully
  observedGeneration: 11
  phase: Ready
  state: Created
  storage:
    deviceClasses:
    - name: ssd
  version:
    image: quay.io/ceph/ceph:v17.2.6
    version: 17.2.6-0

The result of lsblk and blkid in a node where at least one expanded OSD exists.

$ kubectl get node -o wide
NAME          STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
dev-master0   Ready    control-plane   184d   v1.26.4   10.69.2.30    <none>        Ubuntu 22.04.2 LTS   5.15.0-70-generic   containerd://1.6.20
dev-master1   Ready    control-plane   184d   v1.26.4   10.69.2.31    <none>        Ubuntu 22.04.2 LTS   5.15.0-70-generic   containerd://1.6.20
dev-master2   Ready    control-plane   184d   v1.26.4   10.69.2.32    <none>        Ubuntu 22.04.2 LTS   5.15.0-70-generic   containerd://1.6.20
dev-worker0   Ready    <none>          184d   v1.26.4   10.69.2.33    <none>        Ubuntu 22.04.2 LTS   5.15.0-76-generic   containerd://1.6.20
dev-worker1   Ready    <none>          184d   v1.26.4   10.69.2.34    <none>        Ubuntu 22.04.2 LTS   5.15.0-76-generic   containerd://1.6.20
dev-worker2   Ready    <none>          184d   v1.26.4   10.69.2.35    <none>        Ubuntu 22.04.2 LTS   5.15.0-76-generic   containerd://1.6.20
dev-worker3   Ready    <none>          184d   v1.26.4   10.69.2.36    <none>        Ubuntu 22.04.2 LTS   5.15.0-76-generic   containerd://1.6.20

$ ssh dev-worker0 lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0  63.4M  1 loop /snap/core20/1950
loop1    7:1    0  63.4M  1 loop /snap/core20/1974
loop2    7:2    0   103M  1 loop /snap/lxd/23541
loop3    7:3    0 111.9M  1 loop /snap/lxd/24322
loop4    7:4    0  53.3M  1 loop /snap/snapd/19361
loop5    7:5    0  53.3M  1 loop /snap/snapd/19457
sda      8:0    0   150G  0 disk
├─sda1   8:1    0     1M  0 part
└─sda2   8:2    0   150G  0 part /var/lib/kubelet/pods/e0cdac41-0916-468a-bbe8-303b88edee86/volume-subpaths/tigera-ca-bundle/calico-node/1
                                 /
sdb      8:16   0   200G  0 disk
sr0     11:0    1   1.4G  0 rom
nbd0    43:0    0     0B  0 disk
nbd1    43:32   0     0B  0 disk
nbd2    43:64   0     0B  0 disk
nbd3    43:96   0     0B  0 disk
nbd4    43:128  0     0B  0 disk
nbd5    43:160  0     0B  0 disk
nbd6    43:192  0     0B  0 disk
nbd7    43:224  0     0B  0 disk
rbd1   252:16   0     5G  0 disk /var/lib/kubelet/pods/43aa363b-f0a9-4014-b717-2832e78fb189/volumes/kubernetes.io~csi/pvc-bfef3b44-c640-43e0-8f01-d36106b6a2e3/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/5fdaa89a02e12d22bb0cf01bb83d9ebe3dd2a254b6d88617e5eab891008c4bfa/globalmount/0001-0009-rook-ceph-0000000000000002-08e3b4c3-ba6c-11ed-8dae-b60c4cfb5373
nbd8    43:256  0     0B  0 disk
nbd9    43:288  0     0B  0 disk
nbd10   43:320  0     0B  0 disk
nbd11   43:352  0     0B  0 disk
nbd12   43:384  0     0B  0 disk
nbd13   43:416  0     0B  0 disk
nbd14   43:448  0     0B  0 disk
nbd15   43:480  0     0B  0 disk

$ ssh dev-worker0 blkid
/dev/sdb: TYPE="ceph_bluestore"
/dev/sr0: BLOCK_SIZE="2048" UUID="2022-08-09-16-48-33-00" LABEL="Ubuntu-Server 22.04.1 LTS amd64" TYPE="iso9660" PTTYPE="PMBR"
/dev/sda2: UUID="f1379f67-ac29-4026-a5d2-a8531fe91729" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="9cc4ea75-0136-4ac5-bc85-bee1278f3702"

$ ssh dev-worker1 lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0  63.4M  1 loop /snap/core20/1950
loop1    7:1    0  63.4M  1 loop /snap/core20/1974
loop2    7:2    0   103M  1 loop /snap/lxd/23541
loop3    7:3    0 111.9M  1 loop /snap/lxd/24322
loop4    7:4    0  53.3M  1 loop /snap/snapd/19361
loop5    7:5    0  53.3M  1 loop /snap/snapd/19457
sda      8:0    0   150G  0 disk
├─sda1   8:1    0     1M  0 part
└─sda2   8:2    0   150G  0 part /var/lib/kubelet/pods/38982b78-2ecf-40a7-9504-43323d5a2a41/volume-subpaths/registry-config/registryctl/2
                                 /var/lib/kubelet/pods/38982b78-2ecf-40a7-9504-43323d5a2a41/volume-subpaths/registry-config/registryctl/1
                                 /var/lib/kubelet/pods/38982b78-2ecf-40a7-9504-43323d5a2a41/volume-subpaths/registry-config/registry/2
                                 /var/lib/kubelet/pods/754c5105-d550-4c44-9d7c-0091e68c78fb/volume-subpaths/jobservice-config/jobservice/0
                                 /var/lib/kubelet/pods/b9498ea3-adba-4bee-a4f7-cc7e6ae720c9/volume-subpaths/portal-config/portal/0
                                 /var/lib/kubelet/pods/2d760d81-2b93-40c6-b6ee-bd47f69a6006/volume-subpaths/config/core/0
                                 /var/lib/kubelet/pods/96d46cfb-8c35-43f2-af81-b1c22284fc9e/volume-subpaths/sc-dashboard-provider/grafana/3
                                 /var/lib/kubelet/pods/96d46cfb-8c35-43f2-af81-b1c22284fc9e/volume-subpaths/config/grafana/0
                                 /var/lib/kubelet/pods/fd336091-7748-4056-aa14-6a49378c5350/volume-subpaths/tigera-ca-bundle/calico-node/1
                                 /
sdb      8:16   0   200G  0 disk
sr0     11:0    1   1.4G  0 rom
nbd0    43:0    0     0B  0 disk
nbd1    43:32   0     0B  0 disk
nbd2    43:64   0     0B  0 disk
nbd3    43:96   0     0B  0 disk
nbd4    43:128  0     0B  0 disk
nbd5    43:160  0     0B  0 disk
nbd6    43:192  0     0B  0 disk
nbd7    43:224  0     0B  0 disk
rbd0   252:0    0    20G  0 disk /var/lib/kubelet/pods/38982b78-2ecf-40a7-9504-43323d5a2a41/volumes/kubernetes.io~csi/pvc-8186fef5-b35d-488b-8fe8-078c73b40c18/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/b290607416970b9aa16ac7514bec10ebb27bac14a2f7f22b41021fae8f79ec9b/globalmount/0001-0009-rook-ceph-0000000000000002-1608b85f-9ce2-11ed-921d-b2cc6687c5e5
rbd1   252:16   0     1G  0 disk /var/lib/kubelet/pods/754c5105-d550-4c44-9d7c-0091e68c78fb/volumes/kubernetes.io~csi/pvc-7d54ef6b-cdf4-4ed0-86cf-76b3cc7818d1/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/06d9d384e80dc91244784c2df3de97eec7dcce7314bae2a41c1e24f3439e7113/globalmount/0001-0009-rook-ceph-0000000000000002-16089f88-9ce2-11ed-921d-b2cc6687c5e5
rbd2   252:32   0     1G  0 disk /var/lib/kubelet/pods/41733a93-3a0c-4849-bc7c-5eb680f14f0d/volume-subpaths/pvc-835afc26-6f0e-49e3-afd0-dc490c339f46/alertmanager/3
                                 /var/lib/kubelet/pods/41733a93-3a0c-4849-bc7c-5eb680f14f0d/volumes/kubernetes.io~csi/pvc-835afc26-6f0e-49e3-afd0-dc490c339f46/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/368173dc48f494675ada2c81a3e06045342f6f40c0cadf2616296bbbe8bc939e/globalmount/0001-0009-rook-ceph-0000000000000002-fe3ec405-aee2-11ed-b0f8-fe71e5eb2c72
rbd3   252:48   0     1G  0 disk /var/lib/kubelet/pods/e129b166-490c-46fb-aac0-28df0a994759/volumes/kubernetes.io~csi/pvc-b1e3b4b5-b94c-4cbf-b041-bca03df87a2f/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/9df84d35fee55d3fab847f439a380ca11ac12c0d871a88c38d853cbf5ca63038/globalmount/0001-0009-rook-ceph-0000000000000002-8dcedc14-590c-4395-9cb9-2f0bd6d9a90d
rbd4   252:64   0     1G  0 disk /var/lib/kubelet/pods/d177c4d0-48d3-428a-b53b-f16a93a2aadd/volumes/kubernetes.io~csi/pvc-6ab95971-82ad-45d1-97c1-09b8966953df/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/86b77a7dcffcd4c380ed29325d4c8a17bcc9efd9866aefc81b56e303465bb8c3/globalmount/0001-0009-rook-ceph-0000000000000002-16bf7737-9ce2-11ed-921d-b2cc6687c5e5
rbd5   252:80   0     5G  0 disk /var/lib/kubelet/pods/c031b293-4fa9-4ada-bd71-98778d4da717/volumes/kubernetes.io~csi/pvc-b58d060d-d655-4f94-be93-cdee4903ed4f/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/fd3bbe591a523cb7a9879ff7ecb24dbe29a68fae09a3c31bc9987f4ece04d542/globalmount/0001-0009-rook-ceph-0000000000000002-16a12cf7-9ce2-11ed-921d-b2cc6687c5e5
rbd6   252:96   0   100G  0 disk /var/lib/kubelet/pods/74aa4808-8557-4cc4-b876-7a3852b38483/volumes/kubernetes.io~csi/pvc-08c394ec-92f4-4489-84d5-f8c27894da49/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/c8ed10ef6b340453571615a4bc991aefaf8a14d58d869a221c223b6560887b84/globalmount/0001-0009-rook-ceph-0000000000000002-9197e34e-abdd-11ed-b0f8-fe71e5eb2c72
nbd8    43:256  0     0B  0 disk
nbd9    43:288  0     0B  0 disk
nbd10   43:320  0     0B  0 disk
nbd11   43:352  0     0B  0 disk
nbd12   43:384  0     0B  0 disk
nbd13   43:416  0     0B  0 disk
nbd14   43:448  0     0B  0 disk
nbd15   43:480  0     0B  0 disk

$ ssh dev-worker1 blkid
/dev/sdb: TYPE="ceph_bluestore"
/dev/sr0: BLOCK_SIZE="2048" UUID="2022-08-09-16-48-33-00" LABEL="Ubuntu-Server 22.04.1 LTS amd64" TYPE="iso9660" PTTYPE="PMBR"
/dev/sda2: UUID="04c81207-9e02-4111-a05f-cb29d33c1478" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="e6e10ade-1ff6-4f88-892d-764fde3e4712"

$ ssh dev-worker2 lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0  63.4M  1 loop /snap/core20/1950
loop1    7:1    0  63.4M  1 loop /snap/core20/1974
loop2    7:2    0   103M  1 loop /snap/lxd/23541
loop3    7:3    0 111.9M  1 loop /snap/lxd/24322
loop4    7:4    0  53.3M  1 loop /snap/snapd/19361
loop5    7:5    0  53.3M  1 loop /snap/snapd/19457
sda      8:0    0   150G  0 disk
├─sda1   8:1    0     1M  0 part
└─sda2   8:2    0   150G  0 part /var/lib/kubelet/pods/f2c9842c-d630-42bb-b916-5f809ebbb4f7/volume-subpaths/tigera-ca-bundle/calico-node/1
                                 /
sdb      8:16   0   200G  0 disk
sr0     11:0    1   1.4G  0 rom
nbd0    43:0    0     0B  0 disk
nbd1    43:32   0     0B  0 disk
nbd2    43:64   0     0B  0 disk
nbd3    43:96   0     0B  0 disk
nbd4    43:128  0     0B  0 disk
nbd5    43:160  0     0B  0 disk
nbd6    43:192  0     0B  0 disk
nbd7    43:224  0     0B  0 disk
rbd0   252:0    0     1G  0 disk /var/lib/kubelet/pods/632ec04a-c83d-4fd7-9937-16a8c8b93264/volumes/kubernetes.io~csi/pvc-263ba194-8b87-42dd-a025-03005dd236b5/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/0fc3040d635ec3ea39547c3f25b15327643bb57ec6bd45c8b1823beea7cbff97/globalmount/0001-0009-rook-ceph-0000000000000002-c80cb79a-11dd-485d-8991-8624d0247606
rbd2   252:32   0     1G  0 disk /var/lib/kubelet/pods/46943724-2f44-43aa-b8c5-a665fbdc0d4c/volumes/kubernetes.io~csi/pvc-52ebbd1f-e65e-4c35-8b0e-80eaddd7bf19/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4db5a64aa2e8ccc93a2bf41172e6826e4ca2752c8ee3d040f90e95cc25d40ecb/globalmount/0001-0009-rook-ceph-0000000000000002-16ddcd8e-9ce2-11ed-921d-b2cc6687c5e5
nbd8    43:256  0     0B  0 disk
nbd9    43:288  0     0B  0 disk
nbd10   43:320  0     0B  0 disk
nbd11   43:352  0     0B  0 disk
nbd12   43:384  0     0B  0 disk
nbd13   43:416  0     0B  0 disk
nbd14   43:448  0     0B  0 disk
nbd15   43:480  0     0B  0 disk

$ ssh dev-worker2 blkid
/dev/sdb: TYPE="ceph_bluestore"
/dev/sr0: BLOCK_SIZE="2048" UUID="2022-08-09-16-48-33-00" LABEL="Ubuntu-Server 22.04.1 LTS amd64" TYPE="iso9660" PTTYPE="PMBR"
/dev/sda2: UUID="3e59083a-3247-4db4-b892-7053e14b1fdd" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="356d7430-e148-462c-b8e3-1a1f4820474e"

$ ssh dev-worker3 lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0  63.4M  1 loop /snap/core20/1950
loop1    7:1    0  63.4M  1 loop /snap/core20/1974
loop2    7:2    0   103M  1 loop /snap/lxd/23541
loop3    7:3    0 111.9M  1 loop /snap/lxd/24322
loop4    7:4    0  53.3M  1 loop /snap/snapd/19361
loop5    7:5    0  53.3M  1 loop /snap/snapd/19457
sda      8:0    0   150G  0 disk
├─sda1   8:1    0     1M  0 part
└─sda2   8:2    0   150G  0 part /var/lib/kubelet/pods/7a465c49-fb2c-49f4-88c4-1fcc276aa10e/volume-subpaths/awx-settings/awx-rsyslog/2
                                 /var/lib/kubelet/pods/7a465c49-fb2c-49f4-88c4-1fcc276aa10e/volume-subpaths/awx-nginx-conf/awx-web/5
                                 /var/lib/kubelet/pods/7a465c49-fb2c-49f4-88c4-1fcc276aa10e/volume-subpaths/awx-settings/awx-web/4
                                 /var/lib/kubelet/pods/7a465c49-fb2c-49f4-88c4-1fcc276aa10e/volume-subpaths/awx-redis-config/redis/0
                                 /var/lib/kubelet/pods/fb7f93f0-1434-4c25-b567-be6349729f63/volume-subpaths/awx-settings/awx-rsyslog/2
                                 /var/lib/kubelet/pods/fb7f93f0-1434-4c25-b567-be6349729f63/volume-subpaths/awx-default-receptor-config/awx-ee/0
                                 /var/lib/kubelet/pods/fb7f93f0-1434-4c25-b567-be6349729f63/volume-subpaths/awx-settings/awx-task/4
                                 /var/lib/kubelet/pods/fb7f93f0-1434-4c25-b567-be6349729f63/volume-subpaths/awx-redis-config/redis/0
                                 /var/lib/kubelet/pods/3a2fc86d-1c64-4397-b556-a4582ba00a07/volume-subpaths/tigera-ca-bundle/calico-node/1
                                 /
sdb      8:16   0   200G  0 disk
sr0     11:0    1   1.4G  0 rom
nbd0    43:0    0     0B  0 disk
nbd1    43:32   0     0B  0 disk
nbd2    43:64   0     0B  0 disk
nbd3    43:96   0     0B  0 disk
nbd4    43:128  0     0B  0 disk
nbd5    43:160  0     0B  0 disk
nbd6    43:192  0     0B  0 disk
nbd7    43:224  0     0B  0 disk
rbd0   252:0    0     1G  0 disk /var/lib/kubelet/pods/c1152d8c-823f-4346-903e-ec23b53cc55a/volumes/kubernetes.io~csi/pvc-4f97cf55-ba5d-49bb-b1ce-c0e82d9f93de/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/7c08e4ea9853db9ced3da3dc40b5c3dcdc0faeb00118f1be25ea18b98fc713fb/globalmount/0001-0009-rook-ceph-0000000000000002-1c560875-ce50-4fa5-983a-3bd2f52b4c6c
rbd1   252:16   0    40G  0 disk /var/lib/kubelet/pods/68717bc6-a82c-4120-9866-351ad46bcd09/volume-subpaths/pvc-485f88c4-8c89-4659-937a-96c38daba58b/prometheus/2
                                 /var/lib/kubelet/pods/68717bc6-a82c-4120-9866-351ad46bcd09/volumes/kubernetes.io~csi/pvc-485f88c4-8c89-4659-937a-96c38daba58b/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/7ad18f90d51fbacea8b94f4a495f7affa7cf7d3ad5140919ba96acbe024fe8c1/globalmount/0001-0009-rook-ceph-0000000000000002-a31252f9-a970-11ed-b0f8-fe71e5eb2c72
rbd2   252:32   0     8G  0 disk /var/lib/kubelet/pods/40ab7297-7e7f-474b-838e-9e6eb3f6a5ed/volume-subpaths/pvc-35c1d80d-137f-4e7b-a900-65d29bd403a0/postgres/0
                                 /var/lib/kubelet/pods/40ab7297-7e7f-474b-838e-9e6eb3f6a5ed/volumes/kubernetes.io~csi/pvc-35c1d80d-137f-4e7b-a900-65d29bd403a0/mount
                                 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/43a0d5336bdfeef2796c0ae8e80776fdfb42febdc211aaefe636a74445c3206e/globalmount/0001-0009-rook-ceph-0000000000000002-d071e15f-904a-4774-b990-094ca3ab560d
nbd8    43:256  0     0B  0 disk
nbd9    43:288  0     0B  0 disk
nbd10   43:320  0     0B  0 disk
nbd11   43:352  0     0B  0 disk
nbd12   43:384  0     0B  0 disk
nbd13   43:416  0     0B  0 disk
nbd14   43:448  0     0B  0 disk
nbd15   43:480  0     0B  0 disk

$ ssh dev-worker3 blkid
/dev/sdb: TYPE="ceph_bluestore"
/dev/sr0: BLOCK_SIZE="2048" UUID="2022-08-09-16-48-33-00" LABEL="Ubuntu-Server 22.04.1 LTS amd64" TYPE="iso9660" PTTYPE="PMBR"
/dev/sda2: UUID="fd9b6884-a806-4ffa-aa7a-0bb9197608f4" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="d2b5c6ed-a8ae-439d-94c7-1b042a0022d2"

The names of block devices correspond to OSDs.

Each node has 2 virtual disks. Disk 1 (/dev/sda) is used by the operating system. Disk 2 (/dev/sdb) is used by Ceph for OSD. Disk 2 on each of these 4 worker nodes was resized as described in the issue. Not sure if this is what you were asking about. Please let me know if there is anything else I can provide. Thank you!

jameshearttech commented 1 year ago

@travisn @satoru-takeuchi any ideas on how I can fix the cluster space? I can't do much until I get this resolved. What if I add another virtual disk (/dev/sdc) the same size as disk 2 then remove disk 2 (/dev/sdb)? Would that fix the available space? What is the correct process? Link to docs? Your help is greatly appreciated!

satoru-takeuchi commented 1 year ago

@jameshearttech I've built an VM environmeny yesterday and I'm trying to reproduce this problem. Please wait for a while.

jameshearttech commented 1 year ago

@satoru-takeuchi okay, I will be patient. Thanks.

satoru-takeuchi commented 1 year ago

I couldn't reproduce this problem with Rook v1.11.4 and Ceph v17.2.6. More precisely, When I expanded a disk for OSD from 10 GiB to 20 GiB, both the total capacity of my Ceph cluster and the free space are increased. Since my environment isn't exactly the same as yours, I'll try to emulate your env as possible.

What if I add another virtual disk (/dev/sdc) the same size as disk 2 then remove disk 2 (/dev/sdb)? Would that fix the available space?

It would work. Please try it.

satoru-takeuchi commented 1 year ago

In addition, if you haven't rebooted osd pods after expansion, please get kubectl -n rook-ceph logs rook-ceph-osd-x-yyy expand-bluefs. I'd like to see the log of bluefs-bdev-expand process called in expand-bluefs container.

jameshearttech commented 1 year ago

@satoru-takeuchi this cluster has 30 days log retention. The time range in my first image is from 2023-07-11 00:00:00 to 2023-07-13 06:00:00. I expanded the OSD on each of the worker nodes 2 times during that period, which is evident in the image. I queried Loki from Grafana.

{namespace="rook-ceph", pod=~"rook-ceph-osd-.*", container="expand-bluefs"} |= ``

Here is the log in text format.

satoru-takeuchi commented 1 year ago

@jameshearttech Thank you for more information.

I'll try to emulate your env as possible.

I changed some conditions, but no luck.

What if I add another virtual disk (/dev/sdc) the same size as disk 2 then remove disk 2 (/dev/sdb)? Would that fix the available space?

It would work. Please try it.

How about the progress of this workaround?

As you said, expanding bluefs worked fine, and the additional space became used space from the beginning. IMO, this is not a Rook's problem but a Ceph's problem. If you need more investigation, please submit an issue in the ceph issue tracker.

jameshearttech commented 1 year ago

@satoru-takeuchi @travisn

I was able to fix this by replacing the OSDs. Thanks for everything!

rook / rook

Resizing OSD (Virtual Disks) Does Not Increase Available Capacity #12511

10930 looks very similar to what I'm seeing. We are using VMWare rather than Proxmox. We are using VMs as K8s nodes with additional virtual disks (raw) used by Ceph as OSD in host-based cluster.