rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

Why does the metadata-device configuration not take effect? #13016

Closed · erictarrence closed this issue 1 year ago

erictarrence commented 1 year ago

What command can be used to determine whether the metadata-device configuration has taken effect?

sh-4.4$ ceph -s 
  cluster:
    id:     8e633ee3-e904-42b1-a710-1fc07bd4e988
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum b,e,f (age 2w)
    mgr: a(active, since 2w), standbys: b
    osd: 18 osds: 18 up (since 6m), 18 in (since 3h)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    pools:   10 pools, 217 pgs
    objects: 256.21k objects, 109 GiB
    usage:   320 GiB used, 930 GiB / 1.2 TiB avail
    pgs:     217 active+clean

  io:
    client:   3.0 KiB/s rd, 317 KiB/s wr, 2 op/s rd, 0 op/s wr

My metadataDevice configuration is as follows:

  storage:
    config:
      databaseSizeMB: "1024"
      metadataDevice: vde
    deviceFilter: ^vd[b-d]
    useAllDevices: true
    useAllNodes: true

The rook-ceph-osd-prepare pod log shows that --metadata-device=vde was passed:

kubectl -n rook-ceph logs rook-ceph-osd-prepare-node164-sdqqp | grep metadata-device
Defaulted container "provision" out of: provision, copy-bins (init)
2023-10-07 06:42:58.849155 I | rookcmd: flag values: --cluster-id=a9a66acc-ee49-4240-b9db-128d23bce9d4, --cluster-name=rook-ceph, --data-device-filter=^vd[b-d], --data-device-path-filter=, --data-devices=, --encrypted-device=false, --force-format=false, --help=false, --location=, --log-level=DEBUG, --metadata-device=vde, --node-name=node164, --operator-image=, --osd-crush-device-class=, --osd-crush-initial-weight=, --osd-database-size=1024, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=false, --service-account=

However, when I use iostat to check I/O on the vde disk, it is always 0. Is there any Ceph command that can determine whether the RocksDB data and WAL are actually being written to the vde disk?

iostat -x 5 20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.47    0.00    1.18    0.08    0.00   95.27

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
vdb              0.00    4.00      0.00     16.80     0.00     2.20   0.00  35.48    0.00    0.20   0.00     0.00     4.20   0.45   0.18
vdc              0.20    4.20      4.00     21.60     0.20     2.80  50.00  40.00    0.00    0.19   0.00    20.00     5.14   0.45   0.20
vdd              0.20   57.60      4.00    137.60     0.20     5.60  50.00   8.86    0.00    0.11   0.01    20.00     2.39   0.30   1.76
vde              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
scd0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
rbd0             0.00    1.00      0.00      4.80     0.00     0.20   0.00  16.67    0.00    5.80   0.01     0.00     4.80   5.00   0.50
rbd1             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
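
For reference, the Ceph side can also be checked directly; a sketch run from the rook-ceph toolbox, where osd.0 is a placeholder id and the exact field names may vary between Ceph releases:

# does the OSD report a dedicated BlueFS DB device?
ceph osd metadata 0 | grep -E 'bluefs_db|bluefs_dedicated'

# how much BlueFS data lives on each device (DB / WAL / slow)?
ceph tell osd.0 bluefs stats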
travisn commented 1 year ago

@erictarrence Did you specify the metadataDevice at first cluster creation? Or was the metadataDevice property specified after the cluster creation? Existing OSDs cannot have a metadataDevice added after creation.
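
If the metadataDevice does need to apply to OSDs that already exist, the usual approach is to remove and re-provision them one at a time, letting the cluster recover in between. A rough sketch for a single OSD, where osd.0 and /dev/vdb are placeholders; this is destructive, so it assumes the data is replicated elsewhere, and leftover LVM volumes may also need cleaning up per the Rook teardown docs:

# stop the OSD deployment so Rook does not restart the daemon
kubectl -n rook-ceph scale deployment rook-ceph-osd-0 --replicas=0

# from the toolbox: remove the OSD from the cluster
ceph osd purge 0 --yes-i-really-mean-it

# on the node: wipe the backing disk so it is detected as empty
sgdisk --zap-all /dev/vdb

# restart the operator so the prepare job re-provisions the device
kubectl -n rook-ceph rollout restart deploy/rook-ceph-operator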

erictarrence commented 1 year ago

> @erictarrence Did you specify the metadataDevice at first cluster creation? Or was the metadataDevice property specified after the cluster creation? Existing OSDs cannot have a metadataDevice added after creation.

The test is OK, but there are two problems:

1. Why are the weights of newly added OSDs all 0? Is it related to the "osd crush update on start = false" parameter?

2. The BlockPath of some OSDs is displayed as the device_db path, for example OSD 20 and OSD 22. Does this affect normal operation of Ceph, or is it only a display error? Is there any way to fix it?

1. Why are the weights of newly added OSDs (osd.18 through osd.26) all 0?

sh-4.4$ ceph -v
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)

sh-4.4$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                  STATUS  REWEIGHT  PRI-AFF
-24         0.29297  root test-ssd-data                                  
 -7         0.14648      host node161                                    
  2    ssd  0.09769          osd.2                  up   1.00000  1.00000
  7    ssd  0.04880          osd.7                  up   1.00000  1.00000
-13         0.14648      host node162                                    
  5    ssd  0.09769          osd.5                  up   1.00000  1.00000
 13    ssd  0.04880          osd.13                 up   1.00000  1.00000
-23         0.61603  root test-hdd                                       
 -5         0.12769      host node163                                    
  1    hdd  0.09769          osd.1                  up   1.00000  1.00000
  9    hdd  0.03000          osd.9                  up   1.00000  1.00000
 -3         0.14648      host node164                                    
  0    hdd  0.09769          osd.0                down         0  1.00000
 15    hdd  0.04880          osd.15               down         0  1.00000
 -9         0.14648      host node165                                    
  4    hdd  0.09769          osd.4                  up   1.00000  1.00000
 10    hdd  0.04880          osd.10                 up   1.00000  1.00000
-11         0.19537      host node166                                    
  3    hdd  0.09769          osd.3                  up   1.00000  1.00000
  6    hdd  0.09769          osd.6                  up   1.00000  1.00000
-22         4.09760  root test-ssd                                       
-28         0.04880      host test-ssd-node161                           
  8    ssd  0.04880          osd.8                  up   1.00000  0.09999
-31         1.00000      host test-ssd-node162                           
 14    ssd  1.00000          osd.14                 up   1.00000  1.00000
-37         1.00000      host test-ssd-node163                           
 11    ssd  1.00000          osd.11                 up   1.00000  1.00000
-34         1.00000      host test-ssd-node164                           
 16    ssd  1.00000          osd.16               down         0  1.00000
-40         0.04880      host test-ssd-node165                           
 12    ssd  0.04880          osd.12                 up   1.00000  1.00000
-43         1.00000      host test-ssd-node166                           
 17    ssd  1.00000          osd.17                 up   1.00000  1.00000
 -1               0  root default                                        
 18    hdd        0  osd.18                         up   1.00000  1.00000
 19    hdd        0  osd.19                         up   1.00000  1.00000
 20    hdd        0  osd.20                         up   1.00000  1.00000
 21    hdd        0  osd.21                         up   1.00000  1.00000
 22    hdd        0  osd.22                         up   1.00000  1.00000
 25    hdd        0  osd.25                         up   1.00000  1.00000
 26    hdd        0  osd.26                         up   1.00000  1.00000

2. The BlockPath of some OSDs is displayed as the device_db path. For example, BlockPath:/dev/ceph-db-164/db-1 is clearly the device_db path.

2023-10-10 07:10:58.128249 I | cephosd: osdInfo has 2 elements. [{Name:osd-block-a34dfad8-1509-4f35-a5bf-faceb62d76f0 Path:/dev/ceph-5663906b-034f-4e26-86e8-538b2ee08e12/osd-block-a34dfad8-1509-4f35-a5bf-faceb62d76f0 Tags:{OSDFSID:a34dfad8-1509-4f35-a5bf-faceb62d76f0 Encrypted:0 ClusterFSID:8e633ee3-e904-42b1-a710-1fc07bd4e988 CrushDeviceClass:hdd} Type:block} {Name:db-1 Path:/dev/ceph-db-164/db-1 Tags:{OSDFSID:a34dfad8-1509-4f35-a5bf-faceb62d76f0 Encrypted:0 ClusterFSID:8e633ee3-e904-42b1-a710-1fc07bd4e988 CrushDeviceClass:hdd} Type:db}]
2023-10-10 07:10:58.128260 I | cephosd: osdInfo has 2 elements. [{Name:db-0 Path:/dev/ceph-db-164/db-0 Tags:{OSDFSID:67df8613-9b11-42a8-8ec6-c5f34ec8c571 Encrypted:0 ClusterFSID:8e633ee3-e904-42b1-a710-1fc07bd4e988 CrushDeviceClass:hdd} Type:db} {Name:osd-block-67df8613-9b11-42a8-8ec6-c5f34ec8c571 Path:/dev/ceph-f14d7dbf-df6b-4a05-a2f8-2b0a716d681d/osd-block-67df8613-9b11-42a8-8ec6-c5f34ec8c571 Tags:{OSDFSID:67df8613-9b11-42a8-8ec6-c5f34ec8c571 Encrypted:0 ClusterFSID:8e633ee3-e904-42b1-a710-1fc07bd4e988 CrushDeviceClass:hdd} Type:block}]
2023-10-10 07:10:58.128265 I | cephosd: osdInfo has 2 elements. [{Name:osd-block-fc44a59c-a693-43e6-9af1-0fa6a9dc3af9 Path:/dev/ceph-578ad70a-b3df-4ff9-bbc9-d6fb3cf9dbdf/osd-block-fc44a59c-a693-43e6-9af1-0fa6a9dc3af9 Tags:{OSDFSID:fc44a59c-a693-43e6-9af1-0fa6a9dc3af9 Encrypted:0 ClusterFSID:8e633ee3-e904-42b1-a710-1fc07bd4e988 CrushDeviceClass:hdd} Type:block} {Name:db-2 Path:/dev/ceph-db-164/db-2 Tags:{OSDFSID:fc44a59c-a693-43e6-9af1-0fa6a9dc3af9 Encrypted:0 ClusterFSID:8e633ee3-e904-42b1-a710-1fc07bd4e988 CrushDeviceClass:hdd} Type:db}]
2023-10-10 07:10:58.128273 I | cephosd: 3 ceph-volume lvm osd devices configured on this node
2023-10-10 07:10:58.128291 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list --format json
2023-10-10 07:10:58.665098 D | cephosd: {
    "67df8613-9b11-42a8-8ec6-c5f34ec8c571": {
        "ceph_fsid": "8e633ee3-e904-42b1-a710-1fc07bd4e988",
        "device": "/dev/mapper/ceph--f14d7dbf--df6b--4a05--a2f8--2b0a716d681d-osd--block--67df8613--9b11--42a8--8ec6--c5f34ec8c571",
        "device_db": "/dev/mapper/ceph--db--164-db--0",
        "osd_id": 21,
        "osd_uuid": "67df8613-9b11-42a8-8ec6-c5f34ec8c571",
        "type": "bluestore"
    },
    "a34dfad8-1509-4f35-a5bf-faceb62d76f0": {
        "ceph_fsid": "8e633ee3-e904-42b1-a710-1fc07bd4e988",
        "device": "/dev/mapper/ceph--5663906b--034f--4e26--86e8--538b2ee08e12-osd--block--a34dfad8--1509--4f35--a5bf--faceb62d76f0",
        "device_db": "/dev/mapper/ceph--db--164-db--1",
        "osd_id": 20,
        "osd_uuid": "a34dfad8-1509-4f35-a5bf-faceb62d76f0",
        "type": "bluestore"
    },
    "fc44a59c-a693-43e6-9af1-0fa6a9dc3af9": {
        "ceph_fsid": "8e633ee3-e904-42b1-a710-1fc07bd4e988",
        "device": "/dev/mapper/ceph--578ad70a--b3df--4ff9--bbc9--d6fb3cf9dbdf-osd--block--fc44a59c--a693--43e6--9af1--0fa6a9dc3af9",
        "device_db": "/dev/mapper/ceph--db--164-db--2",
        "osd_id": 22,
        "osd_uuid": "fc44a59c-a693-43e6-9af1-0fa6a9dc3af9",
        "type": "bluestore"
    }
}
2023-10-10 07:10:58.665244 D | exec: Running command: lsblk /dev/mapper/ceph--f14d7dbf--df6b--4a05--a2f8--2b0a716d681d-osd--block--67df8613--9b11--42a8--8ec6--c5f34ec8c571 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2023-10-10 07:10:58.668043 D | sys: lsblk output: "SIZE=\"107369988096\" ROTA=\"1\" RO=\"0\" TYPE=\"lvm\" PKNAME=\"\" NAME=\"/dev/mapper/ceph--f14d7dbf--df6b--4a05--a2f8--2b0a716d681d-osd--block--67df8613--9b11--42a8--8ec6--c5f34ec8c571\" KNAME=\"/dev/dm-6\" MOUNTPOINT=\"\" FSTYPE=\"\""
2023-10-10 07:10:58.668073 I | cephosd: setting device class "hdd" for device "/dev/mapper/ceph--f14d7dbf--df6b--4a05--a2f8--2b0a716d681d-osd--block--67df8613--9b11--42a8--8ec6--c5f34ec8c571"
2023-10-10 07:10:58.668089 D | exec: Running command: lsblk /dev/mapper/ceph--5663906b--034f--4e26--86e8--538b2ee08e12-osd--block--a34dfad8--1509--4f35--a5bf--faceb62d76f0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2023-10-10 07:10:58.670094 D | sys: lsblk output: "SIZE=\"53682896896\" ROTA=\"1\" RO=\"0\" TYPE=\"lvm\" PKNAME=\"\" NAME=\"/dev/mapper/ceph--5663906b--034f--4e26--86e8--538b2ee08e12-osd--block--a34dfad8--1509--4f35--a5bf--faceb62d76f0\" KNAME=\"/dev/dm-5\" MOUNTPOINT=\"\" FSTYPE=\"\""
2023-10-10 07:10:58.670110 I | cephosd: setting device class "hdd" for device "/dev/mapper/ceph--5663906b--034f--4e26--86e8--538b2ee08e12-osd--block--a34dfad8--1509--4f35--a5bf--faceb62d76f0"
2023-10-10 07:10:58.670117 D | exec: Running command: lsblk /dev/mapper/ceph--578ad70a--b3df--4ff9--bbc9--d6fb3cf9dbdf-osd--block--fc44a59c--a693--43e6--9af1--0fa6a9dc3af9 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2023-10-10 07:10:58.672208 D | sys: lsblk output: "SIZE=\"53682896896\" ROTA=\"1\" RO=\"0\" TYPE=\"lvm\" PKNAME=\"\" NAME=\"/dev/mapper/ceph--578ad70a--b3df--4ff9--bbc9--d6fb3cf9dbdf-osd--block--fc44a59c--a693--43e6--9af1--0fa6a9dc3af9\" KNAME=\"/dev/dm-7\" MOUNTPOINT=\"\" FSTYPE=\"\""
2023-10-10 07:10:58.672248 I | cephosd: setting device class "hdd" for device "/dev/mapper/ceph--578ad70a--b3df--4ff9--bbc9--d6fb3cf9dbdf-osd--block--fc44a59c--a693--43e6--9af1--0fa6a9dc3af9"
2023-10-10 07:10:58.672258 I | cephosd: 3 ceph-volume raw osd devices configured on this node
2023-10-10 07:10:58.672300 I | cephosd: devices = [{ID:20 Cluster:ceph UUID:a34dfad8-1509-4f35-a5bf-faceb62d76f0 DevicePartUUID: DeviceClass:hdd BlockPath:/dev/ceph-db-164/db-1 MetadataPath: WalPath: SkipLVRelease:false Location:root=default host=node164 LVBackedPV:false CVMode:lvm Store:bluestore TopologyAffinity: Encrypted:false} {ID:21 Cluster:ceph UUID:67df8613-9b11-42a8-8ec6-c5f34ec8c571 DevicePartUUID: DeviceClass:hdd BlockPath:/dev/ceph-f14d7dbf-df6b-4a05-a2f8-2b0a716d681d/osd-block-67df8613-9b11-42a8-8ec6-c5f34ec8c571 MetadataPath: WalPath: SkipLVRelease:false Location:root=default host=node164 LVBackedPV:false CVMode:lvm Store:bluestore TopologyAffinity: Encrypted:false} {ID:22 Cluster:ceph UUID:fc44a59c-a693-43e6-9af1-0fa6a9dc3af9 DevicePartUUID: DeviceClass:hdd BlockPath:/dev/ceph-db-164/db-2 MetadataPath: WalPath: SkipLVRelease:false Location:root=default host=node164 LVBackedPV:false CVMode:lvm Store:bluestore TopologyAffinity: Encrypted:false}]

Ceph config information:

    - devices:
      - config:
          metadataDevice: /dev/ceph-db-164/db-0
        name: vdb
      - config:
          metadataDevice: /dev/ceph-db-164/db-1
        name: vdc
      - config:
          metadataDevice: /dev/ceph-db-164/db-2
        name: vdd
      name: node164
      useAllDevices: true
travisn commented 1 year ago

> 1. Why are the weights of newly added OSDs (osd.18 through osd.26) all 0? Is it related to the "osd crush update on start = false" parameter?

This is common after the OSD prepare job has finished provisioning, but before the OSD daemon pods have started successfully for the first time. Are the new OSD pods for osd.18-26 not running yet? After the OSD daemon pods start, the OSDs should show up in the expected place in the OSD tree.
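
A quick way to confirm is to list the OSD deployments and pods; a sketch:

kubectl -n rook-ceph get deployments -l app=rook-ceph-osd
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide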

> 2. The BlockPath of some OSDs is displayed as the device_db path, for example OSD 20 and OSD 22. Does this affect normal operation of Ceph, or is it only a display error? Is there any way to fix it?

@satoru-takeuchi Can you take a look?

satoru-takeuchi commented 1 year ago

@travisn Got it.

erictarrence commented 1 year ago

> 1. Why are the weights of newly added OSDs (osd.18 through osd.26) all 0? Is it related to the "osd crush update on start = false" parameter?
>
> This is common after the OSD prepare job has finished provisioning, but before the OSD daemon pods have started successfully for the first time. Are the new OSD pods for osd.18-26 not running yet? After the OSD daemon pods start, the OSDs should show up in the expected place in the OSD tree.

After testing, I determined that the "osd crush update on start = false" parameter is what causes the WEIGHT of all newly added OSDs to be 0. Is there any way to have the OSD weights set automatically while keeping "osd crush update on start = false"?

satoru-takeuchi commented 1 year ago

> Is there any way to have the OSD weights set automatically while keeping "osd crush update on start = false"?

There is no such way. When this parameter is set, the CRUSH locations of new OSDs must be set by the user. In your case, the weights of the new OSDs (18-22, 25, 26) are 0 because these OSDs have not been placed in the CRUSH map.

After you set their locations, these weights will become 1.

Since placing the OSDs is the user's responsibility, if you want the weights to be set automatically, you will have to automate the placement yourself.
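
For example, placing one of the new OSDs by hand might look like this; a sketch, with the bucket names and weight taken from the tree above:

# create the OSD's CRUSH entry (or move an existing one) under the desired root/host
ceph osd crush create-or-move osd.18 0.09769 root=test-hdd host=node164

# or, if the OSD is already in the tree, only adjust its weight
ceph osd crush reweight osd.18 0.09769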

IMO, in most cases, this parameter is not necessary. Please remove this configuration if possible.

erictarrence commented 1 year ago

> Is there any way to have the OSD weights set automatically while keeping "osd crush update on start = false"?
>
> There is no such way. When this parameter is set, the CRUSH locations of new OSDs must be set by the user. In your case, the weights of the new OSDs (18-22, 25, 26) are 0 because these OSDs have not been placed in the CRUSH map.
>
> After you set their locations, these weights will become 1.
>
> Since placing the OSDs is the user's responsibility, if you want the weights to be set automatically, you will have to automate the placement yourself.
>
> IMO, in most cases, this parameter is not necessary. Please remove this configuration if possible.

If I delete the "osd crush update on start = false" configuration, then after the cluster is powered off and restarted, all customized CRUSH locations disappear and the OSDs revert to the default CRUSH hierarchy, causing service and data failures.

Is there any way to preserve a custom CRUSH map without setting the "osd crush update on start = false" parameter? Thanks.

I know of the following method of customizing "osd crush location" in ceph.conf, but it is too cumbersome and difficult to maintain:

[osd.0]
osd crush location = "host=data-ssd"
[osd.1]
osd crush location = "host=data2-ssd"
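
A possibly lighter-weight variant is to keep the same per-OSD setting in the monitors' central config database instead of ceph.conf; a sketch, noting that crush_location is only applied at OSD startup and only while "osd crush update on start" is enabled, and assuming your Ceph release allows setting this option centrally:

# store the location centrally rather than in ceph.conf
ceph config set osd.0 crush_location "host=data-ssd"
ceph config set osd.1 crush_location "host=data2-ssd"

# verify
ceph config get osd.0 crush_location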
satoru-takeuchi commented 1 year ago

@erictarrence

> Is there any way to preserve a custom CRUSH map without setting the "osd crush update on start = false" parameter? Thanks.

Unfortunately, I have no idea. At least it's not possible at Rook's layer. Could you submit this question to the ceph-users mailing list?

erictarrence commented 1 year ago

> @erictarrence
>
> Is there any way to preserve a custom CRUSH map without setting the "osd crush update on start = false" parameter? Thanks.
>
> Unfortunately, I have no idea. At least it's not possible at Rook's layer. Could you submit this question to the ceph-users mailing list?

Alright, I got it. Thanks!