surajssd opened this issue 4 years ago
Config used to deploy the rook component:
component "rook" {}
component "rook-ceph" {
namespace = "rook"
monitor_count = 3
metadata_device = "md127"
}
and the worker config looks like the following:
...
worker_pool "pool-1" {
count = 3
node_type = "s1.large.x86"
setup_raid_ssd = true
}
...
And delete the kubelet DaemonSet for this component to work reliably:

kubectl -n kube-system delete ds kubelet
Some more traces:
After investigating with @surajssd, we have 2 possible next steps.

It seems the device filtering done by the ceph-volume script over the list of available devices is the culprit: only devices of type disk or part can be used, as reported by lsblk:

lsblk --bytes --pairs --output NAME,SIZE,TYPE
NAME="sda" SIZE="480103981056" TYPE="disk"
NAME="sdb" SIZE="480103981056" TYPE="disk"
NAME="sdc" SIZE="126701535232" TYPE="disk"
NAME="sdc1" SIZE="134217728" TYPE="part"
NAME="sdc2" SIZE="2097152" TYPE="part"
NAME="sdc3" SIZE="1073741824" TYPE="part"
NAME="sdc4" SIZE="1073741824" TYPE="part"
NAME="sdc6" SIZE="134217728" TYPE="part"
NAME="sdc7" SIZE="67108864" TYPE="part"
NAME="sdc9" SIZE="124214296064" TYPE="part"
NAME="sdd" SIZE="2000398934016" TYPE="disk"
NAME="sde" SIZE="2000398934016" TYPE="disk"
NAME="sdf" SIZE="2000398934016" TYPE="disk"
NAME="sdg" SIZE="2000398934016" TYPE="disk"
NAME="sdh" SIZE="2000398934016" TYPE="disk"
NAME="sdi" SIZE="2000398934016" TYPE="disk"
NAME="sdj" SIZE="2000398934016" TYPE="disk"
NAME="sdk" SIZE="2000398934016" TYPE="disk"
NAME="sdl" SIZE="2000398934016" TYPE="disk"
NAME="sdm" SIZE="2000398934016" TYPE="disk"
NAME="sdn" SIZE="2000398934016" TYPE="disk"
NAME="sdo" SIZE="2000398934016" TYPE="disk"
NAME="md127" SIZE="959938822144" TYPE="raid0"
NAME="md127" SIZE="959938822144" TYPE="raid0"
NAME="usr" SIZE="1065345024" TYPE="crypt"
This, combined with https://github.com/rook/rook/issues/4999, makes the metadataDevice feature practically useless for production setups, unless you treat nodes as disposable (if the metadata device fails, you lose all the data on the node, so you need to have replica: 2 and an automatic recovery process). But even then, the setup is limited by the size of a single SSD device.
If https://github.com/rook/rook/issues/4999 were implemented, something like this seems to work correctly:
[root@40bb16746292 /]# ceph-volume lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sdd /dev/sdm /dev/sdg /dev/sde /dev/sdl /dev/sdh /dev/sdo /dev/sdk /dev/sdi /dev/sdn /dev/sdj /dev/sdf --db-devices /dev/sda /dev/sdb --report
Total OSDs: 12

Solid State VG:
  Targets:   block.db                 Total size: 892.00 GB
  Total LVs: 12                       Size per LV: 74.33 GB
  Devices:   /dev/sda, /dev/sdb

  Type            Path                LV Size         % of device
----------------------------------------------------------------------------------------------------
[data] /dev/sdd 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sdm 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sdg 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sde 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sdl 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sdh 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sdo 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sdk 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sdi 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sdn 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sdj 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
----------------------------------------------------------------------------------------------------
[data] /dev/sdf 1.82 TB 100%
[block.db] vg: vg/lv 74.33 GB 8%
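Presumably, dropping --report from the same command would actually prepare the OSDs, and ceph-volume lvm list can then be used to inspect what was created:

# Same plan as above, but actually preparing the OSDs (no --report).
ceph-volume lvm batch --prepare --bluestore --yes --osds-per-device 1 \
  /dev/sdd /dev/sdm /dev/sdg /dev/sde /dev/sdl /dev/sdh /dev/sdo /dev/sdk \
  /dev/sdi /dev/sdn /dev/sdj /dev/sdf \
  --db-devices /dev/sda /dev/sdb
# List the logical volumes ceph-volume created for data and block.db.
ceph-volume lvm list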
Ah, the only problem is that on Packet we cannot guarantee that /dev/sda and /dev/sdb are the SSDs :(
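Device names aside, one way to tell which block devices are actually SSDs, rather than assuming /dev/sda and /dev/sdb, is to check the rotational flag, for example:

# ROTA=0 means non-rotational (SSD), ROTA=1 means a spinning disk.
lsblk --nodeps --output NAME,ROTA,SIZE,TYPE
# The same information per device via sysfs.
cat /sys/block/sda/queue/rotational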
It seems that this is something we should document, and then close this issue.
I have a cluster on Packet with 3 worker nodes of type s1.large.x86. A special setting to enable RAID of the SSD drives on the nodes, setup_raid_ssd = true, is set on the worker pool. The resulting RAID device created on the workers is /dev/md127. In the rook-ceph settings this value (md127) is provided to the field metadata_device = "md127". When the pods that prepare the nodes for the rook-ceph deployment start as a job, they fail with the following error:

On the above machine the SSD drives are /dev/sdm and /dev/sdn.
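For reference, which devices back /dev/md127 on a given node can be confirmed with standard RAID tooling, for example:

# Print the members and state of all software RAID arrays.
cat /proc/mdstat
# Detailed view of the md127 array, including its member devices.
mdadm --detail /dev/md127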