scylladb / scylla-ansible-roles

Ansible roles for deploying and managing Scylla, Scylla-Manager and Scylla-Monitoring

`detect_nvmes` feature doesn't work for AWS #407

Open · igorribeiroduarte opened this issue 2 weeks ago

igorribeiroduarte commented 2 weeks ago

AWS instances have an extra NVMe device for the boot disk, so when you enable `detect_nvmes` on a cluster of AWS instances, the node role will also try to use this extra NVMe, since by default we try to use all the NVMe devices present in the instance. We should change the role to only use NVMe devices that are not already in use.

CC: @tarzanek
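
For illustration, a minimal sketch of that idea using the standard `ansible_devices` facts (the task and the `candidate_nvmes` variable name are hypothetical, not the role's actual code):

```yaml
# Hypothetical sketch: keep only NVMe disks that have no partitions and
# no holders (e.g. LVM/RAID members), so a boot disk carrying the root
# filesystem is skipped. Caveat: a disk formatted and mounted without a
# partition table would still pass this filter and needs an extra check.
- name: Select NVMe devices that look unused
  ansible.builtin.set_fact:
    candidate_nvmes: >-
      {{ ansible_devices | dict2items
         | selectattr('key', 'match', '^nvme')
         | selectattr('value.partitions', 'equalto', {})
         | selectattr('value.holders', 'equalto', [])
         | map(attribute='key') | list }}
```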

vreniers commented 1 week ago

The problem is a bit different than described. There is no "extra" NVMe on AWS instances.

When you use an AWS instance that has local NVMe disks (what AWS refers to as instance store volumes), there will also be a volume that shows up as NVMe but isn't. This is the root/boot volume, which is in fact an EBS volume; when you run lsblk, however, it shows up as if it were an NVMe device.

Some more information here: https://docs.aws.amazon.com/ebs/latest/userguide/nvme-ebs-volumes.html

This is even the case when you attach additional EBS volumes to the instance: they show up as NVMe too. Here's an example of an i4i.large where I set the root volume (EBS) to 30GB and also attached an EBS volume of 40GB.

```
nvme1n1      259:0    0    30G  0 disk
├─nvme1n1p1  259:3    0  29.9G  0 part /
├─nvme1n1p14 259:4    0     4M  0 part
└─nvme1n1p15 259:5    0   106M  0 part /boot/efi
nvme2n1      259:1    0 435.9G  0 disk
nvme0n1      259:2    0    40G  0 disk
```

There is a way to filter out which devices are really local NVMe: with `lsblk -o +SERIAL`, EBS volumes show the volume ID as their serial, as explained here: https://docs.aws.amazon.com/ebs/latest/userguide/identify-nvme-ebs-device.html.
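
As an illustration of that serial-based check, a rough sketch of an Ansible task (the `non_ebs_nvmes` register name is made up; the `vol` serial prefix for EBS devices comes from the AWS doc above):

```yaml
# Sketch: list whole NVMe disks whose serial does not look like an EBS
# volume ID (EBS NVMe devices report the volume ID, "vol...", as serial).
- name: Find NVMe devices that are not EBS volumes
  ansible.builtin.shell: |
    set -o pipefail
    lsblk -dn -o NAME,SERIAL | awk '$1 ~ /^nvme/ && $2 !~ /^vol/ {print $1}'
  args:
    executable: /bin/bash
  register: non_ebs_nvmes
  changed_when: false
```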

Either way, it would be a good start to indeed not use any "NVMe" that is already in use. I don't see an immediate way to filter them out using the nvme command, so perhaps lsblk is the way to go.
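
Putting the two ideas together, a rough sketch that skips both EBS volumes and disks that are already in use; all names here are hypothetical:

```yaml
# Sketch: keep NVMe disks that are neither EBS volumes nor in use
# (no partitions, no mounted filesystem).
- name: Find unused local NVMe devices
  ansible.builtin.shell: |
    for dev in $(lsblk -dn -o NAME | grep '^nvme'); do
      # EBS volumes expose their volume ID ("vol...") as the serial
      case "$(lsblk -dn -o SERIAL "/dev/$dev")" in vol*) continue ;; esac
      # More than one lsblk line means the disk has partitions
      [ "$(lsblk -n "/dev/$dev" | wc -l)" -gt 1 ] && continue
      # Skip a disk mounted directly, without a partition table
      [ -n "$(lsblk -dn -o MOUNTPOINT "/dev/$dev")" ] && continue
      echo "$dev"
    done
  args:
    executable: /bin/bash
  register: usable_nvmes
  changed_when: false
```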