linux-system-roles / storage

Ansible role for linux storage management
https://linux-system-roles.github.io/storage/
MIT License

RFE: Basic support for creating shared logical volumes #341

Closed. andyprice closed this issue 10 months ago.

andyprice commented 1 year ago

In the new gfs2 role we idempotently create LVs and set them up for shared storage: we use community.general.* modules to set up PVs as normal, and then (sketched below the list):

  1. Create the VG, passing the --shared option to vgcreate
  2. Activate the new VG using vgchange --lock-start <VG> and
  3. Create the LV, passing the --activate sy option to lvcreate

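A rough sketch of those three steps as plain Ansible command tasks, for illustration only; this is not necessarily how the gfs2 role implements them, idempotence checks are omitted, and the device, VG, and LV names and the size are placeholders.

```yaml
- name: Create the VG with the shared lock type
  ansible.builtin.command: vgcreate --shared vg_cluster /dev/sdb

- name: Start the lockspace for the new VG
  ansible.builtin.command: vgchange --lock-start vg_cluster

- name: Create the LV with shared activation
  ansible.builtin.command: lvcreate --activate sy --name lv_gfs2 --size 10G vg_cluster
```
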
We would like to use the storage role for this purpose instead, to avoid bundling modules from community.general into linux-system-roles. The storage role currently does not provide a way to use these options.

The proposal is to add a new shared: (true|false) option for volumes to abstract this functionality in the storage role.

Step 2 is required for step 3 to work, but if step 2 cannot be implemented in the storage role, it should be sufficient for steps 1 and 3 to be supported separately so that the gfs2 role can run step 2 itself.
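
As a concrete illustration, the per-volume option as proposed might look roughly like this in a playbook; the pool and volume names and the size are made up, and the exact placement of the key is what this issue is meant to decide.

```yaml
storage_pools:
  - name: vg_cluster
    disks:
      - sdb
    volumes:
      - name: lv_gfs2
        size: 10 GiB
        shared: true   # proposed: create the LV for shared (sy) activation
```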

vojtechtrefny commented 1 year ago

I think this should be relatively simple to implement; most of the changes will be in blivet (the storage library the role uses).

The proposal is to add a new shared: (true|false) option for volumes to abstract this functionality in the storage role.

So would this be for the volume (in the storage role this is the LV) or for the pool (for us this is the VG)? If I understand it correctly, the VG itself is shared, so we would simply create all volumes (LVs) with the --activate sy option whenever the VG is shared; that makes me think it makes sense to add the shared option at the pool (VG) level. Or is it possible to have some LVs in the VG that are "not shared"?

A few additional questions:

andyprice commented 1 year ago

Or is it possible to have some LVs in the VG "not shared"?

It is possible for an LV in a shared VG to be activated in exclusive mode so that only the first activating node in the cluster can use it. It might make sense to have a shared option for the pool and an optional activation option for the volume which defaults to ay or sy depending on whether the vg is shared. For the gfs2 role, we only use shared LVs.

The "LV activation" section in this doc describes the options: https://man7.org/linux/man-pages/man8/lvmlockd.8.html
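
A hypothetical shape for that alternative, i.e. shared on the pool plus an optional per-volume activation mode that defaults to sy for shared pools; the key names and values here are illustrative only, not an agreed or implemented interface.

```yaml
storage_pools:
  - name: vg_cluster
    disks:
      - sdb
    shared: true              # vgcreate --shared + vgchange --lock-start
    volumes:
      - name: lv_gfs2
        size: 10 GiB          # no override: shared activation (-asy)
      - name: lv_private
        size: 1 GiB
        activation: ey        # exclusive activation, first activating node only
```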

What should we do if the VG already exists but the shared option doesn't "match" the existing VG -- i.e. the user sets shared: true but the VG was not created as shared (or vice versa)? Is this simply a user error, or can we "convert" the VG to shared (and do we even want to support this case)?

In that case there's an assumption in the playbook that doesn't match how the shared storage is being used in the cluster. I would treat it as an error to be safe.

Do we also need to call vgchange --lock-stop when deactivating the VG (for example before removing it)?

As far as I know, removal has to be done in this order:

  1. vgchange --activate n VG/LV on all cluster members [edit: this should have been lvchange, sorry]
  2. lvremove VG/LV on one member
  3. vgchange --lock-stop VG on all-but-one members
  4. vgremove VG on the remaining member

Perhaps we could loop in @teigland on this to check my assertions.
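
For illustration, that removal order could be expressed as command tasks run over the whole cluster along these lines; step 1 uses lvchange per the correction above, and the names, host-selection logic, and lack of idempotence checks are all simplifications.

```yaml
- name: Deactivate the LV on every cluster member
  ansible.builtin.command: lvchange --activate n vg_cluster/lv_gfs2

- name: Remove the LV on a single member
  ansible.builtin.command: lvremove -y vg_cluster/lv_gfs2
  run_once: true

- name: Stop the VG lockspace on all members except the one doing vgremove
  ansible.builtin.command: vgchange --lock-stop vg_cluster
  when: inventory_hostname != ansible_play_hosts[0]

- name: Remove the VG on the remaining member (vgremove does its own lock-stop)
  ansible.builtin.command: vgremove -y vg_cluster
  run_once: true
```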

teigland commented 1 year ago

It is possible for an LV in a shared VG to be activated in exclusive mode so that only the first activating node in the cluster can use it. It might make sense to have a shared option for the pool and an optional activation option for the volume which defaults to ay or sy depending on whether the vg is shared. For the gfs2 role, we only use shared LVs.

all the possible options:

  1. vgchange --activate ey VG (activate all LVs in the VG in exclusive mode)
  2. vgchange --activate sy VG (activate all LVs in the VG in shared mode)
  3. vgchange --activate y VG (activate all LVs in the VG in exclusive mode)
  4. lvchange --activate ey VG/LV (activate specific LV in exclusive mode)
  5. lvchange --activate sy VG/LV (activate specific LV in shared mode)
  6. lvchange --activate y VG/LV (activate specific LV in exclusive mode)

In a local VG (not shared), the e|s characters are ignored, and all activation is -ay. In a shared VG, both -ay and -aey mean exclusive, and only -asy means shared.

As far as I know, removal has to be done in this order:

  1. vgchange --activate n VG/LV on all cluster members
  2. lvremove VG/LV on one member
  3. vgchange --lock-stop VG on all-but-one members
  4. vgremove VG on the remaining member

Right, pick one node to do lvremove and vgremove. That node would skip the lockstop which is built into vgremove.

vojtechtrefny commented 1 year ago

Right, pick one node to do lvremove and vgremove. That node would skip the lockstop which is built into vgremove.

So I guess we'll just assume that "someone else" did the lock-stop calls, and we will only do standard lvremove/vgremove calls.

andyprice commented 1 year ago

For the gfs2 role, that sounds fine. The HA cluster resources will manage the locks in the normal case and we don't support removing the volume groups in the role because that's a destructive operation that the user should consider carefully.

richm commented 10 months ago

Fixed via https://github.com/linux-system-roles/storage/pull/388