openebs / lvm-localpv

Dynamically provision Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes, integrated with a backend LVM2 data storage stack.
Apache License 2.0

Support for LV types (RAID0/1/5/10) #164

Open t3hmrman opened 2 years ago

t3hmrman commented 2 years ago

Describe the problem/challenge you have

I'd like to be able to use lvm-localpv with lvmraid (LVM-native RAID) to take advantage of RAID on a VolumeGroup with multiple drives.

Unfortunately LVM does not allow setting default Logical Volume types on VolumeGroups (this is probably for the best, complexity-wise), so when using lvm-localpv the only way to enforce a non-default type is currently via thinProvision.

Describe the solution you'd like

Support for mirroring and other RAID configurations in StorageClass parameters for lvm-localpv.

Along with support for RAID configurations, I'd like support for the --raidintegrity option to allow checksumming of data on disk.

Anything else you would like to add:

I'd like to work on this, and have a few questions/comments:

Currently I'm using a quite well-known workaround: running mdraid (via mdadm, with dm-integrity configured) on the two disks below LVM and giving lvm-localpv a volume group based on /dev/mdX. Ultimately I'd also like to test LVM-native RAID against mdraid itself, so some support would be great.

Environment:

TProhofsky commented 2 years ago

Another feature I use with LVM2 RAID is --nosync. When using SSDs that guarantee returning zeros for unwritten LBAs and support unmap/trim, and lvm.conf has issue_discards=1, the RAID doesn't need to be synchronized. This saves both time and SSD wear.

nicholascioli commented 11 months ago

I'm going to try to take a stab at this. My plan is to make use of the StorageClass parameters block to allow for setting a raid enable option. Since PersistentVolumeClaims do not allow setting CSI-specific parameters, my idea is to use annotations to specify the various raid options. A rough snapshot of how I think it will look is below:

StorageClass definition:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvm
allowVolumeExpansion: true
provisioner: local.csi.openebs.io
parameters:
  storage: "lvm"
  vgpattern: "lvmvg.*"

  allowRaid: "true"

PersistentVolumeClaim definition:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: csi-lvmpv
  annotations:
    "local.csi.openebs.io/raid": |
      {
        "type": "raid10",
        "integrity": true,
        "mirrors": {
          "count": 2,
          "nosync": true,
        },
        "stripes": {
          "count": 2,
          "size": "64Ki",
        }
      }
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: openebs-lvm
  resources:
    requests:
      storage: 4Gi

The valid options for the annotation are shown below; the annotation value itself must be valid JSON:

| Option | Required | Valid Values | Description |
| --- | --- | --- | --- |
| type | true | raid0 / stripe, raid / raid1 / mirror, raid5, raid6, raid10 | The RAID type of the volume. |
| integrity | false | true, false | Whether or not to enable DM integrity for the volume. Defaults to false. |
| mirrors | depends | Object | Configuration of mirrors. Certain RAID configurations require this to be set. |
| mirrors.count | true | [0, ∞) | How many mirrors to enable. |
| mirrors.nosync | false | true, false | Whether or not to disable the initial sync. Defaults to false. |
| stripes | depends | Object | Configuration of stripes. Certain RAID configurations require this to be set. |
| stripes.count | true | [0, ∞) | How many stripes to enable. |
| stripes.size | false | [0, ∞), must be a power of 2 | The size of each stripe. If not specified, LVM will choose a sane default. |
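For reference, a minimal Go sketch of how the driver could decode such an annotation; the type and field names here are hypothetical (not part of the current codebase) and only mirror the JSON shape proposed above:

package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types mirroring the proposed annotation schema.
type MirrorConfig struct {
	Count  int  `json:"count"`
	NoSync bool `json:"nosync"`
}

type StripeConfig struct {
	Count int    `json:"count"`
	Size  string `json:"size"`
}

type RaidConfig struct {
	Type      string        `json:"type"`
	Integrity bool          `json:"integrity"`
	Mirrors   *MirrorConfig `json:"mirrors,omitempty"`
	Stripes   *StripeConfig `json:"stripes,omitempty"`
}

func main() {
	// Value of the proposed "local.csi.openebs.io/raid" PVC annotation.
	annotation := `{
	  "type": "raid10",
	  "integrity": true,
	  "mirrors": {"count": 2, "nosync": true},
	  "stripes": {"count": 2, "size": "64Ki"}
	}`

	var cfg RaidConfig
	if err := json.Unmarshal([]byte(annotation), &cfg); err != nil {
		fmt.Println("invalid raid annotation:", err)
		return
	}
	fmt.Printf("parsed raid config: %+v\n", cfg)
}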

I'll wait a few days before starting, to allow time to discuss changes, but as I really need this on my own cluster I'd like to get started as soon as possible.

TProhofsky commented 11 months ago

I think this approach will work but might blur the roles of the folks deploying services. I think of StorageClasses as being owned by the team managing infrastructure, and PVC definitions as owned by the developers or platform teams consuming the provisioned storage. This approach might let a developer writing the YAML for a service or deployment muck up the RAID settings out of ignorance of the underlying infrastructure. Having the developers pick from a list of acceptable StorageClasses seems safer. This implies the PVC RAID annotations should instead be optional StorageClass parameters.

nicholascioli commented 11 months ago

> I think this approach will work but might blur the roles of the folks deploying services. I think of StorageClasses as being owned by the team managing infrastructure, and PVC definitions as owned by the developers or platform teams consuming the provisioned storage. This approach might let a developer writing the YAML for a service or deployment muck up the RAID settings out of ignorance of the underlying infrastructure. Having the developers pick from a list of acceptable StorageClasses seems safer. This implies the PVC RAID annotations should instead be optional StorageClass parameters.

That's an excellent point. I agree with you, but I wasn't sure if it was OK to tie individual logical volume settings to the entire StorageClass. The method you propose would look like the following. For anyone coming into this later, feel free to vote on the preferred approach by reacting to the method you like best with a heart.

With this new approach, the PersistentVolumeClaim would no longer need any extra info, and the owning StorageClass would look as follows (the table from my previous comment would still apply here):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvm
allowVolumeExpansion: true
provisioner: local.csi.openebs.io
parameters:
  storage: "lvm"
  vgpattern: "lvmvg.*"
  raid:
    type: raid10
    integrity: true
    mirrors:
      count: 2
      nosync: true
    stripes:
      count: 2
      size: 64Ki

Does that seem more in line with what you were thinking?

TProhofsky commented 11 months ago

Yes, this is more like what I was thinking. I'm on the fence about the structure. I like the specificity of the RAID object matching LVM2's approach, but maybe it is tied too tightly to it. Might other RAID solutions want to leverage the SC definition with a change to the "storage:" attribute? There are also other, more advanced options like --readahead and --vdo that might be desired. Continually extending and updating the SC definition might become laborious. I would propose keeping the list of defined parameters to those most common to any RAID system and providing a user-beware option to append additional arguments to the lvcreate call. Here is an approach, with some comments added to the SC spec:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvm
allowVolumeExpansion: true
provisioner: local.csi.openebs.io
parameters:
  storage: "lvm"
  vgpattern: "lvmvg.*"
  # The raidtype parameter is used as the lvcreate --type option.
  # Currently the CSI plug-in supports linear, raid1, raid5, raid6 and raid10. Default is linear.
  raidtype: raid10
  # A true value enables data integrity checksums for raid images if available.
  integrity: "true"
  # The number of replicas stored on different physical volumes for the PVC.
  mirrors: "2"
  # Skip initial synchronization of the storage devices and assume all data has been erased to zero on the device.
  nosync: "true"
  # Stripe count is the number of devices or physical volumes to be used by the logical volume created for the PVC.
  stripecount: "2"
  # Stripe size is the amount of data that is written to one device before moving to the next device.
  # The value must be an integer number of bytes, optionally followed by a unit multiplier with no space.
  stripesize: "64Ki"
  # LVM2 create options such as --vdo and --readahead auto will be applied when the logical volume is created for the PVC.
  # No validation is performed on this option string so any errors or misunderstandings of LVM2 could cause failures to create PVCs.
  lvcreateoptions: "--vdo --readahead auto"

When coding the appending of lvcreateoptions, we will need to make sure to strip out special characters like "|" and ";" for security reasons.
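As a rough illustration of that sanitization, a minimal Go sketch; the helper name is hypothetical and the character list is only an example, not an exhaustive blocklist:

package main

import (
	"fmt"
	"strings"
)

// sanitizeLVCreateOptions (hypothetical helper) splits the user-supplied
// lvcreateoptions string on whitespace and rejects any token containing
// shell metacharacters, so the resulting tokens can be appended directly
// to exec.Command("lvcreate", args...) without ever invoking a shell.
func sanitizeLVCreateOptions(opts string) ([]string, error) {
	var args []string
	for _, tok := range strings.Fields(opts) {
		if strings.ContainsAny(tok, "|;&$`<>(){}") {
			return nil, fmt.Errorf("disallowed character in lvcreate option %q", tok)
		}
		args = append(args, tok)
	}
	return args, nil
}

func main() {
	args, err := sanitizeLVCreateOptions("--vdo --readahead auto")
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("extra lvcreate args:", args)
}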

nicholascioli commented 11 months ago

Ok, that makes sense! Do we want to group the options under a top-level raid option, like my previous post? Some of the options could be at the top level (like lvCreateOptions), but most of the other options are very specific to RAID. Regardless, I'll start working on this and make a PR. Once we have that, we can discuss specifics there.

t3hmrman commented 11 months ago

Thanks for trying to take this on @nicholascioli -- I am actually not using lvm-localpv anymore but it's great to see someone take a stab at this.

nicholascioli commented 11 months ago

As a quick aside, how should I handle it if someone has requested a thin volume with RAID? I think lvm2 does not allow that, as the --type flag does not support multiple types at the same time.

TProhofsky commented 11 months ago

Philosophically, I don't like the idea of the CSI driver trying to enforce LVM2 constraints. What if LVM2 in RedHat 8.8 behaves differently than in RedHat 9.2? My opinion is to document that the LVM2 configuration of the OS is the source of truth for what options can be used on a given node. When the CSI driver doesn't get a good status from the lvcreate call, it should log the lvcreate command arguments and stderr so the stuck PV create can be debugged.
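As a sketch of that behavior, a minimal Go example; the helper name and arguments are illustrative, not the driver's actual code. It runs lvcreate as-is and, on failure, surfaces the full argument list plus stderr:

package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runLVCreate (hypothetical helper) executes lvcreate with the assembled
// arguments and, on failure, returns an error carrying the full command
// line and stderr so the stuck PV creation can be debugged.
func runLVCreate(args []string) error {
	cmd := exec.Command("lvcreate", args...)
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("lvcreate %v failed: %v, stderr: %s",
			args, err, stderr.String())
	}
	return nil
}

func main() {
	// Example: a raid10 volume with integrity enabled; LVM2 on the node
	// decides whether this combination is actually valid.
	args := []string{"--type", "raid10", "--raidintegrity", "y",
		"-L", "4G", "-n", "pvc-volume", "lvmvg"}
	if err := runLVCreate(args); err != nil {
		fmt.Println(err)
	}
}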