xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

Feature Request - Support For Storage Cluster #644

Open navnair opened 5 months ago

navnair commented 5 months ago

Is your feature request related to a problem? Please describe.
Grouping several HBA-based SRs is currently not supported. A storage cluster or pool that groups numerous SRs would be extremely useful for optimal storage management.

Describe the solution you'd like
Implementation of a 'Storage Cluster' concept, which allows for the pooling of several SRs into a single repository and provides functionality similar to VMware SDRS.

Describe alternatives you've considered
With our POC and planned lab migration, we've chosen to deploy larger SRs to address this.

Additional context

olivierlambert commented 5 months ago

Can you describe more precisely what you are expecting in terms of visible features?

navnair commented 5 months ago

My organization's storage team provisions HBA LUNs in 12 TiB units. This is not ideal, since you can end up with multiple storage devices at varying levels of used space, and each time you deploy a new VM you may first need to determine which storage device has enough capacity for it.

Because of such scenarios and the number of virtual machines we host, we always rely on VMware's storage cluster functionality: when we need to enlarge the available space in a datastore, we request a new LUN and add it to the storage cluster. This is efficient in terms of storage management, and VMware also includes Storage DRS, which balances usage across all LUNs in a datastore cluster.

In the case of xcp-ng, I am expecting something similar: the ability to aggregate many HBA SRs into a single storage cluster/pool. Then, for any VM placement or OVA/disk import, I can simply select the storage cluster (pool) and the system will determine the best LUN/SR on which to place the VM.

A reference link that talks about VMware SDRS: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-598DF695-107E-406B-9C95-0AF961FC227A.html

olivierlambert commented 5 months ago

Thanks, so let me rephrase in terms of functional requirements: at VM creation (or OVA/disk import), have a system that automatically deals with a "group of SRs" in the same pool and places each disk on the right SR according to placement rules ("spread the usage between all SRs in the group" seems to be the best approach, right?).

Is that what you need?
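
To make that placement rule concrete, here is a minimal sketch of "pick the SR with the most free space in the group" using the XenAPI Python bindings. The group tag name, host URL and credentials are invented for illustration; this is not an existing XCP-ng or Xen Orchestra feature, just one way such a rule could look:

```python
import XenAPI

# Hypothetical convention: SRs belonging to the "storage cluster" carry this tag.
GROUP_TAG = "sr-group:prod"

def pick_sr_for_new_disk(session, required_bytes):
    """Return the SR ref with the most free space in the group, or None."""
    best_ref, best_free = None, -1
    for ref, rec in session.xenapi.SR.get_all_records().items():
        if GROUP_TAG not in rec.get("tags", []):
            continue
        free = int(rec["physical_size"]) - int(rec["physical_utilisation"])
        if free >= required_bytes and free > best_free:
            best_ref, best_free = ref, free
    return best_ref

if __name__ == "__main__":
    session = XenAPI.Session("https://xcp-host.example")  # placeholder pool master
    session.xenapi.login_with_password("root", "password")
    try:
        sr = pick_sr_for_new_disk(session, 50 * 1024 ** 3)  # e.g. a 50 GiB disk
        print("Would place the new VDI on SR:", sr)
    finally:
        session.xenapi.session.logout()
```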

navnair commented 5 months ago

That's correct. Thanks!

olivierlambert commented 5 months ago

Okay great! We'll need to put that in our backlog. We'll discuss the best way to do it (in XCP-ng directly or via a new abstraction in Xen Orchestra).

If you want to raise the priority for this, feel free to contact us at https://vates.tech/contact

nagilum99 commented 5 months ago

If you read the VMware page, it looks like it's basically a load balancer acting as an SR. It can migrate the storage, depending on load and usage of the underlying storage pod.

Citrix has (had?) a load-balancer VM for that, so I assume that functionality would rather be placed into XO(A) than XCP-ng, as it needs active monitoring. The documentation also doesn't read as offering better storage usage optimization, since that would require a single storage object (VHD) to be split up between different SRs - which over time could fragment a single VHD across several SRs and end up in a huge mess without proper management.

Something easier could be an SR group or an "auto-place" function when importing/creating a VM. But you already get an SR suggested that's big enough, don't you?

navnair commented 5 months ago

@nagilum99 @olivierlambert

Let me try to explain the things I'm looking for in the Vates stack (whether in XOA or xcp-ng); some of these points were taken from VMware documentation but condensed to make it a less dull read.

VMware Storage DRS is indeed a feature of vCenter; vCenter is required for all the management and optimizations to work. It's similar to the "LB plugin" functionality in XOA, but for storage management.

@nagilum99 Regarding your statement about splitting VHDs between multiple SRs, that's not true with Storage DRS. Storage DRS does not split a single VMDK/VHD into multiple fragments when performing a placement. It operates at the level of entire VMDK files, not fragments of them. In short, if it needs to move a VMDK to another datastore/SR for space management, it moves the entire VMDK file. The VM's disks might end up spread across multiple datastores/SRs, but that does not impact performance.
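
To illustrate that whole-disk behaviour in XCP-ng terms, a balancer for a group of SRs could evacuate entire VDIs from any SR that crosses a usage threshold. The sketch below assumes the XenAPI Python bindings and uses VDI.pool_migrate (which live-migrates a disk attached to a running VM) for the move; the threshold value and the grouping are invented for illustration, not existing XO/XCP-ng settings:

```python
# Sketch only: move whole VDIs (never fragments) off an overloaded SR in the group.
HIGH_WATERMARK = 0.90  # illustrative threshold: rebalance SRs above 90% used

def rebalance_group(session, group_srs):
    """group_srs: list of SR refs forming the hypothetical storage cluster."""
    recs = {ref: session.xenapi.SR.get_record(ref) for ref in group_srs}

    def free_bytes(ref):
        return int(recs[ref]["physical_size"]) - int(recs[ref]["physical_utilisation"])

    for src, rec in recs.items():
        size, used = int(rec["physical_size"]), int(rec["physical_utilisation"])
        others = [r for r in group_srs if r != src]
        if not others or size == 0 or used / size < HIGH_WATERMARK:
            continue
        # Candidate disks on the overloaded SR: managed, non-snapshot VDIs only.
        vdis = [v for v in rec["VDIs"]
                if session.xenapi.VDI.get_managed(v)
                and not session.xenapi.VDI.get_is_a_snapshot(v)]
        if not vdis:
            continue
        # Move the smallest disk to the SR with the most free space in the group.
        vdi = min(vdis, key=lambda v: int(session.xenapi.VDI.get_virtual_size(v)))
        dst = max(others, key=free_bytes)
        session.xenapi.VDI.pool_migrate(vdi, dst, {})  # migrates the entire VDI
```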

Base features:

Optional Features:

nagilum99 commented 5 months ago

@navnair: That's what I said. But not splitting the VMDKs means it can't utilize 100% of each SR (like a JBOD). It's basically just an autobalancer between multiple SRs (appearing as an SR target itself).

It's probably a feature for XO-SAN.

navnair commented 5 months ago

@nagilum99: Well, we are not seeking to maximise utilisation on all SRs, but rather to make overall storage management easier and more balanced. Storage DRS is customisable with user thresholds, allowing us to choose how much of a datastore/SR should be used. And if you rely on external backup systems to execute snapshot-based backups for VMs, it is typically advised to leave 5-10% of space free in each datastore/SR to allow for proper snapshotting and backup.

As far as I am aware, XOSAN is identical to VMware's vSAN and requires HCI equipment. Although HCI systems are part of the ecosystem, the vast majority still rely on standalone blades or DLs with only a USB stick or a two-disk RAID 1 boot drive, which are then connected to external storage frames for the actual workload requirements. As a result, the suggested functionality would be extremely helpful to larger organizations that have invested in such systems.

nagilum99 commented 5 months ago

@navnair: In the end XOSAN uses disks placed on $storage, which can be local HDDs but also SAN. I don't see why it couldn't utilize NFS or the like to span an SR (didn't check if it works with NFS too), instead of passing through to any /dev/$device. (Technically it could use LVM and add storage devices to it.)