metbog opened this issue 1 year ago
@metbog This seems like a shared VG feature requirement. We had tried shared VG previously but had to shelve the task due to technical roadblocks.
If I understand the requirement correctly, the same PVC is required to be used by applications on two (or more) different nodes, where the underlying PV is a shared VG managed by LVM, and assuming the lock managers required for a shared VG are up and running on all worker nodes. I don't see this as feasible with the current CSI provisioner to provide this kind of RWX capability.
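For illustration, the usage being asked for would look roughly like the sketch below - a PVC requesting the ReadWriteMany access mode so that pods on different nodes could mount the same volume. The names here are hypothetical, and lvm-localpv does not provide this today.

```yaml
# Hypothetical sketch of the requested RWX usage - not supported by lvm-localpv today.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data                 # hypothetical name
spec:
  accessModes:
    - ReadWriteMany                 # the RWX capability being discussed
  resources:
    requests:
      storage: 10Gi
  storageClassName: openebs-lvmpv   # assumed StorageClass name
```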
> @metbog This seems like a shared VG feature requirement. We had tried shared VG previously but had to shelve the task due to technical roadblocks.
@abhilashshetty04 what was the technical roadblock that we identified?
@orville-wright, a previous OpenEBS engineer attempted this feature. AFAIK, there were some roadblocks due to kernel semaphore dependencies.
This is the PR for your reference. https://github.com/openebs/lvm-localpv/pull/184
> If I understand the requirement correctly, the same PVC is required to be used by applications on two (or more) different nodes, where the underlying PV is a shared VG managed by LVM, and assuming the lock managers required for a shared VG are up and running on all worker nodes. I don't see this as feasible with the current CSI provisioner to provide this kind of RWX capability.
Is there another way to achieve this with OpenEBS? For instance, VMware uses VMFS to reattach volumes or disks between VMs. I would like to find a way to use shared storage between nodes, and the only solution I've found so far involves replication. Is there an alternative approach?
Hi @m-czarnik-exa - I run Product Mgmt for openEBS. OK, let's dig into your use case and see if we can help.
openEBS is primarily designed as a Hyper-converged vSAN system. This means that... (Nexus). Nexus is a Block-mode Storage Area Network (SAN) and works like a SAN within the cluster. DiskPool... Operations 5 ... 7 are node-exclusive operations - only 1 node can safely claim an LV (LUN) device, because that block device is presented into the kernel of the node. There is **no way to safely arbitrate multiple kernels claiming the same (PV) LUN** - i.e. no easy, simple way... without the complexity of a clustered kernel block-device subsystem. Yes, these exist, but they're complex, slow, painful to work with, and difficult to manage.
On top of this... you would also need a Clustered File System with a distributed lock manager that understands that multiple nodes have physically claimed 1 single LUN and are sharing I/O to the same LUN. This would also require arbitration and a very complex clustered I/O Data-Plane. Yes, these exist... but again, they are complex, many are not free/open source, and they are horribly complex to deploy and manage.
There are ways to do things that are close to what you are asking for.
You mention VMFS.
> VMware uses VMFS to reattach volumes or disks between VMs.
VMware VMFS is a vSAN-like layer, but it is not a real vSAN. VMware vSAN replaced VMFS and is a real Hyper-Converged SDS vSAN layer... very similar to our NEXUS.
VMware vSAN and our NEXUS Fabric are similar in that any node in the vSAN fabric can address any block-mode disk device on the fabric. But... only 1 node can safely claim and do I/O to that LUN. VMFS and VMware vSAN are not a clustered block-device system or a Clustered File System that allows multiple nodes to share-mount and share-write to exactly the same single LUN at the same time.
For openEBS, we have 5 Storage Engines that the user can choose to deploy. Each has different characteristics and a different backend Block Allocator kernel. In all of the above... I am referring to openEBS Mayastor (see attached pics) - not openEBS Local-PV LVM, because Mayastor is the only Storage Engine that currently contains the NEXUS vSAN.
openEBS Local-PV LVM utilizes the LVM2 kernel (i.e. PE, PV, VG, LV structures) but does not currently utilize the NEXUS vSAN fabric. All I/O is node-local.
openEBS Local-PV LVM
Our LVM2 kernel is very mature, rock solid, and high performance. It does inherit the native LVM2 concept of a Clustered VG (Volume Group), which allows multiple nodes to share access to 1 single VG. This is somewhat like VMFS or VMware vSAN. You can extend LVM2 to work in a Clustered LVM mode, but we have not prototyped or tested this.
So... after all of this... as a starting primer... what problem are you trying to solve? When you say the words...
I'm looping in @tiagolobocastro @niladrih and @abhilashshetty04 for any further commentary.
@orville-wright
First of all, thank you for this elaborate explanation :)
Basically what I'm trying to achieve is a migration from VMware CSI to open source virtualization platforms like proxmox/opennebula etc... or possibly a bare metal solution.
The setup I'm trying to configure (sorry for the simplifications, but I don't have deep expertise in storage) is to connect, let's say, three k8s nodes to SAN storage over FC or iSCSI (to allocate block storage for these nodes) and to be able to attach PVs created on one node to another node (VMware CSI simply reattaches the VMDK from one VM to another when a pod with the PVC starts on another node). Taking into account that the SAN storage already has RAID configured and that I would like to use Velero for backups, I don't need to replicate data from one node to another, because that would affect performance. What I'm looking for is a CSI driver that handles shared storage from the disk array between nodes and is, at the same time, as fast as possible.
@m-czarnik-exa Today it isn't possible for a PV (Persistent Volume) to be used by multiple nodes (or by a node other than the one where the PV was created). With the LVM-localPV engine, a PV represents the LVM logical volume created on that node.
There is a slightly similar use case of using an LVM VG as shared so that multiple nodes can create PVs on the same VG. This isn't complete yet as discussed and designed here: https://github.com/openebs/lvm-localpv/issues/134
This requirement needs the LVM shared VG support; it will be tracked after #134.
Hi there,
We have a Kubernetes cluster (k8s) with BareMetal (BM) workers. These BM workers are connected via Fibre Channel (FC) to PureStorage FA. Our goal is to create a shared volume for our BM workers and use it with lvm-localpv.
PureStorage -> (BM1, BM2) -> /dev/mapper/sharevolume (attached to each BM worker via FC) -> PV -> VG1
Here is the StorageClass:
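A typical lvm-localpv StorageClass pointing at the shared VG described above might look roughly like this sketch (names assumed, not the actual manifest):

```yaml
# Illustrative sketch only - names assumed; not the actual manifest.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvmpv               # assumed name
provisioner: local.csi.openebs.io
parameters:
  storage: "lvm"
  volgroup: "VG1"                   # the VG created on the shared FC device above
```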
One idea is to make it possible to reattach the LVM volume to any of the BM workers, because currently the Persistent Volume is bound to one worker (the one where it was originally created). This limitation prevents pods from starting on other workers.
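The binding seems to come from the node affinity that lvm-localpv stamps on each provisioned PV; an abridged, illustrative excerpt (names assumed) looks roughly like this, which is why the scheduler only places consuming pods on the worker that owns the LV:

```yaml
# Abridged, illustrative excerpt of a PV provisioned by lvm-localpv (names assumed).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-xxxxxxxx                   # placeholder name
spec:
  csi:
    driver: local.csi.openebs.io
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: openebs.io/nodename # topology key used by the driver
              operator: In
              values:
                - bm1                  # the worker where the LV was created
```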
Is it possible to achieve this? Perhaps there is already a solution available for this issue?