kubernetes-sigs / vsphere-csi-driver

vSphere storage Container Storage Interface (CSI) plugin
https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/index.html
Apache License 2.0

Relocation of volumes between datastores #565

Closed: farodin91 closed this issue 1 year ago

farodin91 commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

Is there a way to migrate CSI vSphere volumes between datastores? We are not using vSAN or Tanzu.

One idea would be to use CSI cloning.

Any ideas to help? Is there a way to do it manually?
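
(For reference: at the Kubernetes API level, the cloning idea would look roughly like the sketch below. The StorageClass and PVC names are placeholders, and it only works if the driver version in use actually advertises the CSI volume cloning capability and supports cloning across datastores; otherwise the new claim will simply stay Pending.)

```sh
# Placeholder names: StorageClass "nfs-new" points at the target datastore,
# and "data" is an existing PVC on the old datastore in the same namespace.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-clone
spec:
  storageClassName: nfs-new
  dataSource:
    kind: PersistentVolumeClaim
    name: data
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
```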

achontzo commented 3 years ago

You can use the standard disk vMotion procedure (Migrate -> Change storage only, and enable Configure per disk) as you would normally do with a mounted disk of a virtual machine. The tricky part is identifying the actual disk so you can then migrate it.

You can use a kubectl plugin named vtopology, which will map the PV name to a disk ID, or you can go the hard way and use PowerCLI to query vSphere.

The output of vtopology looks like this:

```
=== Storage Policy (SPBM) information for PV pvc-e74383ca-fad3-479d-b91e-b283d9e872a0 ===

    Kubernetes VM/Node :  k8s-plus-wrk05-bt.lab.up
    Hard Disk Name     :  Hard disk 30
    Policy Name        :  silver
    Policy Compliance  :  compliant
```

You can then Storage vMotion that particular disk (e.g. Hard disk 30) to another datastore.
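
(If vtopology is not available, a rough alternative is to read the FCD ID straight out of the PV and look it up with govc. This is only a sketch: it assumes a reasonably recent govc build with the disk/volume subcommands and the usual GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD environment, and the datastore name is a placeholder.)

```sh
# The volumeHandle of a vSphere CSI PV is the FCD / CNS volume ID.
PV=pvc-e74383ca-fad3-479d-b91e-b283d9e872a0
VOLUME_ID=$(kubectl get pv "$PV" -o jsonpath='{.spec.csi.volumeHandle}')

# Which node VM is the volume currently attached to?
kubectl get volumeattachment | grep "$PV"

# Look the same ID up on the vSphere side.
govc volume.ls "$VOLUME_ID"                  # CNS view of the volume
govc disk.ls -ds my-datastore "$VOLUME_ID"   # FCD catalog on a given datastore
```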

Hope this helps; it would also be good to have some official documentation on actions like these.

farodin91 commented 3 years ago

I would like to have this feature, if that is indeed what I'm asking for. We have two StorageClasses with datastores on two different NFS stores.

My plan is to migrate between the two StorageClasses.

SandeepPissay commented 3 years ago

@farodin91 I'm trying to understand why you want to relocate the volumes from one datastore to another. Is it because you want to decommission the datastore, or is it because you want to balance the capacity between the two datastores?

farodin91 commented 3 years ago

> Is it because you want to decommission the datastore, or is it because you want to balance the capacity between the two datastores?

I want to decommission datastores.

achontzo commented 3 years ago

> Is it because you want to decommission the datastore, or is it because you want to balance the capacity between the two datastores?
>
> I want to decommission datastores.

Did my solution work out for you?

farodin91 commented 3 years ago

@achontzo We tried out vMotion and it worked, but with an artifact on the datastore: before, the FCD was in a directory called fcd; now it is in a folder named after the originating VM. On the k8s side, we had to manually patch the StorageClass.

We also started to try out the CnsRelocate command, but we got a weird error saying that relocate is disabled.

SandeepPissay commented 3 years ago

We have heard the following storage vMotion requirements for CNS volumes:

  1. Capacity load balancing between storage (could be mixed datastore types like VMFS, NFS, vSAN, vVol).
  2. Datastore maintenance mode support, so that all the CNS volumes can be Storage vMotioned out of a datastore that will be decommissioned or prepared for a firmware upgrade, etc.
  3. Storage vMotion volumes from one datastore to another that could be in a different datacenter.

@farodin91 Could you validate if this captures your requirements?

farodin91 commented 3 years ago

@SandeepPissay For our case, mainly 2 and 3 would best match our requirements.

SandeepPissay commented 3 years ago

> @SandeepPissay For our case, mainly 2 and 3 would best match our requirements.

@farodin91 regarding requirement (3), do you have separate vCenters managing the datacenters, or a single vCenter? I'm wondering if we are looking at cross-vCenter vMotion.

farodin91 commented 3 years ago

We have just a single vCenter in this case.

marratj commented 3 years ago

> @achontzo We tried out vMotion and it worked, but with an artifact on the datastore: before, the FCD was in a directory called fcd; now it is in a folder named after the originating VM. On the k8s side, we had to manually patch the StorageClass.

We actually have the same issue, which bit us pretty hard...

We had Storage DRS vMotion FCDs on our datastore cluster; afterwards, the VMDKs ended up directly in a VM's folder instead of the fcd folder.

We are also using Cluster API, and whenever the VM that the VMDK was attached to gets killed (because of an upgrade, for example, which provisions new VMs and kills the old ones), the affected PVs are broken and cannot be used as CNS disks anymore.

There needs to be a warning sign somewhere: "Don't use Storage vMotion/DRS with CNS volumes, or they will break."

svrc commented 3 years ago

For what it's worth, I am curious, @marratj, what you mean by the PVs getting broken. The PVs are FCDs under the covers, and vCenter maintains the link to the VMDK even after it is moved by Storage vMotion (sVM).

We had tested sVM with TKGI and CSI back in 2019 and had no issues moving PVs across nodes during upgrades. We found that the old PV VMDKs would be in the old VM folder even after that VM was deleted. Perhaps CAPV is doing something odd on VM delete with its attached volumes? The BOSH CPI just does a mass detach of all volumes (BOSH or foreign, like a K8s PV) prior to VM deletion.

svrc commented 3 years ago

@SandeepPissay Speaking from what I've seen in the past, those requirements match other situations we have hit in our BOSH experience with TAS and TKGI.

marratj commented 3 years ago

@svrc "broken" means that the CSI driver cannot mount the volume anymore; the operation fails with a NotFound fault:

```
(*types.LocalizedMethodFault)(0xc0009abba0)({
  DynamicData: (types.DynamicData) {
  },
  Fault: (*types.NotFound)(0xc0009abbc0)({
    VimFault: (types.VimFault) {
      MethodFault: (types.MethodFault) {
        FaultCause: (*types.LocalizedMethodFault)(nil),
        FaultMessage: ([]types.LocalizableMessage) nil
      }
    }
  }),
  LocalizedMessage: (string) (len=50) "The object or item referred to could not be found."
}). opId: "72b115b
```

The thing is that SDRS moves the VMDK file out of the original fcd folder where it was created into the VM-specific folder on the new datastore it is being migrated to; new datastore, new folder, even a new VMDK name (e.g. it gets renamed from fcd/839395e8712e46f285d309818e0eb22f.vmdk to vmname/vmname_2.vmdk during the migration to the new datastore).

We were already in contact with VMware support about this, and they confirmed that Storage DRS breaks the CNS/FCD relationship in a way that leaves the CSI driver unable to find the volume anymore; the only workaround for now is to keep SDRS disabled.
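
(A quick way to see the effect from the CLI, reusing the govc setup sketched earlier and with placeholder datastore names, is to check where the disk is still registered after the move.)

```sh
# After an SDRS move, check which datastore's FCD catalog still knows the disk
# and whether the CNS layer can still resolve the volume ID at all.
govc disk.ls -ds old-datastore "$VOLUME_ID"
govc disk.ls -ds new-datastore "$VOLUME_ID"
govc volume.ls "$VOLUME_ID"
```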

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

tgelter commented 3 years ago

/remove-lifecycle stale

McAndersDK commented 2 years ago

@marratj did you recover the disks that were moved by DRS? How?

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

tgelter commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

neuromantik33 commented 2 years ago

/remove-lifecycle stale

De1taE1even commented 2 years ago

I've recently been dealing with this as well, and was able to get around it after some troubleshooting. This may not solve others' issues, but I wanted to share. In my case, I was getting errors when vCenter tried to detach volumes, and it was because there was a snapshot of the backing VM associated with the mount. As soon as I deleted the snapshot, all my errors went away, the mount detached/attached as intended, and Kubernetes was happy again.
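
(For anyone checking for the same condition, a small govc sketch with placeholder VM and snapshot names: list the snapshots on the node VM that holds the stuck volume, and remove any leftovers.)

```sh
# List snapshots on the worker VM the volume is attached to.
govc snapshot.tree -vm k8s-worker-01

# If a leftover snapshot shows up, deleting it (as described above) may let
# the detach/attach go through again.
govc snapshot.remove -vm k8s-worker-01 "pre-upgrade"
```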

omniproc commented 2 years ago

Missing this feature renders any realistic vSphere CSI use case broken. You can't even migrate data to a new datastore when the old one gets decommissioned, and on virtualized infrastructure that's daily operations. @svrc's question is very valid: it's unclear why the CSI driver isn't able to find migrated FCDs after they have been moved. Wasn't the whole point of FCDs to make VMDKs identifiable by moref/moid/uuid, just like any other ManagedObject in the vSphere API? Why are display-name (!) paths used to identify the relevant objects for the CSI driver (node VMs, FCDs)? I would be really interested in the design decision behind that.

gn-smals commented 1 year ago

Is https://github.com/vmware-samples/cloud-native-storage-self-service-manager fixing this problem? I have the feeling that's the case.

divyenpatel commented 1 year ago

Yes, we have the CNS Self Service Manager available to help relocate volumes from one datastore to another. Refer to the cloud-native-storage-self-service-manager repository linked above for details.

hc2p commented 1 year ago

Actually, this tool leads to the exact same issue, with FCDs landing in the wrong folder on the new datastore: https://github.com/vmware-samples/cloud-native-storage-self-service-manager/issues/19

omniproc commented 10 months ago

FYI, at least in vSphere 8.x (and maybe in some 7.x update patches too) you can perform an FCD migration right from the UI, in the CNS volumes view of the vSphere cluster. The mentioned cns-self-service tool is not worth your time...