vmware-archive / vsphere-storage-for-docker

vSphere Storage for Docker
https://vmware.github.io/vsphere-storage-for-docker
Apache License 2.0
251 stars 95 forks

Support a multi-writer option for volume create #1532

Open govint opened 7 years ago

govint commented 7 years ago

This is perhaps long pending and easily implemented: a volume create option that sets the "multi-writer" flag on a VMDK via the VMX config setting scsix:y.sharing = "multi-writer". The only requirement is that the VMDK must also be created as a thick disk, since dynamic growth is disallowed for a multi-writer disk. With that in place, the VMDK can be shared across VMs.

Also, attach state for a VMDK with this option set will be ignored, and the attached-to field in the KV metadata will be a list of VMs rather than a single VM. The VMDK can be removed only when this list is empty.
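A rough sketch of what the proposed option would do under the covers on the ESX host (the datastore path and SCSI slot `scsi1:1` below are examples, not part of the proposal):

```shell
# 1. Multi-writer disks must be thick-provisioned (eager-zeroed),
#    since dynamic growth is disallowed:
vmkfstools -c 10G -d eagerzeroedthick \
    /vmfs/volumes/datastore1/dockvols/shared-vol.vmdk

# 2. For each VM that attaches the volume, the VMX config needs the
#    multi-writer sharing flag on the disk's SCSI slot, e.g.:
#       scsi1:1.fileName = "/vmfs/volumes/datastore1/dockvols/shared-vol.vmdk"
#       scsi1:1.sharing  = "multi-writer"
```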

msterin commented 7 years ago

We already discussed this in #193. The customer issue there was a regression, which we fixed in 0.11, but this request was raised there too, along with the explanation of why we should not do it. What has changed? Why are we bringing it up again?

To save the click, here is the point from #193:

multi-writer VMDKs are used for cluster file systems only and require in-VM cluster software that synchronizes access to the shared block device. There is no use case for enabling this unless somebody decides to put Oracle RAC or a CFS into a container without admin access to do it statically. It is super dangerous and pretty much guarantees data corruption if multi-writer is allowed and the disk is attached / a regular filesystem is mounted on 2 VMs.

And yes, you refer to some "clustering" app which could use it. So, if we enable it, which specific "clustering" app would use it without data corruption?

Unless there is a clear answer, we should close this to minimize distractions.

blop commented 7 years ago

What is the suggested way to share a volume across multiple containers hosted on different VMs sharing the same datastore?

Currently we can attach the volume to multiple containers only if those containers are started on the same VM (same swarm node).

govint commented 7 years ago

From my discussions with the team that supports multi-writer disks, these VMDKs can be used simultaneously from multiple VMs (for both read and write IOs). Any coordination of how the data is managed is up to the applications using these disks.

msterin commented 7 years ago

Any coordination of how the data is managed is up to the applications using these disks.

To clarify: only clustered apps doing their own distributed locking will be able to use it (e.g. Oracle RAC). Any attempt to access these disks via a mounted FS will either fail on mount (if the FS marks the superblock as mounted) or will corrupt the FS.

@blop - with the current docker-volume-vsphere code, one and only one VM can use the volume at any given time, so containers have to be on that VM to use the volume (labels could be used to enforce this, I guess).
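The label approach mentioned above could look roughly like this (node name, label, volume name, and image are illustrative assumptions): pin every service task that needs the volume to the one swarm node whose VM can attach it.

```shell
# Label the swarm node (VM) that should own the volume:
docker node update --label-add vsphere-vol=app-data node1

# Constrain the service so all its tasks land on that node and
# therefore can all mount the same vSphere-backed volume:
docker service create --name app \
    --constraint 'node.labels.vsphere-vol == app-data' \
    --mount type=volume,source=app-data,volume-driver=vsphere,target=/data \
    myapp:latest
```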

govint commented 7 years ago

Native multi-writer disks are meant exactly for apps that manage data-level locking themselves, no doubt about that. They can surely be used by apps that support such a capability.

blop commented 7 years ago

I understand: since the VMDK volume created by this plugin is formatted with a traditional file system, it's not meant to be shared across, or mounted from, multiple machines.

I'm looking for the simplest way to store binary objects in the already redundant and distributed VMware vSAN that I have. I don't actually need a filesystem. The goal is to scale our application inside a Docker swarm and access those binary objects from every container.

It's a shame VMware doesn't offer any API to store binary objects (S3-like) directly. I could create a VMDK for each object (millions of them), but I'm not sure that's scalable.

@msterin Any suggestion on this ?

msterin commented 7 years ago

Yeah, an S3 API in vSAN directly would be the way to go. Unfortunately I don't know if it's planned.

For now, S3 Object Storage with vSphere Volumes and Minio may be a good workaround, depending on your IO patterns.
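One possible shape of that workaround (volume name, size, and credentials below are examples): create a vSphere-backed volume and run Minio on top of it to get an S3 API.

```shell
# Create a volume backed by vSphere storage via the plugin:
docker volume create --driver=vsphere --name=minio-data -o size=50gb

# Run Minio with that volume as its data directory, exposing the S3 API:
docker run -d --name minio \
    -p 9000:9000 \
    -e MINIO_ACCESS_KEY=minio -e MINIO_SECRET_KEY=minio123 \
    -v minio-data:/data \
    minio/minio server /data
```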

blop commented 7 years ago

Yup. I'm evaluating minio too ;-)

I will either use that, or simply set up one VM with an NFS server (because it seems it's not possible to run one inside a swarm for now - privileged mode is missing).

msterin commented 7 years ago

@blop - yes, server file systems (NFS, CIFS) would still do wonders :-). Samba does not require privileged mode and fails over fine in swarm; you can feed it a docker-volume-plugin volume, and this might cover your use case too. We are currently working on "multiple-container shared access to vSphere storage" using this approach. //CC @pdhamdhere @luomiao
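A hedged sketch of the Samba approach (the `dperson/samba` image, share definition, and volume name are assumptions, not part of the plugin): serve a vSphere-backed volume over CIFS from a swarm service, so containers on any node can reach the same data through the share.

```shell
# Create a vSphere-backed volume to hold the shared data:
docker volume create --driver=vsphere --name=shared-data -o size=20gb

# Run Samba as a swarm service exporting that volume over CIFS;
# the -s argument defines a share named "share" rooted at /share:
docker service create --name smb \
    -p 445:445 \
    --mount type=volume,source=shared-data,volume-driver=vsphere,target=/share \
    dperson/samba -s "share;/share"
```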