Dolbager opened this issue 7 years ago
Hi @Dolbager,
From what I can see of the EFS driver, the Security Groups are added to the EFS Mount Target, of which there will be one for each VPC subnet. The security groups are applied when the Mount Target is created, and that happens when an EFS mount is performed on an instance for the first time.
It looks to me like the security groups attached to the instance will always override the security groups defined in the plugin config, according to https://github.com/codedellemc/libstorage/blob/master/drivers/storage/efs/storage/efs_storage.go#L589.
I also think there is a bug at https://github.com/codedellemc/libstorage/blob/master/drivers/storage/efs/utils/utils.go#L96, where the instance only reports its security groups to libStorage if there is exactly one SG attached to the instance. I think the `==` there is supposed to be a `>`.
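To illustrate, here is a rough sketch of the shape of that check (hypothetical names only; this is not the actual libStorage code):

```go
package main

import (
	"fmt"
	"strings"
)

// instanceSecurityGroupFields is a hypothetical stand-in for the logic that
// reports an instance's security groups to libStorage; the names and
// structure here do not match the real utils.go.
func instanceSecurityGroupFields(sgIDs []string) map[string]string {
	fields := map[string]string{}
	// Suspected bug: with "== 1" the security groups are only reported when
	// exactly one SG is attached to the instance; a ">"-style check such as
	// len(sgIDs) > 0 would also cover instances with multiple SGs.
	if len(sgIDs) == 1 {
		fields["securityGroups"] = strings.Join(sgIDs, ";")
	}
	return fields
}

func main() {
	// With two SGs attached, nothing is reported under the "== 1" check.
	fmt.Println(instanceSecurityGroupFields([]string{"sg-aaaa1111", "sg-bbbb2222"}))
}
```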
The author of the driver, @mhrabovcin, may be able to shed more light on this.
+1 ing on this one. We have the same Multi-AZ (3 AZs) setup and Rex-ray is not sharing the EFS shares among our nodes. It only creates the first AZ on volume creation.
It works fine if we manually add the remaining 2 AZs, but that's far from an ideal situation.
Hey @taiidani, the REX-Ray plugin itself doesn't create AZs. As @codenrhoden mentioned, the plugin creates a MountTarget that allows instances to access EFS from the subnet where the instance runs.
Could you please share more about your rexray configuration and describe the AWS environment in which you run your service?
Sorry @mhrabovcin, by "AZ"s I meant mount points.
We have nodes spread among 3 us-west AZs, with Swarm tasks bouncing between the AZs so that if one fails our services will be rescheduled to another AZ.
If a service in one AZ gets rescheduled to another AZ, we would expect our original EFS share and data to be available in the new AZ. In practice REX-Ray fails outright -- it does not create a new mount point on the EFS share for the second AZ, nor does it create a new EFS share (which, from the service's standpoint, would wipe all our data). Instead it simply throws an error. In the scenario of an AWS AZ failure, this would mean that Swarm cannot reschedule back to the first AZ when REX-Ray rejects the mount, causing downtime for our application.
Manually creating the new mount point before the service is rescheduled allows REX-Ray to find it and mount our existing EFS data.
So basically we're hoping for REX-Ray to create a mount point for the AZ the requesting node is located in, similar to what it does when creating brand new EFS shares.
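To make the manual step concrete, it amounts to roughly the following (a sketch using the AWS SDK for Go; the IDs are placeholders and this is not how the plugin itself is implemented):

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/efs"
)

// ensureMountTarget creates a mount target for fsID in subnetID unless one
// already exists there; roughly the behaviour we're hoping the plugin would
// perform at mount time.
func ensureMountTarget(svc *efs.EFS, fsID, subnetID string, sgIDs []string) error {
	out, err := svc.DescribeMountTargets(&efs.DescribeMountTargetsInput{
		FileSystemId: aws.String(fsID),
	})
	if err != nil {
		return err
	}
	for _, mt := range out.MountTargets {
		if aws.StringValue(mt.SubnetId) == subnetID {
			return nil // already reachable from this subnet/AZ
		}
	}
	_, err = svc.CreateMountTarget(&efs.CreateMountTargetInput{
		FileSystemId:   aws.String(fsID),
		SubnetId:       aws.String(subnetID),
		SecurityGroups: aws.StringSlice(sgIDs),
	})
	return err
}

func main() {
	svc := efs.New(session.Must(session.NewSession()))
	// Placeholder IDs: the EFS file system the plugin created, the subnet of
	// the node that needs access, and an SG allowing NFS (TCP 2049).
	err := ensureMountTarget(svc, "fs-00000000", "subnet-00000000", []string{"sg-00000000"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("mount target ensured")
}
```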
From one of the product managers on the REX-Ray team, regarding us having to manually create the extra mount points ourselves:
... your workaround is the only way to do it today. Making multi-AZ for EFS has been on our list of things to do for a while, however we've had other priorities come up and until issues are opened for features, things aren't necessarily prioritized.
I have the same requirement of EC2 instances spread across 3 separate AZs.
I tried: `docker run -ti --volume-driver=rexray/efs --rm -v test4:/test busybox`
The rexray/efs driver successfully created an EFS filesystem with a single mount point corresponding to the AZ that the EC2 instance was executing from. I was able to successfully mount the EFS filesystem from within that EC2 instance. However if I tried to mount that same EFS filesystem from an instance in a different AZ, using the same command as above, I'd get the following error message (in case anyone is searching for similar):
docker: Error response from daemon: error while mounting volume '/var/lib/docker/plugins/ddf6b9555955ecb1f72817c1c8d24f577a4e6587d6e8a809e81025185ce5f65e/rootfs': VolumeDriver.Mount: {"Error":"resource not found"}.
I've been experimenting with manually creating the EFS file system from scratch with the tag "Name" set to the volume name the driver expects. `docker volume ls` and `docker volume inspect` both report information about the manually created drive from within an EC2 instance in any AZ.
I've then run the same command again (`docker run -ti --volume-driver=rexray/efs --rm -v test4:/test busybox`) on two instances, each in different AZs, and have been able to successfully mount the EFS file system and use it as expected.
I'm thinking this will work for my use case because I'm able to create the EFS filesystem with Terraform prior to instance/container creation and rely on a known volume name. May be a workaround for others for the moment.
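For anyone scripting this outside Terraform, the pre-provisioning step can be sketched with the AWS SDK for Go as well (a rough illustration; the volume name "test4" and the exact "Name" tag value the driver looks for are assumptions, and mount targets still need to be created per subnet as in the earlier sketch):

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/efs"
)

func main() {
	svc := efs.New(session.Must(session.NewSession()))

	// Create the file system ahead of time. The CreationToken only has to be
	// unique; "test4" mirrors the volume name used in the commands above.
	fsOut, err := svc.CreateFileSystem(&efs.CreateFileSystemInput{
		CreationToken:   aws.String("test4"),
		PerformanceMode: aws.String(efs.PerformanceModeGeneralPurpose),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Tag it with the Name the driver resolves volumes by. The exact value
	// depends on the plugin configuration, so treat "test4" as a placeholder
	// and copy the tag format from a file system the plugin itself created.
	_, err = svc.CreateTags(&efs.CreateTagsInput{
		FileSystemId: fsOut.FileSystemId,
		Tags: []*efs.Tag{
			{Key: aws.String("Name"), Value: aws.String("test4")},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```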
I'm also facing the same problem. It's striking to see that this issue has been around for so long. One of the reasons why people choose EFS is precisely to get a distributed, highly available storage backend, ideally one that can be used by equally distributed clients (EC2 nodes in different AZs). As such, I consider this bug to be quite important for the EFS use case.
Before manually creating the mount target with its respective security group:
$ rexray volume ls
ID Name Status Size
fs-c14bb898 vol1 unavailable 12288
Almost instantaneously afterwards:
$ rexray volume ls
ID Name Status Size
fs-c14bb898 vol1 attached 18432
I understand that the `unavailable` status is often seen when a normal block device is attached to another instance; however, in the case of EFS, this shouldn't matter as it's basically an NFS share. This issue was particularly hard to debug, as there was no clear indication anywhere in the logs.
$ rexray version
REX-Ray
-------
Binary: /usr/bin/rexray
Flavor: client+agent+controller
SemVer: 0.11.2
OsArch: Linux-x86_64
Commit: fc8bfbd2d02c2690fc3a755a9560dd12c88e0852
Formed: Sun, 25 Feb 2018 00:51:22 CET
While I can see @josh-atkins-dev's workaround being enough for some, it would mean that deploying a Docker stack is not a matter of simply rolling it out, but also of augmenting infrastructure in the background to compensate for this bug. One could also define a placement constraint based on subnets to avoid failing cross-AZ mount attempts. Perhaps @akutz would be able to weigh in on this one.
Just chiming in to mention that this issue is still present :(
Is there any estimate on fixing this?
In Docker swarm mode with the latest rexray plugin installed, multi-AZ access to EFS is not created. All of our worker nodes run in different availability zones.
I install the plugin on all nodes and create a volume, but the security groups are not added for all availability zones.
If I start a service, the security group is added only in the first zone where a node runs the container, not in the others.
If I add the security groups to the volume by hand, everything works fine.
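For anyone doing that "by hand" step through the API rather than the console, setting the security groups on an existing mount target looks roughly like this (a sketch with the AWS SDK for Go; the IDs are placeholders, this call replaces the mount target's whole security group set, and if a mount target is missing in an AZ entirely it has to be created first as in the earlier sketch):

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/efs"
)

func main() {
	svc := efs.New(session.Must(session.NewSession()))

	// Placeholder IDs: an existing mount target that is missing the right
	// security group, and the SG that allows NFS (TCP 2049) from the nodes.
	_, err := svc.ModifyMountTargetSecurityGroups(&efs.ModifyMountTargetSecurityGroupsInput{
		MountTargetId:  aws.String("fsmt-00000000"),
		SecurityGroups: aws.StringSlice([]string{"sg-00000000"}),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```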