mutagen-io / mutagen

Fast file synchronization and network forwarding for remote development
https://mutagen.io

Docker for Mac Mutagen Refugee Discussion #235

Open xenoscopic opened 4 years ago

xenoscopic commented 4 years ago

With the removal of Mutagen from Docker for Mac Edge, there's been a lot of interest in continued use of Mutagen for development. While development of gRPC-FUSE continues, I'd like to offer a workaround to developers looking to emulate the previous Docker for Mac Edge functionality. This workaround involves manually creating Docker volumes and Mutagen synchronization sessions. While this isn't as elegant as the Docker for Mac UI, it does provide more granular control over synchronization.

This isn't the only way to use Mutagen with Docker for Mac; it's just an initial workaround proposal. If users have additional setups they'd like to share (e.g. using an SSH sidecar container), please add them below! As @driq already pointed out, Mutagen's Compose integration offers another way to automate this caching.

Please feel free to email me at jacob@mutagen.io if you'd like to discuss your specific setup and needs via email or video chat.

Also, if you'd be interested in a tool that automates this (maybe even a simple GUI), please let me know below!

Install Mutagen

First, you'll need to install Mutagen. If you're on macOS, the best and easiest way to do this is with Homebrew:

brew install mutagen-io/mutagen/mutagen

Create a volume for caching files

Now, for each directory that you want to sync/cache inside the Docker for Mac VM, you'll need to create a named volume:

docker volume create mycache

Create a container to access the volume

In order for Mutagen to access the volume, you'll need to create a container with the volume mounted. You can use the Mutagen sidecar container for this purpose (it's just a no-op entry point):

docker container create --name mycachecontainer -v mycache:/volumes/mycache mutagenio/sidecar
docker container start mycachecontainer

The /volumes directory already exists inside the Mutagen sidecar container, so it makes a good location to create mountpoints.

You can use one sidecar container per volume, or one sidecar container for multiple volumes. The only disadvantage with the latter approach is that you have to mount volumes at container creation time, which makes adding volumes later tricky.
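As a rough sketch of the "one sidecar for multiple volumes" variant, the helper below creates each named volume and mounts all of them under /volumes in a single sidecar. The function name create_sidecar and the volume names in the usage line are hypothetical; only the docker commands and the mutagenio/sidecar image come from the steps above.

```shell
# Sketch: create one sidecar container that mounts several named volumes
# under /volumes/<name>. create_sidecar is a hypothetical helper name.
create_sidecar() {
    container="$1"
    shift
    # Turn each volume name into a "-v name:/volumes/name" pair by rotating
    # the positional parameters (the loop list is expanded once, up front).
    for volume in "$@"; do
        docker volume create "$volume" >/dev/null || return 1
        set -- "$@" -v "$volume:/volumes/$volume"
        shift
    done
    docker container create --name "$container" "$@" mutagenio/sidecar >/dev/null || return 1
    docker container start "$container"
}
```

For example, create_sidecar mycachecontainer mycache othercache would mount both volumes in one container. Adding a volume later still means recreating the container.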

(Optional) Change volume ownership/permissions

Docker volumes default to root ownership with rwxr-xr-x permissions. If you'd like to change that, now is a good time to do that. You can use something like the following command(s):

docker exec mycachecontainer chown 1001:1001 /volumes/mycache
docker exec mycachecontainer chmod go+w /volumes/mycache

The Mutagen sidecar container runs as root, so you can change ownership to any UID/GID you like, I've just chosen 1001 for this example. Your exact needs will vary depending on the containers you're using. If you run your containers as root, this may not matter at all.
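To confirm that the ownership change took effect, something like the following may help. It is only a sketch: volume_ownership is a hypothetical helper name, and it assumes the sidecar image provides an ls that accepts -ldn (numeric owner and group for the directory itself).

```shell
# Sketch: show numeric owner, group, and mode of a mountpoint inside the sidecar.
# Assumes "ls -ldn" is available in the container image.
volume_ownership() {
    docker exec "$1" ls -ldn "$2"
}
```

Usage would look like volume_ownership mycachecontainer /volumes/mycache.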

Create a Mutagen synchronization session

Finally, you'll want to create a Mutagen synchronization session to sync files into the volume. This will look something like the following:

mutagen sync create [options] <path> docker://mycachecontainer/volumes/mycache

A more concrete example might look something like:

mutagen sync create --name mycache --sync-mode=two-way-resolved --ignore-vcs --ignore 'node_modules/' --default-owner-beta="id:1001" ~/Projects/myapp docker://mycachecontainer/volumes/mycache

Mutagen synchronization sessions have a lot of options that you can use to customize behavior. For a full list, see mutagen sync create --help. I would recommend reading the section on synchronization sessions in the Mutagen documentation, as well as the sections on ignores, version control systems, and permissions.

The most important takeaways are the following:

sync:
  defaults:
    ignore:
      vcs: true
      paths:
        - ".DS_Store"
        - "*.py[cod]"
        - "__pycache__/"
        - "*.egg-info/"
        - "*~"
        - "*.sw[a-p]"

Using the cache

In order to use the cache, you'll most likely want to replace a bind mount in your Docker Compose file with the named volume that you're using as a cache. This transition might look something like the following:

services:
  myapp:
    ...
    volumes:
      - .:/path/in/container

to

volumes:
  mycache:
    external: true

services:
  myapp:
    ...
    volumes:
      - mycache:/path/in/container

You can also use the cache in manually created containers via the -v flag in docker [container] create and docker [container] run.
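For the manual case, a small wrapper along these lines can mount the cache into a one-off container. This is a sketch: run_with_cache is a hypothetical name, and the image and command you pass are up to you.

```shell
# Sketch: run a throwaway container with the cache volume mounted and used
# as the working directory. Everything after the mountpoint is passed to
# "docker run" unchanged (image name, command, arguments).
run_with_cache() {
    volume="$1"
    mountpoint="$2"
    shift 2
    docker run --rm -v "$volume:$mountpoint" -w "$mountpoint" "$@"
}
```

For example, run_with_cache mycache /path/in/container alpine ls -la lists the synchronized files from inside a container.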

Tearing it all down

Once you decide that you're done using a particular cache (which could be after an hour, a day, a year, etc.), you can tear it all down with the following:

mutagen sync terminate mycache
docker container stop mycachecontainer
docker container rm mycachecontainer
docker volume rm mycache
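The same teardown can be wrapped in a function that keeps going when a piece is already gone. This is a sketch with a hypothetical name (teardown_cache); it uses docker container rm -f to stop and remove the sidecar in one step, which is a small departure from the separate stop and rm commands above.

```shell
# Sketch: tear down a cache in dependency order (session, then container,
# then volume), warning instead of aborting when a step has nothing to do.
teardown_cache() {
    session="$1"
    container="$2"
    volume="$3"
    mutagen sync terminate "$session" \
        || echo "warning: could not terminate session $session" >&2
    docker container rm -f "$container" \
        || echo "warning: could not remove container $container" >&2
    # The volume can only be removed once no container references it.
    docker volume rm "$volume"
}
```

Usage: teardown_cache mycache mycachecontainer mycache.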

driq commented 4 years ago

Much appreciated!

Just to add: For those using docker-compose, if you follow the instructions at https://mutagen.io/documentation/orchestration/compose many of the above-mentioned steps will be done automatically for you.

Docker Compose support has not landed in a stable version of Mutagen yet, so you will have to install it from the beta channel:

brew install mutagen-io/mutagen/mutagen-beta

Once set up, this comes quite close to the setup I used to have with delegated volumes in Docker Edge 2.3.7.0.

driq commented 4 years ago

One gotcha: make sure that you don't use bind mounts for volumes that are cached with Mutagen, as this will result in both Mutagen and the native Docker for Mac implementation getting involved, resulting in truly bad performance.

I had such bind mounts configured when migrating from Docker Sync, which does not use the volumes configured in your docker-compose.yml file.
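One way to check for leftover bind mounts on a running container is to ask Docker which of its mounts have type "bind". The helper below is a sketch (bind_mounts is a hypothetical name); it relies on the Mounts list that docker inspect exposes through its Go template support.

```shell
# Sketch: print "source -> destination" for every bind mount of a container.
# An empty result means the container only uses volumes (or no mounts).
bind_mounts() {
    docker inspect -f \
        '{{range .Mounts}}{{if eq .Type "bind"}}{{.Source}} -> {{.Destination}}{{println}}{{end}}{{end}}' \
        "$1"
}
```

Running bind_mounts on each service container after migrating should show no paths that are also covered by a Mutagen session.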

ryansch commented 4 years ago

I've gone so far as to remove the /Users and /Volumes mounts from Docker Desktop.

Edit: This has the nice side effect of docker itself spitting out a useful error message when I try to use a bind mount.

ryansch commented 4 years ago

@havoc-io I just finished onboarding our dev team last week with our new setup. They've already been using a wrapper I originally adopted from IFTTT so it was straightforward to teach the wrapper some new tricks.

The version we're using currently is here: https://github.com/outstand/dash/blob/master/bin/dev

I found it really helped to remove mutagen's sync volumes entirely when a developer runs the equivalent of mutagen compose down. That behaviour is here: https://github.com/outstand/dash/blob/e9cc77e949608d12b2e6aa57dbd76e5d705bffd8/bin/dev#L140

dzhgenti commented 4 years ago

Hey guys,

I'm trying to set up mutagen sync for local dev environment and getting an error when trying to set the default file/directory permissions. Here is how I try to create a sync:

mutagen sync create \
        source/sites \
        docker://root@drupal/var/www/sites \
        --name=my_sync \
        -i isource/sites/all/themes/custom/slicing/node_modules \
        --ignore-vcs \
        --default-file-mode-beta 0750 \
        --default-directory-mode-beta 0770 \
        --default-group-beta "www-data"

I'm getting this error:

Error: invalid default file mode for beta: executability bits detected in file mode

There is a named volume mounted to var/www/.

Inside the container, nginx is running under www-data and I need it to be able to execute files and write to certain directories.

Any ideas on how to resolve it?

Thanks!

iperelekhov commented 4 years ago

Looks like you are trying to build a LAMP stack dev env. I'm using the configuration below for my dev env (mutagen.yml):

sync:
  defaults:
    permissions:
      defaultOwner: "id:501"
      defaultGroup: "id:20"
    symlink:
      mode: "posix-raw"
    mode: "two-way-resolved"
    ignore:
      vcs: true
      paths:
        - "log"
        - "generated"
        - "*~"
        - "*.sw[a-p]"
        - "node_modules/"

501 is my user ID on the host system and 20 is my group ID (also on the host system). After the sync starts, wait until Mutagen reaches the "Waiting for changes" status and then execute docker exec mycachecontainer chmod -R +x <absolute path to project>
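If you would rather not watch the status by hand, a possible alternative is to let mutagen sync flush block until a synchronization cycle has completed and only then run the chmod. This is a sketch, not something the poster described: chmod_after_sync is a hypothetical name, and it assumes the session and container already exist.

```shell
# Sketch: wait for a full sync cycle, then fix up permissions in the container.
# "mutagen sync flush" forces a cycle and, by default, waits for it to finish.
chmod_after_sync() {
    session="$1"
    container="$2"
    path="$3"
    mutagen sync flush "$session" || return 1
    docker exec "$container" chmod -R +x "$path"
}
```

Usage: chmod_after_sync mycache mycachecontainer /volumes/mycache.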

dzhgenti commented 4 years ago

thanks, @obeygaint! Might be a workaround... But we spin up and destroy containers quite often, so I really hoped there is a way to set permissions in the config. Do you know if it's possible to find out if sync finished from within a container? Perhaps something similar to mounting docker.sock to a container to access Docker API. I would then write a bash script to change permissions after the sync is finished.

Oh, btw, what about new files created on the host? Do I need to re-run chmod on them too?

xenoscopic commented 4 years ago

@dzhgenti The Error: invalid default file mode for beta: executability bits detected in file mode error is because Mutagen wants to control executability bits (i.e. 0111) for files (but not directories) that it synchronizes and thus requires that only read/write bits are specified in file permission configuration. The idea is to implement behavior similar to that of Git and rsync. The error message was designed to avoid confusion; I had hoped it would be better than just implicitly ignoring the 0111 bits of permission settings, but I think maybe it's actually just more confusing. I'll try to improve it. Some more background information can be found here. The solution is just to exclude these bits from the specified permissions (i.e. use 0640 instead of 0750, though keep 0770 for directories).

Other than that, the configuration you're using looks good. I might recommend adding --default-owner-beta=www-data as well (if appropriate).
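To make the bit arithmetic concrete: clearing the 0111 bits from a file mode is a single mask operation. The helper below is only an illustration (strip_exec_bits is a hypothetical name, not a Mutagen feature); it accepts an octal mode with or without a leading zero.

```shell
# Sketch: clear the executability bits (0111) from an octal file mode.
strip_exec_bits() {
    mode="${1#0}"    # drop one leading zero, if present
    # A leading 0 makes the shell read the value as octal.
    printf '%04o\n' "$(( 0$mode & ~0111 ))"
}
```

strip_exec_bits 0750 prints 0640, which is the value that would go to --default-file-mode-beta; the directory mode (0770) is passed through unchanged.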

Oh, btw, what about new files created on the host? Do I need to re-run chmod on them too?

New files created on the host will be propagated to the container using the permissions you're specifying when you create the session.

Do you know if it's possible to find out if sync finished from within a container?

There isn't currently. The closest alternative is to use Mutagen's Compose support, which will ensure that the sync is finished before other services start. This would also allow you to codify permissions as part of your Compose configuration.

I'd be happy to follow up by email and/or video chat (email in GitHub profile) if you'd like to have a more direct conversation about what you're trying to accomplish.

dzhgenti commented 4 years ago

@havoc-io, thanks so much for your prompt and in-depth reply!

LionsAd commented 4 years ago

Here is an idea (pretty rough though) for avoiding the full-sync without having to ignore everything:

Overall there are always two problems when writing remote network file systems:

The idea is to avoid FUSE calls whenever possible, by using three layers:

So a mutagen-on-demand would be really nice, as you could just mount your home directory: once a file has been synced, it will continue to be monitored indefinitely.

"Cache" would obviously grow over time, but that is to be expected and one could just destroy the mutagen volume and start over. (assuming docker correctly frees space of deleted volumes)

xenoscopic commented 3 years ago

Hey @LionsAd, that's sort of an interesting middle ground between osxfs/gRPC-FUSE and Mutagen. I'm not sure that the Mutagen file index (as it exists now) would be sufficient to return adequate stat() results (and it's not structurally suited to that kind of access), but one could certainly imagine that caching file metadata en masse (with a more purpose-built data structure) would accelerate certain access patterns (e.g. those that stat() but don't open(), such as Git).

@leehambley was looking at building a framework to understand what the access patterns for various tools look like and whether or not a FUSE-based filesystem could be optimized accordingly. In practice, I think part of the problem is that a lot of code (e.g. build tools) out there use interleaved stat()/open()/read()/close() system calls that sustain huge round-trip overheads to the host (regardless of any metadata caching) and it's that latency that builds up to tens or hundreds of seconds of delay for certain operations.

I do think there might be a place for some sort of predictive model that could be used heuristically to pipeline certain operations in on-demand virtual filesystems (e.g. if the virtual filesystem sees 2-3 stat()/open() pairs in the same directory after returning a getdents() result, it could start pipelining open() operations for the remaining files in the directory), but one would probably need a lot of data to build such a model, and I don't think there's any guarantee that a generalized heuristic could be formulated.

leehambley commented 3 years ago

I think part of the problem is that a lot of code (e.g. build tools) out there use interleaved stat()/open()/read()/close() system calls that sustain huge round-trip overheads to the host (regardless of any metadata caching) and it's that latency that builds up to tens or hundreds of seconds of delay for certain operations.

Indeed, naïve benchmarks show that most syscalls take under 5 µs on Linux Docker (it's native, native native), while on Docker4Mac (osxfuse, so not the edge build, which I haven't tested yet) it's more on the order of 130 µs. open and read were especially expensive, but even stat/lstat/access took 80 µs or more, whereas natively those "fast checks" are more like 1 to 3 µs.

In other words, osxfuse (and probably the grpcfuse, but I haven't checked it yet) is something like 25× slower for the fixed costs of accessing the filesystem.
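To see how a fixed per-call cost adds up to the "tens or hundreds of seconds" mentioned earlier, here is a back-of-the-envelope calculation. The workload is purely hypothetical (100,000 files, four syscalls each); the 5 µs native figure and the roughly 25× factor are the ones quoted above.

```shell
# Sketch: total time in whole seconds for a number of syscalls at a given
# per-call latency in microseconds. estimate_seconds is a hypothetical helper.
estimate_seconds() {
    echo $(( $1 * $2 / 1000000 ))
}

# Hypothetical build: 100,000 files x 4 calls (stat/open/read/close) each.
# estimate_seconds 400000 5      native, about 5 us per call
# estimate_seconds 400000 125    about 25x slower per call
```

Under those assumptions the native case comes to about 2 seconds and the 25× case to about 50 seconds, before any actual data transfer.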

yakobe commented 3 years ago

So... Big Sur brings a new urgency for our team as they enthusiastically update and discover that the version of d4m with Mutagen does not work 😬.

I notice there are currently 2 solutions:

What is the recommended way to get a simple setup like the integrated Docker version? Should we use mutagen compose since we use docker compose? Or is it more stable to stitch together some scripts and use mutagen directly?

Thanks for all this work by the way 👍. The d4m performance and "Mac only syntaxes" are causing major headaches. 😬

xenoscopic commented 3 years ago

@yakobe I think the answer depends on how much tooling you have built on top of Docker Compose, the size of your code base, and which Compose commands you're using.

If you're using the docker-compose command directly, then switching to Mutagen Compose should make the transition fairly trivial, whereas if you have scripts or tools that are invoking docker-compose for you, then it might be harder to plug in mutagen compose as a substitute (though I do know teams that have done this successfully).

Additionally, if your code base is so large that the initial synchronization cycle takes several minutes, then using a static external volume with a manually created Mutagen synchronization session might save you time as opposed to regularly bringing up/tearing down your Compose-based project (which incurs a full initial synchronization cycle since the Compose-managed volumes are recreated).
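A minimal sketch of the "static external volume" approach: create the volume and the session only when they don't already exist, so that repeated invocations reuse the synchronized contents. ensure_cache is a hypothetical name, the sidecar container is assumed to be running with the volume mounted at /volumes/<name>, and the check assumes that mutagen sync list <name> exits with a non-zero status when no session matches.

```shell
# Sketch: idempotently ensure that a cache volume and its sync session exist.
ensure_cache() {
    name="$1"
    source_dir="$2"
    container="$3"
    # Create the volume only if Docker doesn't know it yet.
    docker volume inspect "$name" >/dev/null 2>&1 \
        || docker volume create "$name" >/dev/null
    # Assumption: "sync list <name>" fails when the session doesn't exist.
    if ! mutagen sync list "$name" >/dev/null 2>&1; then
        mutagen sync create --name "$name" "$source_dir" \
            "docker://$container/volumes/$name"
    fi
}
```

Usage: ensure_cache mycache ~/Projects/myapp mycachecontainer. Because the volume is external to the Compose project, bringing the project down and up again leaves the cache in place.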

Finally, the run command isn't supported by Mutagen Compose yet, so that might motivate the choice of manual Mutagen usage if you need that.

So I guess the short answer would be: I'd opt for Mutagen Compose if possible, but fall back to manual usage (possibly automated via a script) if Mutagen Compose doesn't work for your case just yet. Also, if there is a blocker for your case, please feel free to send feedback!

I'd be happy to consult over a quick chat if that would be helpful; feel free to email me (email in profile).

yakobe commented 3 years ago

Thanks @havoc-io. That's exactly the info I needed. 👍

I have just got it working with compose and there didn't seem to be any issues, although the permissions are a bit confusing to configure. In the end I did a chown on the app directory and things started to work.

Our app is pretty big and takes a while to sync. We used to do docker-compose up and docker-compose down often, but this will be very slow now. Maybe it's fine to switch our mindset to docker-compose start and docker-compose stop etc? This should avoid most of the sync initialization delays. Or would you recommend a manual config or using the mutagen "projects"?

xenoscopic commented 3 years ago

@yakobe I'm glad to hear that!

Sorry that the permissions were a bit confusing. The documentation around Mutagen's permission model is still at sort of a "first pass" level. Do you think that improving the documentation would help or were there technical limitations that prevented you from using this model?

Maybe it's fine to switch our mindset to docker-compose start and docker-compose stop etc?

That's certainly one way to approach the issue. To be honest, there aren't really best practices established here yet, but that's a good initial idea. Right now there are a few overhead costs with Mutagen Compose's up command that could be reduced, namely:

So in your case, I think that using either the start/stop model or trying to optimize ignores is probably the best route. For picking ignores, using du is usually the best route, though there's always some low-hanging fruit with VCS directories (which should almost always be ignored), package install directories such as node_modules (which aren't portable anyway), and VM cache directories/files (which also generally aren't portable).
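As a sketch of the du approach, the helper below lists the largest top-level entries (including hidden ones such as .git) in a project directory, which are the usual candidates for ignores. largest_entries is a hypothetical name; sizes are reported in KiB.

```shell
# Sketch: show the N largest top-level entries of a directory, biggest first.
largest_entries() {
    dir="${1:-.}"
    count="${2:-10}"
    # -s summarizes each entry, -k reports KiB so that sort -n works;
    # the second glob picks up dotfiles and dot-directories.
    du -sk "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -rn | head -n "$count"
}
```

Running largest_entries ~/Projects/myapp 5 typically surfaces things like node_modules or VCS directories near the top.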

yakobe commented 3 years ago

Unfortunately I can't seem to get the permissions working after all. If I chown manually after creation then it works, but sometimes files have permission problems. Maybe if they were created locally or by PHP, I'm not sure yet. Also, I would like to have it "just work" for the team without a manual step, but it didn't seem to work when I put it in the Dockerfile.

Maybe the docs could explain in a little detail what is happening with the permissions and what the outcomes of the various configurations are. For people who don't have that much contact with such things it can be a bit overwhelming.

About the ignores: ignoring node_modules or PHP vendors makes it a lot faster, but wouldn't this mean that they are not synced back to local? How would that work for an IDE like PhpStorm that needs to read them?

Maybe some more examples in the GitHub repo would help Mutagen noobs like me? E.g. another one for a Symfony project with Encore, one for Laravel Mix, etc. The community could help (e.g. once we get them working we can offer skeleton configurations to you).

Thanks again @havoc-io 🙂.

pjv commented 3 years ago

@yakobe I’m not sure exactly what kind of permissions issues you are seeing, but I have found it necessary when using mutagen compose in some instances to manually shell into the mutagen sidecar container and cd into its /volumes directory and chown one or more mounted volumes. There is a discussion about this between me and @leehambley here. I’m not sure where things stand with making a behavior like that automatic / configurable in the mutagen sidecar container.

yakobe commented 3 years ago

@pjv thanks for the info. I've been a bit quiet while I mess around and try things out. A setfacl on container start like this seems to be OK for me, although I need to destroy and recreate my setup a few more times to be sure. Docker seems to be really wonky at the moment. Sometimes all is fine and dandy and then it suddenly decides to consume all my CPU and set my laptop ablaze 😂.

Does anyone have any notes about the node_modules or PHP vendors question above? I see people adding them to the ignores, which would obviously speed the sync up considerably. But what about IDEs that need access to that code?

my2ter commented 3 years ago

Hi there,

Apologies if this isn't the best place to drop this, but I found the Mutagen docs overwhelming and came up with my own solution for running it with Docker for Mac in 3 simple steps. I thought I would drop my setup here just in case.

OS: macOS Big Sur
Docker: 3.5.2 (66501)
Mutagen: 0.11.7

Now from within your project you need 3 files:

#!/bin/bash

if [ "$1" = "up" ]; then
    # Clear out any session left over from a previous run.
    mutagen sync terminate myproject
    # Start the containers first so that Mutagen has something to connect to.
    docker-compose up -d
    mutagen sync create --name=myproject "$(pwd)" docker://root@container-name/path/to/project -c mutagen.yml
    mutagen sync monitor
elif [ "$1" = "down" ]; then
    docker-compose down --remove-orphans
    mutagen sync terminate myproject
fi

version: '3'
services:
  container-name:
    tty: true
    image: your-image
    container_name: container-name
    expose:
      - "80"

From there in the terminal do: ./docker-compose up

From my own tests, this is the fastest setup for file sync and keeping it super simple for all my projects to run it with Docker. Hope it helps someone else, and if anyone sees ways to upgrade it please let me know.

Cheers

akalineskou commented 2 years ago

I took some inspiration from docker-sync (which I was using before): you use a separate docker-compose file instead of changing the one you already have (not everyone is on a Mac, so this would only apply to Docker for Mac, for example).

You have your docker-compose.yml (I won't post mine, since there are no changes there). Then create the new docker-compose-osx.yml:

## Install
# brew install mutagen-io/mutagen/mutagen-beta mutagen-io/mutagen/mutagen-compose-beta
## Run
# mutagen compose -f docker-compose.yml -f docker-compose-osx.yml up app

version: "3.7"

services:
  app:
    volumes:
      - code:/path/to/code

volumes:
  code:

x-mutagen:
  sync:
    defaults:
      ignore:
        vcs: true
        paths:
          - .idea
    code:
      alpha: .
      beta: volume://code
      mode: two-way-resolved
      permissions:
        defaultDirectoryMode: 755
        defaultFileMode: 644

I've created two functions (in .bash_aliases) that check whether an OSX docker-compose file exists and use mutagen compose up/down or docker-compose up/down accordingly:

function docker_compose_up() {
    if [ -f "docker-compose-osx.yml" ]; then
        # Quote "$@" so that arguments containing spaces are passed through intact.
        mutagen compose -f docker-compose.yml -f docker-compose-osx.yml up "$@"
    else
        docker-compose up "$@"
    fi
}
function docker_compose_down() {
    if [ -f "docker-compose-osx.yml" ]; then
        mutagen compose down
    else
        docker-compose down
    fi
}

Then run docker_compose_up or docker_compose_down and you are ready to go. Pretty simple.

xenoscopic commented 2 years ago

I think it's time to close out this discussion. Thanks to everyone for your input!

Just to summarize the current state of affairs:

  1. Docker Desktop has added (experimental) support for virtiofs. So far the gains are very promising. You can find more information on the performance of virtiofs and how to use it in docker/roadmap#7 and docker/for-mac#1592. @stephen-turner also recently posted a Docker blog entry summarizing the situation, so check it out!
  2. If you still want to synchronize code into Docker volumes to use a native filesystem and/or work with remote Docker Engines, then you can still use Mutagen to do that. This will yield better performance at the cost of additional configuration, setup, and tooling. The easiest way to do this is using the new Compose-V2-based Mutagen Compose, but you can also still use any of the manual techniques outlined above.

xenoscopic commented 2 years ago

Hey all, just another update on this discussion: the release of the Docker Desktop extension API earlier this year has made it possible to create a Mutagen Docker Desktop extension that offers the same automatic bind-mount replacement that previously shipped in Docker Desktop. For a large number of people, this may be a better option than the Mutagen Compose integration or custom scripting.

This new extension also comes with the added benefit of new fanotify-based Linux filesystem watching (which results in much lower synchronization latency and 0% idle CPU usage), as well as the more recent advances in automatic conflict and problem handling in Mutagen.

This extension is still early in development, but I think it is ready to share with some Docker performance aficionados. I'm going to re-open this issue for a few days just to get some initial thoughts, but there's a dedicated issue tracker for the extension here if you run into specific problems.

You can find the documentation and installation instructions here: https://mutagen.io/documentation/docker-desktop-extension

There are two minor limitations that I'm hoping to fix in time as the extension SDK evolves.

At the moment, the goal is just to get to behavioral parity with bind mounts (with better performance), ideally with no edge cases (so, please... throw anything you'd like at it). In the near-term, I'd like to extend the functionality to add custom ownership/permissions, ad hoc caches, and remote Docker engine support (similar to what Mutagen Compose can do now). If there's something you'd like to see, please let me know!

For complete transparency: this extension is not going to be open-source and will require a license. Nothing is 100% finalized with respect to the exact pricing or licensing model, but more information will be available on that in the coming weeks. The goal with the extension is to provide a more sustainable revenue stream for Mutagen by offering a turnkey solution for those who don't want to delve into the docs. Mutagen and Mutagen Compose will still live on, of course, as open-source, with lots of features still coming down the pike.

cweagans commented 2 years ago

Nothing is 100% finalized with respect to the exact pricing or licensing model, but more information will be available on that in the coming weeks.

Do you have a general idea of what pricing might look like? Even just knowing your preliminary thoughts on how many zeroes will be on the number (let's say for an annual price) would be helpful :)

xenoscopic commented 2 years ago

@cweagans Without locking myself into too many specifics just now (because some of it will be dependent on reception, features, and perhaps some promotions), I'm targeting a price that will be comparable to Docker subscriptions. There is truthfully no fixed number at this point.

The goal is not to price gouge users, especially those (such as yourself :-)) whose feedback has helped guide development over the years - it's simply to build a sustainable entity for developing Mutagen as an open-source project, ideally by providing an increase in value to Docker Desktop comparable to the value that Docker Desktop provides over DIY solutions.

cweagans commented 2 years ago

Totally understood - that helps a lot. Really appreciate the extra bit of info there!

I hope this turns into a sustainable funding source for you. Mutagen is such a critical part of my workflow -- it's a no brainer to pay for it! :)

xenoscopic commented 2 years ago

Hey everyone, I've shipped a newer version of the extension (0.16.0-8) that fixes most of the issues in the first beta release. Thanks to all of those who provided feedback!

This new version also adds the ability to control the ownership of files in the VM, which may be of interest to some.

@cweagans Following up re: pricing: After talking to a few folks, I've decided to keep a free usage tier in the extension, probably restricted to a single cache or certain features, basically enough to work on a single project. For full/unlimited functionality, I'm planning on an introductory price of $7/month once the beta period expires.