mutagen-io / mutagen

Fast file synchronization and network forwarding for remote development
https://mutagen.io

Allow wildcard sync specification #221

Open tobyS opened 4 years ago

tobyS commented 4 years ago

We have certain files across our mono-repo setup which require special sync handling. It is easy to exclude these files from the standard sync using the ignore.paths specification, but adding a specialized sync session for each individual file is a hassle. It would therefore be great if the alpha specification also allowed wildcards.

Usage Example

    # Code is only modified locally and synced upstream
    up-backstage:
        alpha: "./saas/backstage"
        beta: "1.2.3.4:/var/www/frontastic/saas/backstage"
        mode: "one-way-replica"
        ignore:
            paths:
                # Two way sync
                - 'composer.*'

    backstage-composer-json:
        alpha: "1.2.3.4:/var/www/frontastic/saas/backstage/composer.json"
        beta: "./saas/backstage/composer.json"
        mode: "two-way-resolved"

    backstage-composer-lock:
        alpha: "1.2.3.4:/var/www/frontastic/saas/backstage/composer.lock"
        beta: "./saas/backstage/composer.lock"
        mode: "two-way-resolved"

Proposed Simplification

    # Code is only modified locally and synced upstream
    up-backstage:
        alpha: "./saas/backstage"
        beta: "1.2.3.4:/var/www/frontastic/saas/backstage"
        mode: "one-way-replica"
        ignore:
            paths:
                # Two way sync
                - 'composer.*'

    backstage-composer-json:
        alpha: "1.2.3.4:/var/www/frontastic/saas/backstage/composer.*"
        beta: "./saas/backstage/"
        mode: "two-way-resolved"

Driving this further, it could be very useful for mono-repo setups to have full-fledged wildcard support in the alpha specification, like:

    # Code is only modified locally and synced upstream
    code-up:
        alpha: "./"
        beta: "1.2.3.4:/var/www/frontastic/"
        mode: "one-way-replica"
        ignore:
            paths:
                # Two way sync
                - 'composer.*'

    composer-down:
        alpha: "1.2.3.4:/var/www/frontastic/*/*/composer.*"
        beta: "./saas/backstage/"
        mode: "two-way-resolved"

xenoscopic commented 4 years ago

Thanks for the input. I think this is a reasonable request and something I've had to creatively work around myself. Selectively including/excluding content can become a tedious game of ignoring/unignoring content. I like your proposed design, though I'll probably need a few days to fully process it and understand how it would fit into Mutagen's current implementation and the range of issues it could address.

If I had to sketch a rough implementation, I think it would involve looking at alpha/beta URLs that contain wildcards, finding the longest path prefix of those URLs that doesn't contain a wildcard, and using that as the synchronization root. The wildcard-containing segment would then be used by the filesystem scanning to guide/limit its traversal.
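The prefix-splitting idea above can be illustrated with a short Python sketch. This is not Mutagen's implementation. The function names `split_wildcard_path` and `matches` are hypothetical, the wildcard set (`*`, `?`, `[`) is an assumption, and the code works on the path portion of a URL only, ignoring any `host:` prefix.

```python
from fnmatch import fnmatchcase
from typing import Tuple

# Assumption: these characters mark a segment as a wildcard segment.
WILDCARD_CHARS = set("*?[")


def split_wildcard_path(path: str) -> Tuple[str, str]:
    """Split a path into (root, pattern).

    root is the longest leading run of segments without a wildcard;
    pattern is the remainder (empty if the path has no wildcards).
    """
    segments = path.split("/")
    for index, segment in enumerate(segments):
        if any(char in WILDCARD_CHARS for char in segment):
            root = "/".join(segments[:index])
            pattern = "/".join(segments[index:])
            if root == "":
                # The very first segment is a wildcard: fall back to the
                # filesystem root or the current directory.
                root = "/" if path.startswith("/") else "."
            return root, pattern
    return path, ""


def matches(relative_path: str, pattern: str) -> bool:
    """Check a root-relative path against the pattern, segment by segment."""
    path_parts = relative_path.split("/")
    pattern_parts = pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


root, pattern = split_wildcard_path("/var/www/frontastic/*/*/composer.*")
```

With the path from the issue, `root` is `/var/www/frontastic` and `pattern` is `*/*/composer.*`. A scanner rooted at `root` could then call `matches` on each relative path it encounters, so `saas/backstage/composer.lock` would be included and `saas/composer.json` would not.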

The three immediate issues I foresee would be:

  1. It could potentially be a footgun for people setting up complex sets of synchronization sessions that might interfere with each other (since it's not as explicit as ignoring content), but the argument could be made that ignores can get very complex and are even more of a footgun.
  2. Wildcards in the URL alone might not be sufficient to cover all cases. Perhaps they would be, given a generous application of wildcards coupled with additional ignores, though that might be just as complicated as the current behavior.
  3. I'm not entirely sure how things would work if both URLs specified wildcards. I think this case would be disallowed.
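The third point, disallowing wildcards on both endpoints, amounts to a small validation step at session creation. A minimal sketch, assuming the same wildcard characters as common glob syntax. `has_wildcard` and `check_session` are hypothetical names, not part of Mutagen.

```python
# Assumption: these characters mark a URL as containing a wildcard.
WILDCARD_CHARS = set("*?[")


def has_wildcard(url: str) -> bool:
    """Return True if the URL contains any wildcard character."""
    return any(char in WILDCARD_CHARS for char in url)


def check_session(alpha: str, beta: str) -> None:
    """Reject sessions where both endpoints specify wildcards."""
    if has_wildcard(alpha) and has_wildcard(beta):
        raise ValueError("wildcards are allowed in at most one of alpha and beta")


# Accepted: only alpha carries a wildcard.
check_session("1.2.3.4:/var/www/frontastic/*/*/composer.*", "./")
```

A check like this does nothing about the first point (separate sessions interfering with each other). That would need a comparison across all configured sessions.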

Perhaps Mutagen sync sessions could offer an alternative "ignore-by-default" mode, coupled with an "include" list setting. I vaguely recall a discussion along these lines years ago, which I may have argued could be accomplished with a list like:

- "*"
- "!/something"
- "!/something/*"
...

but with recursive filesystem traversal these things quickly become complicated.
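To see why this gets complicated, here is a deliberately simplified Python evaluator for such a list. It is not Mutagen's or Git's actual matcher. It assumes last-match-wins semantics, treats patterns starting with `/` as anchored to the root, and matches all other patterns against the final path component at any depth. `pattern_matches` and `is_ignored` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import List


def pattern_matches(pattern: str, path: str) -> bool:
    """Simplified .gitignore-style match for a root-relative path."""
    if pattern.startswith("/"):
        # Anchored pattern: compare segment by segment against the full path.
        pattern_parts = pattern[1:].split("/")
        path_parts = path.split("/")
        return len(pattern_parts) == len(path_parts) and all(
            fnmatchcase(part, pat)
            for part, pat in zip(path_parts, pattern_parts)
        )
    # Unanchored pattern: match the last path component at any depth.
    return fnmatchcase(path.rsplit("/", 1)[-1], pattern)


def is_ignored(path: str, rules: List[str]) -> bool:
    """Apply the rules in order; the last matching rule decides."""
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if pattern_matches(pattern, path):
            ignored = not negated
    return ignored


rules = ["*", "!/something", "!/something/*"]
```

Under these assumptions, `something` and `something/a` are kept, but `something/a/b` is ignored again, because `*` matches `b` and no unignore rule reaches that depth. Each further level needs its own rule, which is where an include list could save effort.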

I'd have to do some session configuration mock-ups to see how much would actually be saved by doing something like an include list. It's clear that having a better way to specify these setups is necessary, though it's unclear to me whether it would be best accomplished through an include list, extended ignore syntax (without straying too far from .gitignore), a wildcard shorthand in URLs, or simply a gallery of examples that show how to accomplish certain setups with minimal ignore settings.

Let me know if you have any additional thoughts. I'll continue to think about this as well.