jbd / msrsync

Multi-stream rsync wrapper
GNU General Public License v3.0

using rsync exclude #7

Closed cmarks-hivelocity closed 6 years ago

cmarks-hivelocity commented 6 years ago

Is it possible to use excludes to avoid syncing certain subfolders as part of the -r arguments?

jbd commented 6 years ago

The -r/--rsync option holds the rsync options that will be used by the spawned rsync processes:

rsync options:
    -r, --rsync ...       MUST be last option. rsync options as a quoted string ["-aS --numeric-ids"]. The "--from0 --files-from=... --quiet --verbose --stats --log-file=..." options will ALWAYS be added, no matter what. Be aware that this will affect
                            all rsync *from/filter files if you want to use them. See rsync(1) manpage for details.

Did you already try the exclude/include options from rsync? Did they work incorrectly?

            --exclude=PATTERN       exclude files matching PATTERN
            --exclude-from=FILE     read exclude patterns from FILE
            --include=PATTERN       don't exclude files matching PATTERN
            --include-from=FILE     read include patterns from FILE

cmarks-hivelocity commented 6 years ago

I'm doing a simple test: ./msrsync test1/ test2/ -r"--exclude=b"

but the folder b is still included and synced to the folder test2

jbd commented 6 years ago

This command works for me:

$ mkdir -p test1/b test2
$ ./msrsync -r "--exclude b --exclude b/**" test1/ test2/ 
$ find test2
test2

Have a look at this answer on stackoverflow: https://stackoverflow.com/a/41876294.

mailinglists35 commented 6 years ago

msrsync should provide its own exclude option, or dynamically translate what the user intended to exclude into an exclude that works with files-from.

mailinglists35 commented 6 years ago

The point is that what works in a simple rsync invocation should also work in msrsync. I also tried with exclude something/** and I am still getting a crash.

jbd commented 6 years ago

I understand that it is not convenient, but that's how exclude and files-from work together. It is exactly the same behavior in plain rsync (as the stackoverflow link shows): you have to exclude something AND something/**.
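As an illustration of the "exclude something AND something/**" rule, here is a minimal sketch of a helper that builds such an option string for -r. The function name `build_exclude_opts` is hypothetical and not part of msrsync; the sketch assumes directory names without spaces or shell metacharacters and produces the same form as the command shown earlier in this thread.

```python
def build_exclude_opts(names):
    """Build an rsync option string that excludes each name and its contents.

    Hypothetical helper, not part of msrsync. For every directory name it
    emits both "--exclude NAME" and "--exclude NAME/**", which is the pair
    needed when rsync is driven by --files-from.
    """
    opts = []
    for name in names:
        name = name.rstrip("/")  # tolerate "b/" as well as "b"
        opts.extend(["--exclude", name, "--exclude", name + "/**"])
    return " ".join(opts)
```

Calling `build_exclude_opts(["b"])` yields the string `--exclude b --exclude b/**`, which can then be passed as the quoted value of -r.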

Regarding the crash you mentioned, I guess you are referring to your issue https://github.com/jbd/msrsync/issues/12 ? It should be fixed in https://github.com/jbd/msrsync/commit/e2368315eee08df6d55a86978b411e7b97798f90.

An msrsync exclude option is a good idea. It's not logical to walk part of a tree that you'll exclude in the spawned rsync. But I'm afraid that as soon as I try to implement this, I will need to add regular expression support and whatnot. For the moment, I'll stick with the rsync exclude option.

agates commented 4 years ago

@jbd I decided to give a shot at implementing exclude at the msrsync level because we had a few crawls that were going over backups/snapshots that ultimately resulted in an inflated file list, much of which rsync would exclude anyway (but would take a long time to go over). So here's a gist of my os.walk wrapper.

My solution is more basic than rsync exclude, but it meets our needs and uses only built-in libraries (Python 2.6 compatible): fnmatch and re. fnmatch uses Unix shell-style wildcards. The big caveat is that exclusions apply only per filename/dirname and do not consider the full path.

It uses fnmatch.translate to generate regular expressions and compiles two lists (one for files and one for directories) of regular expressions into a pattern, each called once per os.walk iteration.

This utilizes the built in functionality to modify the os.walk dirnames list in-place, which causes it to skip anything excluded. It probably doesn't need to modify the files list in place and could just return it, but I only just now thought about it :).

https://gist.github.com/agates/51db90f77ea1a8f658906a94f9161d4a
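For readers who want the idea without opening the gist, here is a minimal sketch of the approach described above. It is not the gist's actual code: the names `compile_patterns` and `walk_excluding` are hypothetical, and the sketch only assumes the standard behavior of os.walk, fnmatch.translate and re.

```python
import fnmatch
import os
import re


def compile_patterns(globs):
    """Combine shell-style globs into one compiled regex (None if no globs)."""
    if not globs:
        return None
    # fnmatch.translate turns each glob into a regex anchored at the end.
    return re.compile("|".join("(?:%s)" % fnmatch.translate(g) for g in globs))


def walk_excluding(top, exclude_dirs=(), exclude_files=()):
    """Like os.walk, but skip directory and file names matching the globs.

    Matching is done on the bare name only, never on the full path.
    """
    dir_re = compile_patterns(exclude_dirs)
    file_re = compile_patterns(exclude_files)
    for dirpath, dirnames, filenames in os.walk(top):
        if dir_re is not None:
            # In-place edit: os.walk will not descend into removed entries.
            dirnames[:] = [d for d in dirnames if not dir_re.match(d)]
        if file_re is not None:
            # A new list is enough here; os.walk does not reuse filenames.
            filenames = [f for f in filenames if not file_re.match(f)]
        yield dirpath, dirnames, filenames
```

For example, `walk_excluding("/data", exclude_dirs=[".snapshot"], exclude_files=["*.tmp"])` would never descend into any directory named `.snapshot`, so the crawl cost of those subtrees is avoided rather than just filtered afterwards.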

Lastly, thank you very much for this tool. We have used it to synchronize at least a petabyte.

jbd commented 4 years ago

Hello,

Thank you for taking the time to report and write this up. I understand the problem and the inconvenience of the rsync exclude mechanisms.

Right now, I don't have time to integrate this into the project, but I implemented something very naive some time ago in the "exclude" branch (Oct. 2018):

https://github.com/jbd/msrsync/blob/exclude/msrsync#L512

It's very basic and not as flexible as your proposal but maybe it could help remove .snapshot directories and whatnot right now.

I'll try to have a look at this and other things in the future.

boelle commented 3 months ago

late and prob a dumb Q:

i run this

rsync -av --progress --delete --exclude 'snapraid.content' --exclude 'snapraid.content.lock' --exclude 'aquota.group' --exclude 'aquota.user' --exclude 'lost+found/' --exclude 'homedir/Malcolm/Borgbackup/' --exclude 'homedir/Bo/iTunes/' /srv/mergerfs/Data/ root@100.126.215.89:/srv/mergerfs/Data/homedir/bo/Borgbackup/Backup-Bo-OMV_NAP/

works fine but what would that look like with msrsync?