hjmangalam / parsyncfp

follow-on to parsync (parallel rsync) with better startup perf

Any way to pfp a directory to one of a different name? #13

Open novosirj opened 6 years ago

novosirj commented 6 years ago

I've seen online that you can do something like rsyncing host:/snapshots/home.20180709/ to host2:/backups/home (e.g. having /whatever/directory become the root of the tree at /whatever/directoryname, not at /whatever/directoryname/directory). In regular rsync, the key appears to be the trailing slash on the source directory.
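To illustrate what I mean (the paths and hosts here are just placeholders), plain rsync behaves roughly like this:

    # With a trailing slash on the source, the *contents* of the directory
    # are copied into the destination:
    rsync -a /snapshots/home.20180709/ host2:/backups/home/
    #   -> files end up directly under /backups/home

    # Without the trailing slash, the source directory itself is recreated
    # inside the destination:
    rsync -a /snapshots/home.20180709 host2:/backups/home/
    #   -> files end up under /backups/home/home.20180709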

With parsyncfp, this seems not to be possible, I guess because of the way it creates filename lists like "home.20180709/file/directory/file.txt" -- it always creates the home.20180709 directory as a result.
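I'm assuming pfp hands those lists to rsync via --files-from; if that's right, the top directory always reappears because the list entries are relative to the parent of the source, and the trailing-slash trick no longer applies:

    # hypothetical chunk file, with paths relative to /snapshots:
    #   home.20180709/file/directory/file.txt
    # fed to rsync with something roughly like:
    rsync -a --files-from=chunk.0 /snapshots/ host2:/backups/
    # --files-from recreates every listed path under the destination,
    # so /backups/home.20180709/... always gets created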

Is there anything that can be done without modifying the software? I guess if one had to modify it, it would just involve changing the path handling slightly, but I'm not sure what else that might affect.

An easy workaround would be doing ln -s /snapshots/home.20180709 /snapshots/home, but this is a read-only FS, so I'd have to do it someplace else, etc., and I'd prefer not to.

Thanks either way, and for any suggestions.

hjmangalam commented 6 years ago

On Monday, July 9, 2018 8:34:23 PM PDT novosirj wrote:

I've seen online that you can do something like rsyncing host:/snapshots/home.20180709/ to host2:/backups/home (e.g. having /whatever/directory become the root of the tree at /whatever/directoryname, not at /whatever/directoryname/directory). In regular rsync, the key appears to be the trailing slash on the source directory.

Yes, I've been hosed so many times by rsync's trailing '/' feature/bug that I decided to NOT mirror its use in pfp to prevent others from accidentally re-copying several TBs into an unexpected directory. It's generally not catastrophic, but it can be time-consuming to back out of.

That feature/bug should not be difficult to implement, but I basically don't like it because it behaves counter to what most people expect. Let me look at the code and I'll see if it messes up anything else.

If you're doing filesystem to filesystem copies (not bouncing from server to server), fpsync is a good alternative (and is part of the 'fpart' distribution). It's generally very good, but be careful - some versions of it have done some unexpected overwrites.
Definitely test it before you commit to a big transfer (as you should with any software, especially pfp ;)
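Untested sketch of an fpsync run (the worker count and chunk limits are placeholders; check the fpsync man page for your version):

    # 8 concurrent rsync workers, at most 2000 files or ~10 GB per work unit
    fpsync -n 8 -f 2000 -s $((10*1024*1024*1024)) /snapshots/home.20180709/ /backups/home/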

I'll try to get back to you today or tomorrow with an answer or patch. hjm

hjmangalam commented 6 years ago

On Tuesday, July 10, 2018 9:51:46 AM PDT harry mangalam wrote:

If you're doing filesystem to filesystem copies (not bouncing from server to server), fpsync is a good alternative (and is part of the 'fpart' distribution). It's generally very good, but be careful - some versions of it have done some unexpected overwrites. Definitely test it before you commit to a big transfer (as you should with any software, especially pfp ;)

I should have checked before I wrote that; the current (as of today) fpsync works great and has picked up some new features, though (like parsyncfp), the overhead of setting up the parallel rsync will make both fpsync and parsyncfp /slower/ than rsync by itself on syncs that are largely identical and networkologically close.

Still looking at the original problem.

hjm

novosirj commented 6 years ago

It does speed things up on our system, I think in part because a lot of time is wasted slowly rsyncing small files, and pfp allows the rsync to continue around the slow parts. Thanks for the suggestion of fpsync to look at as well.

We use GPFS for the FS here and I was considering looking into how to use mmfind to generate a file list that could be parceled out to rsyncs. I'm not sure how easy that would be to do using pfp, or whether doing something like that would almost entirely be replacing pfp.

hjmangalam commented 6 years ago

On Thursday, July 19, 2018 7:30:45 PM PDT novosirj wrote:

It does speed things up on our system, I think in part because a lot of time is wasted slowly rsyncing small files, and pfp allows the rsync to continue around the slow parts. Thanks for the suggestion of fpsync to look at as well.

We use GPFS for the FS here and I was considering looking into how to use mmfind to generate a file list that could be parceled out to rsyncs. I'm not sure how easy that would be to do using pfp, or whether doing something like that would almost entirely be replacing pfp.

This is what fpart does in both parsyncfp and fpsync. It's pretty fast (faster and more convenient than the otherwise very nice kdirstat_cache_writer I was using previously).
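For comparison, a bare-bones fpart run (the chunk size and output prefix are arbitrary):

    # split the tree into file lists of at most 5000 entries each,
    # written out as /tmp/chunk.0, /tmp/chunk.1, ...
    fpart -f 5000 -o /tmp/chunk /snapshots/home.20180709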
I've not used mmfind; unless it uses an existing db of files to pull out info, I'd be surprised if it was faster than fpart - I spent a lot of time looking for such utilities and couldn't find any. Not to say they don't exist, but fpart is a VERY nicely designed and easy-to-use utility. hjm

novosirj commented 6 years ago

Yes, mmfind is part of GPFS and reads the GPFS metadata directly in parallel. It can pull a list of my files in a 484T filesystem with something like 160M files in about 25 mins. I’m not absolutely certain that that is faster, but if you are able to recommend me a fair test for just generating the file list, I would give it a shot.

hjmangalam commented 6 years ago

On Friday, July 20, 2018 1:32:15 PM PDT novosirj wrote:

Yes, mmfind is part of GPFS and reads the GPFS metadata directly in parallel. It can pull a list of my files in a 484T filesystem with something like 160M files in about 25 mins. I’m not absolutely certain that that is faster, but if you are able to point me to a fair test, I would give it a shot.

I would say that's faster than fpart since mmfind has access to the metadata directly, a big advantage. But you'll have to write a parallel rsync yourself since I don't use gpfs and don't plan on it. ;)

But with gnu parallel, it should be fairly easy if you can balance and package the file lists with mmfind.
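A rough sketch of what I mean (I haven't used mmfind, so its syntax here is a guess at find-compatible usage, and the chunk size and job count are placeholders):

    # 1) dump the file list (assuming mmfind emits absolute paths)
    mmfind /gpfs/home -type f > /tmp/filelist

    # 2) make the paths relative to the source root and split into chunks;
    #    this splits by line count only - balancing by size would take more work
    sed 's|^/gpfs/home/||' /tmp/filelist | split -l 100000 - /tmp/chunk.

    # 3) one rsync per chunk, 8 at a time, via GNU parallel
    ls /tmp/chunk.* | parallel -j 8 rsync -a --files-from={} /gpfs/home/ host2:/backups/home/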

hjm
