hjmangalam / parsyncfp

follow-on to parsync (parallel rsync) with better startup perf

Can't deal with long file lists #26

Open YPatois opened 5 years ago

YPatois commented 5 years ago

Hi,

Here's the error:

```
sh: /usr/bin/ls: Argument list too long
sh: /usr/bin/ls: Argument list too long
sh: /usr/bin/ls: Argument list too long
sh: /usr/bin/ls: Argument list too long
sh: /usr/bin/ls: Argument list too long
sh: /usr/bin/ls: Argument list too long
```

First it takes a long time to rebuild the cache list, and then it fails with this error. I have quite a large directory, with several hundred thousand files at the top level. It seems it doesn't work in that case.

Regards,

Yannick

hjmangalam commented 5 years ago

Hi Yannick,

short version:
use a larger chunk size (--chunksize)
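Something like this, for illustration (paths and values hypothetical; option spellings are as I recall from the parsyncfp README, so verify against `parsyncfp --help` for your version):

```bash
# A larger --chunksize makes fpart emit fewer, larger chunkfiles,
# which keeps the chunkfile list short enough to avoid the ls failure above.
parsyncfp --NP=4 --chunksize=50G --startdir=/data hugedir me@remote:/backups
```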

long version: This is a basic problem with filesystems (or human behavior, or the intersection thereof).

If you create zillions of files (particularly in single directories) and then try to list them via 'ls' or manipulate them in other ways that require an extended stat(), you'll be waiting a long time. Even on NVMe drives, doing stats on 30M files (i.e., the results of a Trinity run) takes a while.
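You can see the effect on your own tree by comparing a pure readdir pass against anything that stat()s each entry (path hypothetical):

```bash
# Enumerating names (readdir only) is comparatively cheap, even in huge dirs:
time find /data/bigdir -maxdepth 1 | wc -l

# Anything that stat()s every entry (ls -l, du, rsync's scan) is much slower,
# especially cold-cache; -U disables sorting so the gap is almost pure stat():
time ls -lU /data/bigdir > /dev/null
```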

So that's the problem with the cache rebuild time. fpart is the best, fastest file chunker I've seen and even it can get overwhelmed if it meets with egregiously bad filesystem abuse. I'm winding up to write a longer rant about this for our cluster users...

I call this the ZOTfile problem (for Zillions of Tiny files; often they're tiny, sometimes there are Zillions of Huge files, but that's a slightly different problem as well as another acronym).

The 'Argument list too long' error is a different problem, one that's addressable in other ways, but I've ignored it because it usually signals the underlying problem: your chunkfile list is too big, which often leads to other problems. The version of pfp that is now in final checking will preempt this by complaining and then dying if you try to generate a bazillion chunkfiles. It will gently advise you to choose a larger chunk size so that you generate a smaller number of chunkfiles.
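For reference, that error comes from the shell, not from ls itself: the shell expands a glob into one giant argument list, and the kernel refuses any exec() whose arguments exceed ARG_MAX. A minimal sketch (the fpcache path and f* glob are hypothetical stand-ins for wherever your chunkfiles land):

```bash
# The kernel's per-exec argument limit, in bytes (often ~2 MB on Linux):
getconf ARG_MAX

# Fails once the expanded glob exceeds ARG_MAX -- ls never even starts:
ls fpcache/f*      # sh: /usr/bin/ls: Argument list too long

# These never build one giant argument list, so they work at any scale:
find fpcache -maxdepth 1 -name 'f*' | wc -l                   # count chunkfiles
find fpcache -maxdepth 1 -name 'f*' -print0 | xargs -0 ls -l  # batch ls calls
```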

However, you make a good point, echoing several users who have complained of similar behavior, so one of the options I'm thinking about is '--analyze', which would offer to do some primitive analysis of the dir trees and suggest approaches to optimize the process, possibly calling a tarchiving program to dramatically decrease the number of files to be transferred (because, obviously(?), rsyncing ZOTfiles is MUCH less efficient than rsyncing fewer large files).
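In the meantime, a manual tar-then-rsync pass along these lines (paths and host hypothetical) moves one big sequential file instead of zillions of tiny ones:

```bash
# Stage the ZOTfile tree as a single archive, then move the archive;
# keeping the tarball also lets rsync delta-sync it on later runs.
tar -cf /staging/zotdir.tar -C /data zotdir
rsync -a /staging/zotdir.tar me@remote:/backups/

# Or stream it with no staging space (at the cost of rsync restartability):
tar -cf - -C /data zotdir | ssh me@remote 'cat > /backups/zotdir.tar'
```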

hjm

