hjmangalam / parsyncfp

follow-on to parsync (parallel rsync) with better startup perf

Can't handle a medium-sized sync reliably #14

Open mightbemyname opened 6 years ago

mightbemyname commented 6 years ago

Tried to sync approx 8M files under 1M directories. Got errors like "sh: 1: ls: Argument list too long" and eventually the script finished with "[17115] of [0]" and thought it had finished successfully. Only 3TB of 18TB had been copied.

The fpart log showed 93859 total parts.

mightbemyname commented 6 years ago

When rerunning with a folder subset, the script couldn't delete old fpart chunk files: "sh: 1: rm: Argument list too long" but didn't catch the error; the following line was "INFO: The fpart chunk files [/home/myuserdir/.parsyncfp/fpcache/f*] are cleared.. continuing"

hjmangalam commented 6 years ago

Can you send me the command line you used?

Was this a real FS or an artificial tree set up to test it? Regardless, it should behave better than that. But 1M dirs? Wow! Hmmm, I can see that if the string describing the full path were extremely long, the shell's maximum argument-list length would kick in.

That's a good bug to hit, though.

If it's not a security issue, could you send me the largest fpart cache file? It would be the top file listed as the result of this command:

ls -lSh ~/.parsyncfp/fpcache/f*
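
If the fpcache glob itself is already too long for the shell, a find-based variant along these lines (a sketch, assuming GNU find) avoids putting the whole list on one command line:

find ~/.parsyncfp/fpcache -maxdepth 1 -name 'f*' -type f -printf '%s\t%p\n' | sort -rn | head -1

That prints the size and name of the single largest cache file without ever expanding the f* glob in the shell.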

hjm


mightbemyname commented 6 years ago

Can't send file names. I can say that the path lengths can become excessive - hundreds of bytes for sure.

Also, this is a real file system. I’m copying a small application-specific proxy from old hardware to new hardware.

hjmangalam commented 6 years ago

Hi (& love the name),

OK - it does give me the broad strokes of the problem, though - I can check when the path string gets too long and try to do something intelligent about it - maybe recurse down the path until it becomes a reasonable length and then walk back up the tree, doing the rsync incrementally.

Or try to use xargs to pass the input.
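
As a rough sketch of that idea (not current parsyncfp code, just an illustration), the chunk-file cleanup that currently fails with "rm: Argument list too long" could be done without ever building a huge argument list:

# delete the fpart chunk files without passing them all to one rm
find ~/.parsyncfp/fpcache -maxdepth 1 -name 'f*' -type f -delete
# or equivalently, stream the names through xargs so each rm stays under ARG_MAX
find ~/.parsyncfp/fpcache -maxdepth 1 -name 'f*' -type f -print0 | xargs -0 rm -f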

Anyway, it won't be done soon, as I have a bunch of weekend stuff that has to be done, but thank you for the bug report.

You should try the fpsync utility distributed with fpart - that functions similarly to parsyncfp, but the interface is less informative/chatty.

However, it may (..?) address the long path name problem.

Out of interest, what does this return on your system:

getconf ARG_MAX

On my Ubuntu laptop, I get 2097152

On our CentOS 6.9 system, it's 7864320

That's set in the kernel, so unless you want to recompile, you can't change it.
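
As a quick check - since echo is a bash builtin, it isn't subject to the exec-time ARG_MAX limit - something like this gives a rough idea of how many bytes the expanded glob would occupy:

echo ~/.parsyncfp/fpcache/f* | wc -c

If that number is anywhere near ARG_MAX, any external command handed the same glob (ls, rm, cat) will fail.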

hjm

mightbemyname commented 6 years ago

I show the same value as your Ubuntu system. I worked around the problem by starting about 6 directories down, and running about a hundred copies in series. So far it’s working.
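
In other words, the workaround amounts to looping over the subdirectories at that depth and running one copy at a time, roughly like this (a sketch only; the paths are hypothetical):

for d in /src/a/b/c/d/e/*/; do
    rsync -a "$d" desthost:/dst/a/b/c/d/e/"$(basename "$d")"/
done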


hjmangalam commented 6 years ago

I understand the need not to reveal file names, but can you give me the maximum length of the file paths stored in those files? Just to get an idea of how long some of your lines are:

cat f.* | perl -e 'while (<>){printf "%d\n",length();} ' | sort | gzip > pathlengths.gz

That should produce a semi-randomized, highly compressible output that gives me the info I need.
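
If even cat f.* trips the argument limit, the single longest path can be pulled out with find batching the reads (a sketch, assuming GNU find and awk):

find ~/.parsyncfp/fpcache -maxdepth 1 -name 'f.*' -exec cat {} + | awk '{ if (length($0) > max) max = length($0) } END { print max }'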

hjm


FilipeMaia commented 5 years ago

Hi,

I think this might help explain the problem:

ls -lSh ~/.parsyncfp/fpcache/f*
bash: /usr/bin/ls: Argument list too long

There are too many fpart cache files.

hjmangalam commented 5 years ago

I may have noted this in a comment to another user - apologies if I didn't catch it on this channel.

Yes, this is a common problem if you set the chunksize too small - you'll generate a bazillion chunks. If this was the result with the default (10GB) chunksize, try increasing it to 50GB.
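
For example, at a 50GB chunk size an 18TB tree divides into on the order of 360 size-based chunks. A hedged sketch, modeled on the command shape already used in this thread (check parsyncfp --help for the exact spelling of the chunksize option):

parsyncfp --chunksize=50G --np 8 sourcedir remotehost:/destdir/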

Enough people have reported this kind of problem that I'll add a check to the code to complain if the list goes over a certain size.

Thanks for noting that and reporting it.

Harry


jaytaylor commented 5 years ago

I've also run into this error while trying to use parsyncfp to copy 57 million files.

The command line was:

sudo parsyncfp \
    --maxload 1000 \
    --maxbw 9999999999999 \
    --np 50 \
    --verbose 3 \
    --rsyncopts -av \
    --nowait \
    Data/Files /data-fs-20190604/

And there are 20,431 files under /root/.parsyncfp/.

I'll try increasing the chunk size to see if it resolves the issue.