hjmangalam / parsyncfp

follow-on to parsync (parallel rsync) with better startup perf
Other
161 stars 19 forks source link

rsync subprocess exits immediately and the file count number not match #36

Open idlej opened 4 years ago

idlej commented 4 years ago

Hi all,

I create a parsyncfp job to migrate about 4T files with a 4G chunksize.

perl /opt/parsyncfp/parsyncfp --NP=10 --altcache=/root/.parsyncfp/jiaoyu_gv_upload --startdir=/mnt/ps15 jiaoyu_gv_upload /mnt/hw_9000 --maxload=32 --verbose=3 --chunksize=4G --interface=eth4

The rsync process starts and exits immediately after the corresponding cachefile is created. The fpart log tells the cachefile a huge size

[root@ostor-zzj fpcache]# cat fpart.log.09.34.13_2020-08-11
Examining filesystem...
Filled part #0: size = 4294972562, 388939 file(s)
Filled part #1: size = 4294970326, 528222 file(s)
Filled part #2: size = 4294971458, 533180 file(s)
Filled part #3: size = 4294977049, 481518 file(s)
Filled part #4: size = 4294967610, 479615 file(s)
Filled part #5: size = 4294972108, 491070 file(s)
Filled part #6: size = 4294967691, 448095 file(s)

But actually there are only few lines in the cachefiles.

[root@ostor-zzj fpcache]# wc -l f*
   246 f.0
  1982 f.1
  1033 f.2
  1712 f.3
  1331 f.4
   614 f.5
   636 f.6
  2108 f.7
     8 fpart.log.09.34.13_2020-08-11
  9670 total

There are only small size of files be sent.

[root@ostor-zzj jiaoyu_gv_upload]# cat rsync-logfile-09.34.13_2020-08-11_*
2020/08/11 09:34:19 [8718] building file list
2020/08/11 09:34:19 [8718] sent 14196 bytes  received 17 bytes  total size 3123551
2020/08/11 09:43:48 [11469] building file list
2020/08/11 09:43:52 [11469] sent 50222 bytes  received 17 bytes  total size 15715093
2020/08/11 09:49:04 [13024] building file list
2020/08/11 09:49:09 [13024] sent 31495 bytes  received 17 bytes  total size 8388468
2020/08/11 09:54:41 [14649] building file list
2020/08/11 09:54:45 [14649] sent 44404 bytes  received 17 bytes  total size 13828482
2020/08/11 09:59:28 [16039] building file list
2020/08/11 09:59:32 [16039] sent 35528 bytes  received 17 bytes  total size 12950497
2020/08/11 10:04:12 [17434] building file list
2020/08/11 10:04:13 [17434] sent 16247 bytes  received 17 bytes  total size 5056321
2020/08/11 10:08:45 [18741] building file list
2020/08/11 10:08:47 [18741] sent 17912 bytes  received 17 bytes  total size 5144628
2020/08/11 10:12:27 [19802] building file list
2020/08/11 10:12:32 [19802] sent 58061 bytes  received 17 bytes  total size 23893062
2020/08/11 10:15:15 [20632] building file list
2020/08/11 10:15:17 [20632] sent 2398 bytes  received 17 bytes  total size 831982

I think the problem may be that the rsync starts when the corresponding cachefile just created and the rsync won't wait for the cachefile generating complete.

I don't know if this is the expected scenario and how to fix this.

hjmangalam commented 4 years ago

The behavior you describe at the end of your note is the expected behavior. It's a feature, not a bug of 'fpart', so that the rsyncing can start before the recursive descent of the entire tree is finished. ie: rsync is fired off with the completed chunk file as soon as it is done. I seem to remember the behavior you describe tho from an older version of parsyncfp. What version are you running?

Also, if you are using the '--altcache' option, make sure that the logs, etc you're reading are from the correct cache dir. It can get confusing once you start generating lots of cache files. However, assuming that you're correct and the logs are what you say, the small sizes of bytes sent could be the result of files already existing at the target, and only the checksums being exchanged. That is what makes rsync so efficient.

However, the output of the fpart log and the sizes of the chunk files should be proportional - In a recent test on my system, if the fpart log says:

 1 Examining filesystem...
 2 Filled part #0: size = 4587626807, 24151 file(s)
 3 Filled part #1: size = 4497063936, 2 file(s)
 4 Filled part #2: size = 4344097561, 2255 file(s)
 5 Filled part #3: size = 4312630307, 139 file(s)
 6 Filled part #4: size = 4769913757, 202 file(s)
 7 Filled part #5: size = 4295052994, 838 file(s)
 8 Filled part #6: size = 4759503831, 86444 file(s)
 9 Filled part #7: size = 4341861400, 31980 file(s)
10 Filled part #8: size = 2771762905, 1476 file(s)

the wordount of the f. chunk files should be very close (sometimes they differ due to permissions or spaces) $ wc -l f. 24151 f.0 2 f.1 2256 f.2 139 f.3 202 f.4 838 f.5 86444 f.6 31980 f.7 1476 f.8

Harry

On Mon, Aug 10, 2020 at 7:55 PM idlej notifications@github.com wrote:

Hi all,

I create a parsyncfp job to migrate about 4T files with a 4G chunksize.

perl /opt/parsyncfp/parsyncfp --NP=10 --altcache=/root/.parsyncfp/jiaoyu_gv_upload --startdir=/mnt/ps15 jiaoyu_gv_upload /mnt/hw_9000 --maxload=32 --verbose=3 --chunksize=4G --interface=eth4

The rsync process starts and exits immediately after the corresponding cachefile is created. The fpart log tells the cachefile a huge size

[root@ostor-zzj fpcache]# cat fpart.log.09.34.13_2020-08-11 Examining filesystem... Filled part #0: size = 4294972562, 388939 file(s) Filled part #1: size = 4294970326, 528222 file(s) Filled part #2: size = 4294971458, 533180 file(s) Filled part #3: size = 4294977049, 481518 file(s) Filled part #4: size = 4294967610, 479615 file(s) Filled part #5: size = 4294972108, 491070 file(s) Filled part #6: size = 4294967691, 448095 file(s)

But actually there are only few lines in the cachefiles.

[root@ostor-zzj fpcache]# wc -l f* 246 f.0 1982 f.1 1033 f.2 1712 f.3 1331 f.4 614 f.5 636 f.6 2108 f.7 8 fpart.log.09.34.13_2020-08-11 9670 total

There are only small size of files be sent.

[root@ostor-zzj jiaoyu_gv_upload]# cat rsync-logfile-09.34.132020-08-11* 2020/08/11 09:34:19 [8718] building file list 2020/08/11 09:34:19 [8718] sent 14196 bytes received 17 bytes total size 3123551 2020/08/11 09:43:48 [11469] building file list 2020/08/11 09:43:52 [11469] sent 50222 bytes received 17 bytes total size 15715093 2020/08/11 09:49:04 [13024] building file list 2020/08/11 09:49:09 [13024] sent 31495 bytes received 17 bytes total size 8388468 2020/08/11 09:54:41 [14649] building file list 2020/08/11 09:54:45 [14649] sent 44404 bytes received 17 bytes total size 13828482 2020/08/11 09:59:28 [16039] building file list 2020/08/11 09:59:32 [16039] sent 35528 bytes received 17 bytes total size 12950497 2020/08/11 10:04:12 [17434] building file list 2020/08/11 10:04:13 [17434] sent 16247 bytes received 17 bytes total size 5056321 2020/08/11 10:08:45 [18741] building file list 2020/08/11 10:08:47 [18741] sent 17912 bytes received 17 bytes total size 5144628 2020/08/11 10:12:27 [19802] building file list 2020/08/11 10:12:32 [19802] sent 58061 bytes received 17 bytes total size 23893062 2020/08/11 10:15:15 [20632] building file list 2020/08/11 10:15:17 [20632] sent 2398 bytes received 17 bytes total size 831982

I think the problem may be that the rsync starts when the corresponding cachefile just created and the rsync won't wait for the cachefile generating complete.

I don't know if this is the expected scenario and how to fix this.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/hjmangalam/parsyncfp/issues/36, or unsubscribe https://github.com/notifications/unsubscribe-auth/AASF3Y3HZVGUSWQCZ3RJPF3SACXJPANCNFSM4P2SCZTQ .

--

Harry Mangalam

idlej commented 4 years ago

@hjmangalam

Hi Harry, I use fpart v1.2.0, rsync version 3.1.2 protocol version 31 and parsyncfp version 1.67.

I understand the strategy that rsync is fired off with the completed chunk file as soon as it is done. But in my case, the rsync is fired when the chunk just been created. And in the code the condition to fork a new rsnyc process is just the existance of the new chunkfile.

while ( $RSYNCS_GOING < $NP && $KEEPGOING ) {    #
  $CUR_FP_FLE = $FP_ROOT . "." . $CUR_FPI;       # the current fp chunkfile
  if ( -e $CUR_FP_FLE ) {                        # if the current chunkfile exists  <----------------------
    fixfilenames( $CUR_FP_FLE, $ROOTDIR );       # check & fix for spaces, bad chars.
         # entire rsync command and PID capture (used in total of 2 places)
    $logfile = $bytefiles . $CUR_FPI;
    $RSYNC_CMD =
"cd $TRIMPATH && rsync  --bwlimit=$MAXBW  $RSYNCOPTS -a -s --log-file=$logfile --files-from=$CUR_FP_FLE  $ROOTDIR  $TARGET  & echo \"\${!}\" >> $PIDFILE";
    if ( $VERBOSE >= 2 ) { INFO("Starting rsync for chunkfile [$CUR_FP_FLE]..\n"); }
    # WARN("$RSYNC_CMD");
    # there will be as many logfiles as fp chunkfiles.
    # ie LOTS. but they can be deleted after the run has been verified..
    # TODO don't know if we need this logfile.
    if ($DEBUG) { debug( __LINE__, "Complete rsync cmd = [$RSYNC_CMD]" ); }
    system("$RSYNC_CMD");    # launch rsync and capture the bg job PID to PIDfile

I don't know if any version of rsync will watch the chunkfile and process the newly appended lines after the chunkfile opened. But in my case it seems the rsync just handle the few lines then exit.

And it's also quiet strange the fpart stopped to write to the chunkfile.

hjmangalam commented 4 years ago

Hi there - Sorry for the delay. I just exchanged an email with Ganael (fpart author). He reminded me that in live mode (-L) which I use, you can specify the -W option which I do not (yet) to execute the dependent cmd after the chunk file (his term is 'partition') is written instead of when the file exists.

I'll have to look into how to re-organize parsyncfp (pfp) to handle this - right now fpart is forked and so doesn't communicate with pfp. I'll have to have it write a marker file or something to indicate that the file is closed.

Busy today, so this might not get done immediately.

Stay tuned. Harry

On Mon, Aug 10, 2020 at 10:08 PM idlej notifications@github.com wrote:

@hjmangalam https://github.com/hjmangalam

Hi Harry, I use fpart v1.2.0, rsync version 3.1.2 protocol version 31 and parsyncfp version 1.67.

I understand the strategy that rsync is fired off with the completed chunk file as soon as it is done. But in my case, the rsync is fired when the chunk just been created. And in the code the condition to fork a new rsnyc process is just the existance of the new chunkfile.

while ( $RSYNCS_GOING < $NP && $KEEPGOING ) { #

$CUR_FP_FLE = $FP_ROOT . "." . $CUR_FPI; # the current fp chunkfile

if ( -e $CUR_FP_FLE ) { # if the current chunkfile exists <----------------------

fixfilenames( $CUR_FP_FLE, $ROOTDIR );       # check & fix for spaces, bad chars.

     # entire rsync command and PID capture (used in total of 2 places)

$logfile = $bytefiles . $CUR_FPI;

$RSYNC_CMD =

"cd $TRIMPATH && rsync --bwlimit=$MAXBW $RSYNCOPTS -a -s --log-file=$logfile --files-from=$CUR_FP_FLE $ROOTDIR $TARGET & echo \"\${!}\" >> $PIDFILE";

if ( $VERBOSE >= 2 ) { INFO("Starting rsync for chunkfile [$CUR_FP_FLE]..\n"); }

# WARN("$RSYNC_CMD");

# there will be as many logfiles as fp chunkfiles.

# ie LOTS. but they can be deleted after the run has been verified..

# TODO don't know if we need this logfile.

if ($DEBUG) { debug( __LINE__, "Complete rsync cmd = [$RSYNC_CMD]" ); }

system("$RSYNC_CMD");    # launch rsync and capture the bg job PID to PIDfile

I don't know if any version of rsync will watch the chunkfile and process the newly appended lines after the chunkfile opened. But in my case it seems the rsync just handle the few lines then exit.

And it's also quiet strange the fpart stopped to write to the chunkfile.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/hjmangalam/parsyncfp/issues/36#issuecomment-671727863, or unsubscribe https://github.com/notifications/unsubscribe-auth/AASF3Y44LKDKIDNTQDA55N3SADG3NANCNFSM4P2SCZTQ .

--

Harry Mangalam

hjmangalam commented 4 years ago

Hi there,

I've added the bits to use only closed chunk files but am going thru the code again with some more oversight to make sure I haven't missed anything. If you'd like to test the current version I'm working on, send me a note with your private email and I'll send it to you. Otherwise it should be pushed to github within a day or so.

Harry

On Wed, Aug 12, 2020 at 8:48 AM Harry Mangalam hjmangalam@gmail.com wrote:

Hi there - Sorry for the delay. I just exchanged an email with Ganael (fpart author). He reminded me that in live mode (-L) which I use, you can specify the -W option which I do not (yet) to execute the dependent cmd after the chunk file (his term is 'partition') is written instead of when the file exists.

I'll have to look into how to re-organize parsyncfp (pfp) to handle this - right now fpart is forked and so doesn't communicate with pfp. I'll have to have it write a marker file or something to indicate that the file is closed.

Busy today, so this might not get done immediately.

Stay tuned. Harry

On Mon, Aug 10, 2020 at 10:08 PM idlej notifications@github.com wrote:

@hjmangalam https://github.com/hjmangalam

Hi Harry, I use fpart v1.2.0, rsync version 3.1.2 protocol version 31 and parsyncfp version 1.67.

I understand the strategy that rsync is fired off with the completed chunk file as soon as it is done. But in my case, the rsync is fired when the chunk just been created. And in the code the condition to fork a new rsnyc process is just the existance of the new chunkfile.

while ( $RSYNCS_GOING < $NP && $KEEPGOING ) { #

$CUR_FP_FLE = $FP_ROOT . "." . $CUR_FPI; # the current fp chunkfile

if ( -e $CUR_FP_FLE ) { # if the current chunkfile exists <----------------------

fixfilenames( $CUR_FP_FLE, $ROOTDIR );       # check & fix for spaces, bad chars.

     # entire rsync command and PID capture (used in total of 2 places)

$logfile = $bytefiles . $CUR_FPI;

$RSYNC_CMD =

"cd $TRIMPATH && rsync --bwlimit=$MAXBW $RSYNCOPTS -a -s --log-file=$logfile --files-from=$CUR_FP_FLE $ROOTDIR $TARGET & echo \"\${!}\" >> $PIDFILE";

if ( $VERBOSE >= 2 ) { INFO("Starting rsync for chunkfile [$CUR_FP_FLE]..\n"); }

# WARN("$RSYNC_CMD");

# there will be as many logfiles as fp chunkfiles.

# ie LOTS. but they can be deleted after the run has been verified..

# TODO don't know if we need this logfile.

if ($DEBUG) { debug( __LINE__, "Complete rsync cmd = [$RSYNC_CMD]" ); }

system("$RSYNC_CMD");    # launch rsync and capture the bg job PID to PIDfile

I don't know if any version of rsync will watch the chunkfile and process the newly appended lines after the chunkfile opened. But in my case it seems the rsync just handle the few lines then exit.

And it's also quiet strange the fpart stopped to write to the chunkfile.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/hjmangalam/parsyncfp/issues/36#issuecomment-671727863, or unsubscribe https://github.com/notifications/unsubscribe-auth/AASF3Y44LKDKIDNTQDA55N3SADG3NANCNFSM4P2SCZTQ .

--

Harry Mangalam

--

Harry Mangalam