Open idlej opened 4 years ago
The behavior you describe at the end of your note is expected. It's a feature of fpart, not a bug: the rsyncing can start before the recursive descent of the entire tree is finished, i.e., rsync is fired off with a chunk file as soon as that chunk is complete. I do seem to remember the behavior you describe from an older version of parsyncfp, though. What version are you running?
Also, if you are using the '--altcache' option, make sure that the logs, etc., you're reading are from the correct cache dir; it can get confusing once you start generating lots of cache files. However, assuming that you're correct and the logs are what you say, the small byte counts sent could be the result of the files already existing at the target, with only the checksums being exchanged. That is what makes rsync so efficient.
However, the output of the fpart log and the sizes of the chunk files should be proportional. In a recent test on my system, the fpart log says:
Examining filesystem...
Filled part #0: size = 4587626807, 24151 file(s)
Filled part #1: size = 4497063936, 2 file(s)
Filled part #2: size = 4344097561, 2255 file(s)
Filled part #3: size = 4312630307, 139 file(s)
Filled part #4: size = 4769913757, 202 file(s)
Filled part #5: size = 4295052994, 838 file(s)
Filled part #6: size = 4759503831, 86444 file(s)
Filled part #7: size = 4341861400, 31980 file(s)
Filled part #8: size = 2771762905, 1476 file(s)
and the word counts of the f.* chunk files should be very close (sometimes they differ slightly due to permissions or spaces):

$ wc -l f.*
24151 f.0
2 f.1
2256 f.2
139 f.3
202 f.4
838 f.5
86444 f.6
31980 f.7
1476 f.8
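The cross-check above can be scripted. The following is a hypothetical helper, not part of parsyncfp: it parses the "Filled part" lines of an fpart log and flags any part whose reported file count is far from the actual line count of the matching f.N chunkfile.

```python
import re

def compare_counts(fpart_log_text, chunk_line_counts):
    """Compare the file counts reported in an fpart log against the
    measured line counts of the f.N chunkfiles (e.g. from `wc -l`).
    Returns a list of (part_index, logged, actual) mismatches."""
    mismatches = []
    for m in re.finditer(r"Filled part #(\d+): size = \d+, (\d+) file\(s\)",
                         fpart_log_text):
        part, logged = int(m.group(1)), int(m.group(2))
        actual = chunk_line_counts.get(part, 0)
        # allow a small delta for names fixed up for spaces/bad chars
        if abs(logged - actual) > 10:
            mismatches.append((part, logged, actual))
    return mismatches

# the counts from the problem report: huge logged sizes, tiny chunkfiles
log = """Filled part #0: size = 4294972562, 388939 file(s)
Filled part #1: size = 4294970326, 528222 file(s)"""
print(compare_counts(log, {0: 246, 1: 1982}))
# -> [(0, 388939, 246), (1, 528222, 1982)]
```

If this returns an empty list, the chunkfiles were complete when counted; large gaps like the ones above point at truncated chunkfiles.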
Harry
On Mon, Aug 10, 2020 at 7:55 PM idlej notifications@github.com wrote:
Hi all,
I created a parsyncfp job to migrate about 4 TB of files with a 4 GB chunk size.
perl /opt/parsyncfp/parsyncfp --NP=10 --altcache=/root/.parsyncfp/jiaoyu_gv_upload --startdir=/mnt/ps15 jiaoyu_gv_upload /mnt/hw_9000 --maxload=32 --verbose=3 --chunksize=4G --interface=eth4
The rsync processes start and exit immediately after the corresponding cachefiles are created. The fpart log reports a huge size for each cachefile:
[root@ostor-zzj fpcache]# cat fpart.log.09.34.13_2020-08-11
Examining filesystem...
Filled part #0: size = 4294972562, 388939 file(s)
Filled part #1: size = 4294970326, 528222 file(s)
Filled part #2: size = 4294971458, 533180 file(s)
Filled part #3: size = 4294977049, 481518 file(s)
Filled part #4: size = 4294967610, 479615 file(s)
Filled part #5: size = 4294972108, 491070 file(s)
Filled part #6: size = 4294967691, 448095 file(s)
But there are actually only a few lines in each cachefile.
[root@ostor-zzj fpcache]# wc -l f*
246 f.0
1982 f.1
1033 f.2
1712 f.3
1331 f.4
614 f.5
636 f.6
2108 f.7
8 fpart.log.09.34.13_2020-08-11
9670 total
Only a small amount of data was actually sent.
[root@ostor-zzj jiaoyu_gv_upload]# cat rsync-logfile-09.34.13_2020-08-11*
2020/08/11 09:34:19 [8718] building file list
2020/08/11 09:34:19 [8718] sent 14196 bytes received 17 bytes total size 3123551
2020/08/11 09:43:48 [11469] building file list
2020/08/11 09:43:52 [11469] sent 50222 bytes received 17 bytes total size 15715093
2020/08/11 09:49:04 [13024] building file list
2020/08/11 09:49:09 [13024] sent 31495 bytes received 17 bytes total size 8388468
2020/08/11 09:54:41 [14649] building file list
2020/08/11 09:54:45 [14649] sent 44404 bytes received 17 bytes total size 13828482
2020/08/11 09:59:28 [16039] building file list
2020/08/11 09:59:32 [16039] sent 35528 bytes received 17 bytes total size 12950497
2020/08/11 10:04:12 [17434] building file list
2020/08/11 10:04:13 [17434] sent 16247 bytes received 17 bytes total size 5056321
2020/08/11 10:08:45 [18741] building file list
2020/08/11 10:08:47 [18741] sent 17912 bytes received 17 bytes total size 5144628
2020/08/11 10:12:27 [19802] building file list
2020/08/11 10:12:32 [19802] sent 58061 bytes received 17 bytes total size 23893062
2020/08/11 10:15:15 [20632] building file list
2020/08/11 10:15:17 [20632] sent 2398 bytes received 17 bytes total size 831982
I think the problem may be that rsync starts as soon as the corresponding cachefile has just been created, and does not wait for the cachefile to finish being generated.
I don't know whether this is the expected behavior, or how to fix it.
Harry Mangalam
@hjmangalam
Hi Harry, I'm using fpart v1.2.0, rsync 3.1.2 (protocol version 31), and parsyncfp 1.67.
I understand the strategy of firing off rsync with a completed chunk file as soon as it is done.
But in my case, rsync is fired as soon as the chunk file has just been created, and in the code the condition for forking a new rsync process is merely the existence of the new chunkfile:
while ( $RSYNCS_GOING < $NP && $KEEPGOING ) {
    $CUR_FP_FLE = $FP_ROOT . "." . $CUR_FPI;   # the current fp chunkfile
    if ( -e $CUR_FP_FLE ) {    # if the current chunkfile exists <----------------------
        fixfilenames( $CUR_FP_FLE, $ROOTDIR ); # check & fix for spaces, bad chars.
        # entire rsync command and PID capture (used in a total of 2 places)
        $logfile = $bytefiles . $CUR_FPI;
        $RSYNC_CMD =
          "cd $TRIMPATH && rsync --bwlimit=$MAXBW $RSYNCOPTS -a -s --log-file=$logfile --files-from=$CUR_FP_FLE $ROOTDIR $TARGET & echo \"\${!}\" >> $PIDFILE";
        if ( $VERBOSE >= 2 ) { INFO("Starting rsync for chunkfile [$CUR_FP_FLE]..\n"); }
        # WARN("$RSYNC_CMD");
        # there will be as many logfiles as fp chunkfiles,
        # ie LOTS, but they can be deleted after the run has been verified.
        # TODO don't know if we need this logfile.
        if ($DEBUG) { debug( __LINE__, "Complete rsync cmd = [$RSYNC_CMD]" ); }
        system("$RSYNC_CMD"); # launch rsync and capture the bg job PID to PIDfile
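The race this creates can be reproduced in miniature without fpart or rsync at all. The following is an illustrative sketch, not parsyncfp code: a writer keeps appending to a chunkfile after creating it (like fpart), while a reader that triggers on mere existence (like the `-e` test above) snapshots the file and sees only the lines written so far.

```python
import os
import tempfile
import threading
import time

def writer(path, total_lines):
    # mimics fpart: the file exists immediately but is filled over time
    with open(path, "w") as f:
        for i in range(total_lines):
            f.write(f"file_{i}\n")
            f.flush()
            time.sleep(0.001)

tmpdir = tempfile.mkdtemp()
chunk = os.path.join(tmpdir, "f.0")
t = threading.Thread(target=writer, args=(chunk, 500))
t.start()

# mimics `if ( -e $CUR_FP_FLE )`: react to existence, not completion
while not os.path.exists(chunk):
    time.sleep(0.0005)
with open(chunk) as f:
    seen_at_launch = sum(1 for _ in f)   # what rsync would be handed

t.join()                                  # wait for the writer to finish
with open(chunk) as f:
    seen_after_close = sum(1 for _ in f)

print(seen_at_launch, seen_after_close)   # typically far fewer than 500 at launch
```

Since rsync reads its --files-from list once at startup, anything appended to the chunkfile afterwards is simply never transferred, which matches the tiny "sent ... bytes" figures in the logs.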
I don't know whether any version of rsync will watch the chunkfile and process newly appended lines after the chunkfile has been opened, but in my case it seems rsync just handles the few lines present and then exits.
It's also quite strange that fpart appeared to stop writing to the chunkfile.
Hi there - sorry for the delay. I just exchanged an email with Ganael (the fpart author). He reminded me that in live mode (-L), which I use, you can specify the -W option, which I do not (yet), to execute the dependent command after the chunk file (his term is 'partition') has been written, rather than as soon as the file exists.
I'll have to look into how to reorganize parsyncfp (pfp) to handle this - right now fpart is forked off and so doesn't communicate with pfp. I'll have to have it write a marker file or something to indicate that the chunk file has been closed.
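The marker-file idea could look something like this hypothetical sketch (not the actual parsyncfp fix): the fpart side touches an f.N.done marker once the partition is closed - assuming fpart's post-part hook can be given a command along the lines of -W 'touch "$FPART_PARTFILENAME.done"' - and the dispatcher only launches rsync for chunkfiles whose marker exists.

```python
import os
import time

def wait_for_closed_chunk(chunk_path, timeout=60.0, poll=0.2):
    """Block until chunk_path exists AND its .done marker exists.
    The marker is assumed to be written by a post-part hook after the
    partition is closed, so the marker's existence implies the
    chunkfile is complete (unlike a bare existence test on the
    chunkfile itself). Returns True when ready, False on timeout."""
    marker = chunk_path + ".done"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(chunk_path) and os.path.exists(marker):
            return True
        time.sleep(poll)
    return False
```

The key property is that only fully written chunkfiles are ever handed to rsync; a bare `-e` test cannot distinguish "created" from "closed", but "marker written after close" can.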
Busy today, so this might not get done immediately.
Stay tuned. Harry
Hi there,
I've added the bits to use only closed chunk files, but I'm going through the code again with some more oversight to make sure I haven't missed anything. If you'd like to test the current version I'm working on, send me a note with your private email and I'll send it to you. Otherwise it should be pushed to GitHub within a day or so.
Harry