Closed: cmharr closed this 3 years ago
Thanks @cmharr. There are several lines in there. I'm guessing you're referring to the last line here:
[2021-08-09T14:59:51] Rate: 24.893 MiB/s (97287575545 bytes in 3727.125 seconds)
This line is printing the effective bandwidth, which is the sum of all bytes copied divided by the total execution time.
We also have a line to report just the data movement rate, which is the one here:
[2021-08-09T14:58:23] Copy rate: 486.174 MiB/s (97287575545 bytes in 190.838 seconds)
The "Copy rate" line does not include directory or file creation time, or things like updating file metadata (permissions and timestamps) after the files have been copied.
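To make the difference between the two numbers concrete, here is a minimal sketch that recomputes both rates from the two log lines quoted above. It is not part of dsync; the regex and the helper names (`parse_bytes_and_seconds`, `rate_mib_per_s`) are made up for illustration, and it assumes the `(N bytes in S seconds)` tail looks exactly as shown in those lines.

```python
import re

MIB = 1024 * 1024  # bytes per MiB

# Hypothetical pattern: matches the "(N bytes in S seconds)" tail
# of the two log lines quoted in this thread.
TAIL = re.compile(r"\((\d+) bytes in ([\d.]+) seconds\)")


def parse_bytes_and_seconds(line):
    """Return (bytes, seconds) parsed from a rate line."""
    match = TAIL.search(line)
    if match is None:
        raise ValueError("no '(N bytes in S seconds)' tail in: " + line)
    return int(match.group(1)), float(match.group(2))


def rate_mib_per_s(line):
    """Recompute the MiB/s figure as bytes / seconds / 2**20."""
    nbytes, seconds = parse_bytes_and_seconds(line)
    return nbytes / seconds / MIB


effective_line = (
    "[2021-08-09T14:59:51] Rate: 24.893 MiB/s "
    "(97287575545 bytes in 3727.125 seconds)"
)
copy_line = (
    "[2021-08-09T14:58:23] Copy rate: 486.174 MiB/s "
    "(97287575545 bytes in 190.838 seconds)"
)

effective_rate = rate_mib_per_s(effective_line)
copy_rate = rate_mib_per_s(copy_line)

# Time included in the effective rate that was not spent moving data
# (walk, directory/file creation, metadata updates, and so on).
_, total_seconds = parse_bytes_and_seconds(effective_line)
_, copy_seconds = parse_bytes_and_seconds(copy_line)
non_copy_seconds = total_seconds - copy_seconds

print(f"effective rate: {effective_rate:.3f} MiB/s")
print(f"copy rate:      {copy_rate:.3f} MiB/s")
print(f"non-copy time:  {non_copy_seconds:.3f} s")
```

Both lines report the same byte count, so the gap between the two rates comes entirely from the denominator: the effective rate divides by the full execution time, while the copy rate divides only by the time spent moving data.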
Thanks Adam. My understanding is that 24.893 MiB/s is the rate for that particular batch (of 200K files), correct? I think what I was looking for was an overall copy rate, and I took the effective rate you mention to be that overall copy rate. However, given that this job took more than 7 days to run, the 3-hr walk time and directory create time would be negligible relative to the overall time, so in my case the effective rate is probably accurate.
When dsync reports its overall copy rate at each batch threshold, it appears to include the time to create directories, which can lead to misleading rate results. In the example below, dsync must create close to 50M directories, which takes a little under an hour to perform. When the batch status reports the amount copied and the rate, it includes the ~3418 second directory creation time; however, the labels on those fields imply to the user that the metrics are for the data copied. Starting the timer for the overall rate when the actual data copy begins would be preferable.