hpc / mpifileutils

File utilities designed for scalability and performance.
https://hpc.github.io/mpifileutils
BSD 3-Clause "New" or "Revised" License
164 stars 64 forks source link

Dsync shouldn't include directory creation time in data copy rates #493

Closed cmharr closed 3 years ago

cmharr commented 3 years ago

When dsync reports its overall copy rate at each batch threshold, it appears to include the time to create directories which can lead to misleading rate results. In the example below, dsync must create close to 50M directories, which takes a little under an hour to perform. When the batch status reports the amount and it includes the ~3418 second directory creation time; however, the and imply to the user the metrics are for the data copied. Starting the timer for the overall rate when the actual data is being copied would be preferable.

[2021-08-09T13:57:44] Copying items to destination
[2021-08-09T13:57:44] Copying to /p/asplustre3
[2021-08-09T13:57:44] Items: 346654518
[2021-08-09T13:57:44]   Directories: 49246472
[2021-08-09T13:57:44]   Files: 274949669
[2021-08-09T13:57:44]   Links: 22458377
[2021-08-09T13:57:44] Data: 282.312 TiB (1.077 MiB per file)
[2021-08-09T13:57:44] Creating 49246472 directories
[2021-08-09T14:27:45] Created 26429206 directories (54%) in 1800.549 secs (14678.414 dirs/sec) 1554 secs left ...
[2021-08-09T14:54:42] Created 49246472 directories (100%) in 3417.521 secs (14409.998 dirs/sec) done
[2021-08-09T14:54:42] Creating 191843 files.
[2021-08-09T14:55:12] Copying data.
[2021-08-09T14:58:23] Copy data: 90.606 GiB (97287575545 bytes)
[2021-08-09T14:58:23] Copy rate: 486.174 MiB/s (97287575545 bytes in 190.838 seconds)
[2021-08-09T14:58:23] Syncing data to disk.
[2021-08-09T14:58:56] Sync completed in 33.651 seconds.
[2021-08-09T14:58:56] Setting ownership, permissions, and timestamps.
[2021-08-09T14:59:18] Updated 191843 items in 21.178 seconds (9058.655 items/sec)
[2021-08-09T14:59:18] Syncing updates to disk.
[2021-08-09T14:59:51] Sync completed in 33.626 seconds.
[2021-08-09T14:59:51] Started: Aug-09-2021,13:57:44
[2021-08-09T14:59:51] Current: Aug-09-2021,14:59:51
[2021-08-09T14:59:51] Seconds: 3727.125
[2021-08-09T14:59:51] Items: 49438315
[2021-08-09T14:59:51] Data: 90.606 GiB (97287575545 bytes)
[2021-08-09T14:59:51] Rate: 24.893 MiB/s (97287575545 bytes in 3727.125 seconds)
[2021-08-09T14:59:51] Copied 200000 of 346654518 items (0.058%)
adammoody commented 3 years ago

Thanks @cmharr . There are several lines in there. I'm guessing you're referring to the last line here:

[2021-08-09T14:59:51] Rate: 24.893 MiB/s (97287575545 bytes in 3727.125 seconds)

This line is printing the effective bandwidth, which is the sum of all bytes copied divided by the total execution time.

We also have a line to report just the data movement rate, which is the one here:

[2021-08-09T14:58:23] Copy rate: 486.174 MiB/s (97287575545 bytes in 190.838 seconds)

The "Copy rate" line does not include directory or file creation time, or things like updating file metadata (permissions and timestamps) after the files have been copied.

cmharr commented 3 years ago

Thanks Adam. My understanding is the 24.893 MiB/s is the rate for that particular batch (of 200K files), correct? I think what I was looking for was an overall copy rate and the effective rate you mention is what I took to be that overall copy rate. However, given this job took > 7 days to run, the 3-hr walk time and directory create time would be negligible to the overall time so in my case the effective rate is probably accurate.