hpc / mpifileutils

File utilities designed for scalability and performance.
https://hpc.github.io/mpifileutils
BSD 3-Clause "New" or "Revised" License

[Request] change format of dwalk text output #555

Open markmoe19 opened 10 months ago

markmoe19 commented 10 months ago

Would it be possible to change the format of the dwalk text output file? (This is related to the request about adding access time.) For example, for my post-processing it would be great if the file modify and access times were reported as integer seconds (epoch time). Thanks!

markmoe19 commented 10 months ago

@adammoody suggested that a printf()-style format string could be passed to dwalk (or dfind) to format the text output, similar to what a +FORMAT argument does for the date command. That would be great! Currently, my main use case requires both atime and mtime alongside each file's path in the text output, ideally in %s (epoch seconds) format. File size in bytes would be great too. Thanks!
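
To make the format-string idea concrete, here is a minimal sketch of how such an expansion could work. It is not mpifileutils code and no such option exists in this thread: the file_info struct, the format_entry() function, and the directives %p (path), %s (size in bytes), %a (atime), and %m (mtime) are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-file metadata; not an mpifileutils type. */
typedef struct {
    const char* path;
    uint64_t size;   /* bytes */
    uint64_t atime;  /* epoch seconds */
    uint64_t mtime;  /* epoch seconds */
} file_info;

/* Expand a hypothetical format string into out.
 * Assumed directives: %p path, %s size, %a atime, %m mtime, %% literal '%'.
 * Unknown directives are copied through unchanged.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the output does not fit in outsize bytes. */
int format_entry(const char* fmt, const file_info* f, char* out, size_t outsize)
{
    assert(fmt != NULL && f != NULL && out != NULL);
    if (outsize == 0) {
        return -1;
    }
    out[0] = '\0';

    size_t pos = 0;
    for (const char* c = fmt; *c != '\0'; c++) {
        size_t room = outsize - pos;  /* always >= 1 here */
        int n;
        if (*c != '%') {
            n = snprintf(out + pos, room, "%c", *c);
        } else {
            c++;
            switch (*c) {
            case 'p': n = snprintf(out + pos, room, "%s", f->path); break;
            case 's': n = snprintf(out + pos, room, "%" PRIu64, f->size); break;
            case 'a': n = snprintf(out + pos, room, "%" PRIu64, f->atime); break;
            case 'm': n = snprintf(out + pos, room, "%" PRIu64, f->mtime); break;
            case '%': n = snprintf(out + pos, room, "%%"); break;
            case '\0':
                /* lone '%' at the end of fmt: emit it and let the loop stop */
                n = snprintf(out + pos, room, "%%");
                c--;
                break;
            default:  n = snprintf(out + pos, room, "%%%c", *c); break;
            }
        }
        if (n < 0 || (size_t)n >= room) {
            return -1;  /* formatting error or truncated output */
        }
        pos += (size_t)n;
    }
    return (int)pos;
}
```

With this sketch, a format of "%a %m %s %p\n" would give one line per file containing atime, mtime, size in bytes, and then the path. Putting the path last means a path containing spaces does not shift the numeric fields.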

adammoody commented 10 months ago

Until we have the more general solution, which will likely be a while, it's probably easiest to hack the existing format to suit your needs. You'd want to modify the lines in src/common/mfu_flist_io.c here:

https://github.com/hpc/mpifileutils/blob/47918154ea0f4895623f36ccf8cbfe2df477c3ae/src/common/mfu_flist_io.c#L1643-L1646

For example, the following patch:

diff --git a/src/common/mfu_flist_io.c b/src/common/mfu_flist_io.c
index 0b0a2e5..285afd5 100644
--- a/src/common/mfu_flist_io.c
+++ b/src/common/mfu_flist_io.c
@@ -22,6 +22,9 @@
 #include <errno.h>
 #include <string.h>

+/* defines PRIu64 */
+#include <inttypes.h>
+
 #include "dtcmp.h"
 #include "mfu.h"
 #include "mfu_flist_internal.h"
@@ -1640,9 +1643,9 @@ static size_t print_file_text(mfu_flist flist, uint64_t idx, char* buffer, size_
         const char* size_units;
         mfu_format_bytes(size, &size_tmp, &size_units);

-        numbytes = snprintf(buffer, bufsize, "%s %s %s %7.3f %3s %s %s\n",
+        numbytes = snprintf(buffer, bufsize, "%s %s %s %7.3f %3s %" PRIu64 " %" PRIu64 " %" PRIu64 " %s\n",
             mode_format, username, groupname,
-            size_tmp, size_units, modify_s, file
+            size_tmp, size_units, size, acc, mod, file
         );
     }
     else {

changes the lines written by dwalk --text --output list.txt /path so that the file size, atime, and mtime are printed as integers immediately after the human-readable file size, which is still shown in floating point with units. So rather than the current format of:

drwxrwx--- user1 user1   4.000 KiB Sep 22 2023 17:08 /path
-rw------- user1 user1 854.000   B Sep 22 2023 17:08 /path/CMakeLists.txt
drwx------ user1 user1   4.000 KiB Sep 22 2023 17:08 /path/daos-serialize

it prints as:

drwxrwx--- user1 user1   4.000 KiB 4096 1696015833 1695427689 /path
-rw------- user1 user1 854.000   B 854 1696016217 1695427689 /path/CMakeLists.txt
drwx------ user1 user1   4.000 KiB 4096 1696016421 1695427689 /path/daos-serialize
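
For post-processing, lines in the patched format can be split back into fields. The following is a minimal sketch that assumes the exact field order produced by the patch above (mode, user, group, human-readable size, units, size in bytes, atime, mtime, path). The dwalk_entry struct and parse_dwalk_line() are hypothetical names, not part of mpifileutils.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of the patched dwalk text output. */
typedef struct {
    char mode[16];
    char user[64];
    char group[64];
    uint64_t size;   /* bytes */
    uint64_t atime;  /* epoch seconds */
    uint64_t mtime;  /* epoch seconds */
    char path[4096];
} dwalk_entry;

/* Parse one line of the form
 *   mode user group human_size units size atime mtime path
 * Returns 0 on success, -1 if the line does not match.
 * Assumes user and group names contain no whitespace. */
int parse_dwalk_line(const char* line, dwalk_entry* e)
{
    assert(line != NULL && e != NULL);

    double human = 0.0;  /* human-readable size, parsed but not kept */
    char units[8];       /* e.g. "B" or "KiB", parsed but not kept */
    int consumed = 0;    /* offset of the first character of the path */

    int n = sscanf(line,
        "%15s %63s %63s %lf %7s %" SCNu64 " %" SCNu64 " %" SCNu64 " %n",
        e->mode, e->user, e->group, &human, units,
        &e->size, &e->atime, &e->mtime, &consumed);
    if (n != 8) {
        return -1;
    }

    /* The rest of the line is the path; it may contain spaces,
     * so it is taken verbatim up to the newline. */
    const char* p = line + consumed;
    size_t len = strcspn(p, "\n");
    if (len == 0 || len >= sizeof(e->path)) {
        return -1;
    }
    memcpy(e->path, p, len);
    e->path[len] = '\0';
    return 0;
}
```

Reading list.txt line by line with fgets() and passing each line to parse_dwalk_line() would then give the integer size, atime, and mtime directly. Paths that begin with whitespace or contain a newline are not handled by this sketch.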

markmoe19 commented 9 months ago

That worked great! I also took out the human-readable size and commented out the lines that format the size and times, which speeds up writing the text file. The final size of the text file is about the same. Best part: my post-processing now has atime information AND is ~5x faster! Thank you! :)

adammoody commented 9 months ago

Great! Wow, it's surprising that the string formatting adds so much overhead, but that's also good to know about. Good idea to try that.