[Request] can dwalk "stream" text output as it reads mfu file (to avoid high RAM usage)?

markmoe19 commented 1 year ago

I have a large 1.4TB .mfu file generated by dwalk for 502M items. I want to generate an unsorted text output file from this mfu file.

Does dwalk read the entire mfu file into RAM before outputting the text file? For sorted output, I could see reading it all into RAM. But for unsorted output, could dwalk “stream” the output as it reads the mfu input and thereby not use much RAM?

I’m asking because I have a service node that can generate the mfu file but doesn’t have enough RAM to generate (unsorted) text output from that same mfu file.

Thanks!

Mark

adammoody commented 1 year ago

Good question. At the moment dwalk and the other tools can only read the entire file at once. We'd have to hack together a tool for the streaming bit.

The code that reads the .mfu file (file format version 4) is here:

https://github.com/hpc/mpifileutils/blob/fac50e0f97ba16a79a15d29e0c8010dd5e8f16b6/src/common/mfu_flist_io.c#L844

The loop that unpacks an encoded file entry read from the .mfu file is here:

https://github.com/hpc/mpifileutils/blob/fac50e0f97ba16a79a15d29e0c8010dd5e8f16b6/src/common/mfu_flist_io.c#L1060-L1065

The biggest change is that we couldn't unpack the entries into an mfu_flist like this function does, since the flist structure expects to have the full list loaded in memory at once. However, one could look to modify the unpack function to just print the file name instead of inserting the element into the list.

The list_insert_ptr() function unpacks each element and adds it to the list:

https://github.com/hpc/mpifileutils/blob/fac50e0f97ba16a79a15d29e0c8010dd5e8f16b6/src/common/mfu_flist_io.c#L332

Most of the heavy lifting in parsing the data for each file is in list_elem_unpack, which shows how the fields in the element are set:

https://github.com/hpc/mpifileutils/blob/fac50e0f97ba16a79a15d29e0c8010dd5e8f16b6/src/common/mfu_flist_io.c#L275

You could perhaps just copy and adapt the list_insert_ptr function into a print_ptr version that allocates an element, unpacks it, prints the file name, and frees it, something like:

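static size_t print_ptr(char* ptr, int detail, uint64_t chars)
{
    /* allocate a temporary element to unpack into */
    elem_t* elem = (elem_t*) MFU_MALLOC(sizeof(elem_t));

    /* decode one entry; returns the number of bytes it occupies in the buffer */
    size_t bytes = list_elem_unpack(ptr, detail, chars, elem);

    /* print the path instead of inserting the element into a list */
    printf("%s\n", elem->file);

    /* release the strdup'd name and then the element itself */
    mfu_free(&elem->file);
    mfu_free(&elem);

    return bytes;
}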

It would be cleaner still to avoid allocating and freeing the element each time.
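Outside of the mfu code, the same idea can be sketched as a standalone loop that reads the list in fixed-size batches through one reusable buffer, so memory use does not grow with the number of entries and nothing is allocated per entry. This is a hypothetical illustration, not the mpifileutils reader: stream_names and BATCH are made-up names, and it assumes a simplified layout of `header_bytes` of header followed by records of `rec_size` bytes whose first `chars` bytes hold the NUL-padded name. The real v4 records also carry the other per-file fields that list_elem_unpack decodes, and in practice `chars` and the record size would have to be taken from the .mfu header (an assumption here).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Records read per fread call; memory use is BATCH * rec_size bytes. */
#define BATCH 4096

/* Hypothetical layout (NOT the real .mfu v4 encoding): header_bytes of
 * header, then fixed-size records of rec_size bytes whose first `chars`
 * bytes hold the file name, NUL-padded to the longest name. Prints one
 * name per line and returns the number printed, or -1 on error. */
static long stream_names(FILE* in, FILE* out, long header_bytes,
                         size_t chars, size_t rec_size)
{
    if (chars > rec_size || fseek(in, header_bytes, SEEK_SET) != 0) {
        return -1;
    }

    /* one buffer reused for every batch: nothing is allocated per entry */
    char* batch = malloc(BATCH * rec_size);
    if (batch == NULL) {
        return -1;
    }

    long printed = 0;
    size_t n;
    while ((n = fread(batch, rec_size, BATCH, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            const char* rec = batch + i * rec_size;
            /* %.*s stops at the first NUL or after chars bytes, whichever
             * comes first, so the padding is skipped without copying */
            fprintf(out, "%.*s\n", (int) chars, rec);
            printed++;
        }
    }

    int failed = ferror(in);
    free(batch);
    return failed ? -1 : printed;
}
```

The buffer stays at BATCH * rec_size bytes no matter how many entries the file holds; a larger BATCH trades a little more memory for fewer read calls.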

adammoody commented 1 year ago

Actually, after reviewing the code, you may be able to read this file back on the same node using dwalk.

The current v4 of the .mfu format stores file names as fixed-length fields, where every file name is padded to the length of the longest file name in the set.

https://github.com/hpc/mpifileutils/blob/fac50e0f97ba16a79a15d29e0c8010dd5e8f16b6/src/common/mfu_flist_io.c#L248-L251

That decision makes it easy to seek to a specific entry in the .mfu file, but it also significantly inflates the .mfu file size if there are many files and one really long filename:

disk space ~ (numfiles * max(filename length))
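Because every record has the same width, the byte offset of entry i is a single multiplication, which is what makes seeking easy, while the total size grows with the longest name rather than the typical one. The helpers below are hypothetical and assume a simplified layout of a header followed by equal-size records. Applied to the numbers in this issue, 1.4TB over 502M entries works out to roughly 2,788 bytes per entry (taking TB as 10^12 bytes), which would fit the padding explanation if the longest name in the set is long.

```c
#include <stdint.h>

/* Hypothetical simplified layout: header_bytes of header, then `count`
 * records that all share one size, rec_size bytes, because each name
 * field is padded to the longest name in the set. */

/* Byte offset of entry i: O(1), no scan or index needed. */
static uint64_t entry_offset(uint64_t header_bytes, uint64_t rec_size, uint64_t i)
{
    return header_bytes + i * rec_size;
}

/* Total size grows as count * rec_size, i.e. with the longest name. */
static uint64_t fixed_file_size(uint64_t header_bytes, uint64_t rec_size, uint64_t count)
{
    return header_bytes + count * rec_size;
}

/* Average bytes per entry implied by a file's total size (integer division). */
static uint64_t bytes_per_entry(uint64_t total_bytes, uint64_t count)
{
    return count ? total_bytes / count : 0;
}
```

For example, bytes_per_entry(1400000000000ULL, 502000000ULL) gives 2788.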

When reading the file entries back from the .mfu file, we copy each file name using strdup:

https://github.com/hpc/mpifileutils/blob/fac50e0f97ba16a79a15d29e0c8010dd5e8f16b6/src/common/mfu_flist_io.c#L284-L285

The strdup will drop all of that extra padding, so the same list when read back into memory will take less space than when stored on disk.
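The effect of that copy can be sketched in isolation: only the bytes up to the first NUL are kept, so a short name read from a wide padded field costs strlen + 1 bytes in memory (ignoring allocator overhead) instead of the full field width. The helper copy_padded_name is hypothetical; it behaves like a bounded strndup so that a field filled completely, with no terminating NUL, is still handled safely. Whether the real reader needs that bound depends on how the field is written, which this sketch does not assume.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a name out of a fixed-width field of `width`
 * bytes that is NUL-padded to the longest name in the set. Only the bytes
 * before the first NUL are kept, so the padding never reaches memory.
 * Bounded like strndup, so a completely full field (no NUL) is safe too.
 * On success, *out_bytes (if non-NULL) is the size of the copy including
 * its terminator. */
static char* copy_padded_name(const char* field, size_t width, size_t* out_bytes)
{
    const char* nul = memchr(field, '\0', width);
    size_t len = nul ? (size_t) (nul - field) : width;

    char* copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, field, len);
    copy[len] = '\0';

    if (out_bytes != NULL) {
        *out_bytes = len + 1;
    }
    return copy;
}
```

For a name like "short/name" stored in a 64-byte field, the in-memory copy is 11 bytes.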

If you generated this list from a single node, you might be able to read it back on that single node.

With the eventual v5 .mfu file format, whenever that comes, I hope we can store file names using variable-length structures to avoid this problem. That will likely require adding an index to support efficient seeks.
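As a rough illustration of that trade-off, and not a description of any actual or planned .mfu format, names can be stored back to back as length-prefixed byte strings, with a separate array of offsets acting as the index that restores constant-time lookup of entry i. Everything here (packed_names_t, pack_names, name_at, free_packed, and the uint32 length prefix in host byte order) is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length layout, NOT an actual .mfu format:
 * names are stored back to back as [uint32 length][name bytes], and a
 * separate index holds the byte offset of each record so that entry i
 * can still be located without scanning the whole buffer. */
typedef struct {
    char*     data;   /* packed records */
    size_t    used;   /* bytes used in data */
    uint64_t* index;  /* index[i] = offset of record i in data */
    size_t    count;  /* number of records */
} packed_names_t;

static int pack_names(packed_names_t* p, const char** names, size_t count)
{
    /* first pass: size the buffer exactly, with no per-name padding */
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += sizeof(uint32_t) + strlen(names[i]);
    }

    p->data  = malloc(total > 0 ? total : 1);
    p->index = malloc((count > 0 ? count : 1) * sizeof(uint64_t));
    if (p->data == NULL || p->index == NULL) {
        free(p->data);
        free(p->index);
        return -1;
    }

    /* second pass: record each offset, then append length prefix and bytes */
    p->used  = 0;
    p->count = count;
    for (size_t i = 0; i < count; i++) {
        uint32_t len = (uint32_t) strlen(names[i]);
        p->index[i] = p->used;
        memcpy(p->data + p->used, &len, sizeof len);  /* host byte order, illustration only */
        memcpy(p->data + p->used + sizeof len, names[i], len);
        p->used += sizeof len + len;
    }
    return 0;
}

/* O(1) lookup through the index; the name is not NUL-terminated,
 * so its length is returned through *len. */
static const char* name_at(const packed_names_t* p, size_t i, uint32_t* len)
{
    const char* rec = p->data + p->index[i];
    memcpy(len, rec, sizeof *len);
    return rec + sizeof *len;
}

static void free_packed(packed_names_t* p)
{
    free(p->data);
    free(p->index);
    p->data = NULL;
    p->index = NULL;
    p->used = p->count = 0;
}
```

For example, packing "a", "longer/path/name", and "bc" takes 31 bytes (three 4-byte prefixes plus 19 name bytes), versus 48 bytes of name fields alone if each name were padded to the 16-byte longest; the index adds 8 bytes per entry, which is the cost of keeping seeks cheap.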

adammoody commented 1 year ago

If you find that you can't use dwalk, let me know, and I can hack up a branch with a tool to get you started.

hpc/mpifileutils issue #563