mar-file-system / marfs

MarFS provides a scalable near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc) as a scalable data component.

what to do about zero-length files on a non-DIRECT repo? #111

Open jti-lanl opened 8 years ago

jti-lanl commented 8 years ago

This was item "(3b)" in issue #83. Moved here, so we can close that.

For MD insert performance tests, with zero-length files, the simple approach to avoiding the cost of writing xattrs would be to use a DIRECT repo, because then you definitely won't be writing xattrs. If you wanted, you could have a namespace that uses DIRECT for size=0 only, like so:

<range>
   <min_size>0</min_size>
   <max_size>0</max_size>
   <repo_name>my_direct_repo</repo_name>
</range>
<range>
   <min_size>1</min_size>
   <max_size>-1</max_size>
   <repo_name>my_sproxyd_repo</repo_name>
</range>
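
To make the intent of the two ranges concrete, here is a rough sketch of how a size-based repo lookup could behave. It is not the libmarfs implementation: the `Range` class and `select_repo` function are hypothetical names, and it assumes a max_size of -1 means "no upper bound", which is how the config above appears to use it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    min_size: int
    max_size: int  # assumption: -1 means "no upper bound"
    repo_name: str


# Mirrors the two <range> entries from the config fragment.
RANGES = [
    Range(0, 0, "my_direct_repo"),
    Range(1, -1, "my_sproxyd_repo"),
]


def select_repo(size, ranges=RANGES):
    """Return the repo name of the first range that covers `size`."""
    for r in ranges:
        if size >= r.min_size and (r.max_size == -1 or size <= r.max_size):
            return r.repo_name
    raise ValueError(f"no range covers size {size}")
```

With this lookup, only size 0 lands on the DIRECT repo; everything else goes to the object repo.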

Then, MD insert testing via pftool would use a synthetic source which creates source-filenames (and reports length 0 for them), and libmarfs would do the open() with a DIRECT repo, and no xattrs would be written.

However, on a non-DIRECT repo, simply skipping the xattr write seems like trouble. When we later stat that file, the lack of xattrs implies that it is DIRECT. Therefore, a writer overwriting the file would simply overwrite it in place in the MDFS, because it has no way to know better.
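
A toy model of that ambiguity, with xattrs represented as a plain dict rather than read from a real file system. The xattr name and both function names are assumptions made for illustration, not taken from libmarfs.

```python
# Assumed xattr name, for illustration only.
OBJID_XATTR = "user.marfs_objid"


def looks_direct(xattrs):
    """A stat-time check that treats 'no MarFS xattrs' as DIRECT."""
    return OBJID_XATTR not in xattrs


def overwrite_target(xattrs):
    """Where a writer would send data, judging only by the xattrs."""
    return "in-place" if looks_direct(xattrs) else "object-store"


# A zero-length file created on a non-DIRECT repo, with the xattr write skipped:
empty_file_xattrs = {}

# A normal file on the same repo:
normal_file_xattrs = {OBJID_XATTR: "some-object-id"}
```

Here `overwrite_target(empty_file_xattrs)` yields "in-place" even though the namespace intended an object repo, which is the problem described above.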

Issue #96 would address some of this. But not all apps have easy access to the namespace. An inode-scan can't tell that this file is not DIRECT. That means the file will not be considered by the quota-update tool, or by the packer. It would make no contribution to used-storage, but it does count as an inode. Therefore, a malicious user would be unconstrained in the number of empty files they could create. [Created issue #97 to address this.]

The packer has nothing to gain from packing zero-length files. They will take up inode-space, whether packed or not, and they would only clog up the packed file with recovery-info. I believe we are willing to lose zero-length files in the event of a complete loss of the MDFS, when files are rebuilt from the recovery-info in objects. They would also add complexity to the packer's task (no object-ID). So, perhaps it would be okay for the packer to ignore them.
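
If the packer did ignore them, the candidate filter might look roughly like this. It is only a sketch: `MDFile` and `packer_candidates` are hypothetical names, and the real packer's selection criteria are not described in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class MDFile:
    path: str
    size: int
    xattrs: dict = field(default_factory=dict)


def packer_candidates(files):
    """Keep only files that have data to pack and MarFS xattrs to read.

    Zero-length files are skipped: they have no object-ID and would only
    add recovery-info to the packed object.
    """
    return [f for f in files if f.size > 0 and f.xattrs]


scan = [
    MDFile("/ns/a", 4096, {"user.marfs_objid": "obj-a"}),  # xattr name assumed
    MDFile("/ns/empty", 0, {}),
    MDFile("/ns/b", 10, {"user.marfs_objid": "obj-b"}),
]
```

Note that the skipped file still occupies an inode, so inode quota would have to be enforced somewhere other than the packer.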

brettkettering commented 8 years ago

Related to #96. Should we merge?

Open needs to put xattrs on any newly created file.

jti-lanl commented 8 years ago

Well, currently open does put xattrs on new files, but I've learned that it didn't have to. We should fix open-for-write to use whatever the repo says. We only keep DIRECT files for reading. Then mknod() no longer has to communicate with open().
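
One possible reading of that proposal, as a sketch only: on open-for-write the file's current xattrs are ignored and the outcome depends solely on the repo the namespace config selects. The function name, its parameters, and the xattr name are all assumptions.

```python
# Assumed xattr name, for illustration only.
OBJID_XATTR = "user.marfs_objid"


def xattrs_after_open_for_write(existing_xattrs, repo_is_direct, new_objid):
    """Return the xattrs a file would carry after open-for-write.

    `existing_xattrs` is deliberately ignored: whether the file looked
    DIRECT before only matters for reads, not for a new write.
    """
    if repo_is_direct:
        return {}
    return {OBJID_XATTR: new_objid}
```

Under this reading, an xattr-less zero-length file on a non-DIRECT repo would gain its xattrs the first time someone writes to it, rather than being overwritten in place.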

On Jul 11, 2016, at 3:38 PM, Brett Kettering notifications@github.com wrote:

Related to #96. Should we merge?

Open needs to put xattrs on any newly created file.
