mar-file-system / marfs

MarFS provides a scalable near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc) as a scalable data component.

MetaData Abstraction Layer #83

Closed brettkettering closed 8 years ago

brettkettering commented 8 years ago

Jeff Inman Writes: The MD abstraction layer could be set up in the usual C way, with a struct of function-pointers. I'll probably wish it were done in C++, someday, like with PFTool, which is now nice and flexible.

Either way, add an optional property to the namespace config, saying what abstraction layer to use. We'd cook up a struct for each NS, with all the proper function pointers. Default to what we have now. Then just make calls through the struct.
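To make the "struct of function-pointers" idea concrete, here is a minimal sketch, assuming a default layer that simply forwards to the POSIX calls. The names `MDAL`, `md_open`, `posix_mdal`, and `mdal_store` are hypothetical, invented for illustration; they are not taken from the MarFS source.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical MDAL: one function pointer per metadata operation. */
typedef struct MDAL {
    const char *name;
    int     (*md_open)(const char *path, int flags, mode_t mode);
    ssize_t (*md_write)(int fd, const void *buf, size_t count);
    ssize_t (*md_read)(int fd, void *buf, size_t count);
    int     (*md_close)(int fd);
    int     (*md_unlink)(const char *path);
} MDAL;

/* open(2) is variadic, so wrap it to match the pointer type. */
static int posix_open(const char *path, int flags, mode_t mode) {
    return open(path, flags, mode);
}

/* Default layer: every entry is just the POSIX call. */
const MDAL posix_mdal = {
    .name      = "POSIX",
    .md_open   = posix_open,
    .md_write  = write,
    .md_read   = read,
    .md_close  = close,
    .md_unlink = unlink,
};

/* Example caller: it never names a POSIX function directly,
 * so swapping the MDAL swaps the whole metadata back end. */
ssize_t mdal_store(const MDAL *mdal, const char *path,
                   const void *buf, size_t len) {
    assert(mdal != NULL);
    int fd = mdal->md_open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = mdal->md_write(fd, buf, len);
    if (mdal->md_close(fd) != 0)
        return -1;
    return n;
}
```

A different layer (for example, one that forwards to IOFSL) would be another `MDAL` value with the same shape; callers such as `mdal_store` would not change.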

brettkettering commented 8 years ago

Gary Grider Writes: Sounds right.

We also need an access method on the MD shard spec, not only on the namespace, so that we could have a different access method for directory MD and file MD. That is starting to make a lot of sense to me.

I like the idea of a default that just does the POSIX call, just like ADIO in MPI-IO.

I don't care about the method, as long as it's not Java or other crap like that.

If Jeff wants to architect and do one call, then someone else could copy that method for the other posix calls, so it's consistent :-)

This and playing with ofs and iofsl for md perf are the two things we need to get started with on the demo.

When the MD abstraction is done, that part could be merged back into the prod tree after testing, or something like that.

jti-lanl commented 8 years ago

[lifted from would-be #84]

The idea is to allow all MD interactions (e.g. open/close/read/write/getxattr/setxattr/etc) to be done through a struct of function-pointers. This allows us to replace those function-pointers for a different MDFS. For example, if we need IOFSL-calls to get to the MD, those would be a new set of function-ptrs.

Add an (optional) configuration-attribute to Namespace. This should be a symbolic name for the desired abstraction-layer. Default this to what we already do (which could just be called POSIX).

read_configuration() will then instantiate structs of fn-ptrs for all the known abstraction-layers (or, at least, all the ones that are used). NSes get ptrs to the one specified for them in the config.

libmarfs then would do all MD interactions via the fn-ptrs, instead of directly.
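The configuration step described above might look roughly like the following sketch: a table of known layers, looked up by the symbolic name from the namespace config, with POSIX as the default when the attribute is absent. The names `known_mdals`, `mdal_lookup`, and `namespace_bind_mdal` are hypothetical, and the function-pointer members are left out to keep the focus on the lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical MDAL descriptor; the function pointers are omitted here. */
typedef struct MDAL {
    const char *name;
} MDAL;

/* All abstraction layers this build knows about. */
static const MDAL known_mdals[] = {
    { "POSIX" },
    { "IOFSL" },   /* placeholder entry for an alternative layer */
};

/* Map a symbolic name from the config to an MDAL.
 * A missing or empty attribute falls back to POSIX. */
const MDAL *mdal_lookup(const char *name) {
    if (name == NULL || name[0] == '\0')
        name = "POSIX";
    for (size_t i = 0; i < sizeof known_mdals / sizeof known_mdals[0]; i++) {
        if (strcmp(known_mdals[i].name, name) == 0)
            return &known_mdals[i];
    }
    return NULL;   /* unknown layer: let the caller report a config error */
}

typedef struct Namespace {
    const char *name;
    const MDAL *mdal;   /* set once, while reading the configuration */
} Namespace;

/* Called for each namespace while the configuration is being read. */
int namespace_bind_mdal(Namespace *ns, const char *configured) {
    assert(ns != NULL);
    ns->mdal = mdal_lookup(configured);
    return ns->mdal ? 0 : -1;
}
```

Rejecting an unknown name at configuration time, rather than silently falling back to POSIX, is a design choice made for this sketch only.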

NOTE: We also need an abstraction config-attribute for MD-shard config, as abstraction will apply there, as well. (If shard-config is just part of NS config, then carry on.)

QUES: Is this going to get ugly? What if different abstractions need different state to be maintained? What if someone wants to store MD in a non-POSIX way? What if we want a proliferation of little helper functions, or state-updaters, etc? It's tempting to steal the C++ basics from pftool (cpp branch), which resolves some of these issues pretty naturally.
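On the question of per-abstraction state: one common C idiom, offered here only as a possibility, is to give each handle an opaque `void *` that the layer allocates and interprets itself. The sketch below uses a toy "COUNTING" layer whose only state is a call counter; every name in it (`MDAL_Handle`, `CountingState`, `touch`, and so on) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct MDAL_Handle MDAL_Handle;

/* Hypothetical MDAL with explicit init/destroy hooks for private state. */
typedef struct MDAL {
    const char *name;
    int  (*init)(MDAL_Handle *h);
    int  (*touch)(MDAL_Handle *h);    /* stand-in for any MD operation */
    void (*destroy)(MDAL_Handle *h);
} MDAL;

struct MDAL_Handle {
    const MDAL *mdal;
    void       *state;   /* owned and interpreted only by the layer */
};

/* Private state of the toy layer: just a call counter. */
typedef struct CountingState {
    unsigned long calls;
} CountingState;

static int counting_init(MDAL_Handle *h) {
    h->state = calloc(1, sizeof(CountingState));
    return h->state ? 0 : -1;
}

static int counting_touch(MDAL_Handle *h) {
    CountingState *s = h->state;
    s->calls++;
    return 0;
}

static void counting_destroy(MDAL_Handle *h) {
    free(h->state);
    h->state = NULL;
}

const MDAL counting_mdal = {
    "COUNTING", counting_init, counting_touch, counting_destroy
};

/* Generic code binds a handle to a layer without knowing its state type. */
int mdal_handle_open(MDAL_Handle *h, const MDAL *mdal) {
    assert(h != NULL && mdal != NULL);
    h->mdal  = mdal;
    h->state = NULL;
    return mdal->init(h);
}
```

This keeps the generic code ignorant of what each layer stores, at the cost of casts and manual lifetime management, which is part of what a C++ class hierarchy would handle more naturally.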

brettkettering commented 8 years ago

(1) replace existing MD calls with calls through a new MDAL struct

-- extend configuration so Namespace can/must specify MDAL type
-- default/config to use POSIX for existing namespaces
-- replace existing MDFS calls to use MDAL
-- put in enough infrastructure to allow Dave/Hb to develop/deploy an IOFSL/OrangeFS alternative

jti-lanl commented 8 years ago

Also:

(2) support directory sharding

-- I think we agreed there is another MDAL for talking to the shard FS.
-- extend config so Namespace can/must specify MDAL type for shards
-- code tweaks to identify sharded directories, and make calls through shard MDAL.

The point of this is to allow flexibility in the choice of the tech on which the shards are implemented. For the MD Scaling Demo project, we might picture talking IOFSL (or Cray's DVS) to the FSes where the shards live. The commands in the MDAL might as well be the union of ones needed to talk to the dir-MD and ones needed to talk to the file-MD (i.e. on the shard-FS). Then we really have two MDAL instantiations coming from the config.
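As a rough illustration of "two MDAL instantiations coming from the config", a namespace could carry one pointer for directory MD and one for file MD (on the shard FS), both of the same struct type since the command set is the union of the two. The names `MDKind`, `dir_mdal`, `file_mdal`, and `namespace_mdal_for` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical MDAL descriptor; function pointers omitted for brevity. */
typedef struct MDAL {
    const char *name;
} MDAL;

/* Which kind of metadata an operation is about to touch. */
typedef enum { MD_DIR, MD_FILE } MDKind;

typedef struct Namespace {
    const MDAL *dir_mdal;    /* for talking to the directory MD */
    const MDAL *file_mdal;   /* for talking to the file MD on the shard FS */
} Namespace;

/* Pick the layer for an operation; both layers share one struct type. */
const MDAL *namespace_mdal_for(const Namespace *ns, MDKind kind) {
    assert(ns != NULL);
    return (kind == MD_DIR) ? ns->dir_mdal : ns->file_mdal;
}
```

With both pointers set to the same layer, this degenerates to the single-MDAL case.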


(3) Refactor MarFS write() to work like read()

-- don't start the PUT in open(), wait until the write().
-- zero-length files might have no object? (Maybe that's good.)
-- tweak code to handle zero-length files?

The point of this is to make it easier/cheaper to write zero-length files. Currently, we'd pay the cost of opening/closing a PUT for each zero-length file.
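The deferred-PUT idea can be sketched as follows. This is a toy model, not the libmarfs code: `ObjStream`, `start_put`, and the counters are hypothetical, and no real object-store request is made. It only shows that a file opened and closed without a write never starts a PUT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file stream state. */
typedef struct ObjStream {
    int    put_started;   /* has the PUT been opened yet? */
    int    puts_opened;   /* demonstration counter */
    size_t bytes;         /* bytes handed to the PUT so far */
} ObjStream;

/* Stand-in for the expensive step of opening a PUT to the object store. */
static void start_put(ObjStream *os) {
    os->put_started = 1;
    os->puts_opened++;
}

/* open(): only initialize state; do NOT start the PUT here. */
void stream_open(ObjStream *os) {
    assert(os != NULL);
    os->put_started = 0;
    os->puts_opened = 0;
    os->bytes = 0;
}

/* write(): start the PUT lazily, on the first non-empty write. */
size_t stream_write(ObjStream *os, const void *buf, size_t len) {
    (void)buf;   /* data would be streamed to the object store here */
    if (len == 0)
        return 0;
    if (!os->put_started)
        start_put(os);
    os->bytes += len;
    return len;
}

/* close(): returns 1 if an object was created, 0 for a zero-length file. */
int stream_close(ObjStream *os) {
    int had_object = os->put_started;
    os->put_started = 0;
    return had_object;
}
```

In this model a zero-length file ends up with no object at all, which is exactly the case the follow-up question about non-DIRECT repos has to deal with.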

jti-lanl commented 8 years ago

[sorry, but this question just seems less simple than I'd like]

(3b) what to do about zero-length files on a non-DIRECT repo?

For MD insert performance tests, with zero-length files, the simple approach to avoiding the cost of writing xattrs would be to use a DIRECT repo, because then you definitely won't be writing xattrs. If you wanted, you could have a namespace that uses DIRECT for size=0 only, like so:

<range>
   <min_size>0</min_size>
   <max_size>0</max_size>
   <repo_name>my_direct_repo</repo_name>
</range>
<range>
   <min_size>1</min_size>
   <max_size>-1</max_size>
   <repo_name>my_sproxyd_repo</repo_name>
</range>

Then, MD insert testing via pftool would use a synthetic source which creates source-filenames (and reports length 0 for them), and libmarfs would do the open() with a DIRECT repo, and no xattrs would be written.
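For readers unfamiliar with the range config, here is a sketch of how a repo might be selected by file size from the two ranges above. It assumes that a `max_size` of -1 means "no upper bound", which the example implies but does not state; the `Range` struct and `repo_for_size` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One <range> element from the namespace config. */
typedef struct Range {
    long long   min_size;
    long long   max_size;   /* assumption: -1 means unbounded */
    const char *repo_name;
} Range;

/* The two ranges from the example config. */
static const Range ranges[] = {
    { 0,  0, "my_direct_repo"  },
    { 1, -1, "my_sproxyd_repo" },
};

/* Return the repo whose range contains `size`, or NULL if none does. */
const char *repo_for_size(const Range *r, size_t n, long long size) {
    assert(r != NULL && size >= 0);
    for (size_t i = 0; i < n; i++) {
        int above_min = size >= r[i].min_size;
        int below_max = r[i].max_size < 0 || size <= r[i].max_size;
        if (above_min && below_max)
            return r[i].repo_name;
    }
    return NULL;
}
```

With these ranges, size 0 selects `my_direct_repo` and any positive size selects `my_sproxyd_repo`.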

However, on a non-DIRECT repo, just skipping writing the xattrs seems like trouble. When we later stat that file, the lack of xattrs implies that it is DIRECT. Therefore, a writer overwriting the file would simply overwrite it in place. It doesn't know any better.

Issue #96 would address some of this. But not all apps have easy access to the namespace. An inode-scan can't tell that this file is not DIRECT. That means the file will not be considered by the quota-update tool, or the packer. It would make no contribution to used-storage, but it does count as an inode. Therefore, a malicious user would be unconstrained in the number of empty files s/he could create. [Created issue #97 to address this.]

The packer has nothing to gain from packing zero-length files. They will take up inode-space, whether packed or not, and they would only clog up the packed file with recovery-info. I believe we are willing to lose zero-length files in the event of a complete loss of MDFS, with recovery from recovery-info in objects. They would add complexity to the packer's task (no object-ID). So, perhaps it would be okay for the packer to ignore them.

jti-lanl commented 8 years ago

Item (1) is complete. To use the MDAL, build using 'make ... USE_MDAL=1'. In this case, all MD interactions currently default to POSIX. To introduce an alternative:

  <dir_MDAL>POSIX</dir_MDAL>
  <file_MDAL>POSIX</file_MDAL>

brettkettering commented 8 years ago

Jeff:

I wonder if we should close this issue and open individual issues for items (2), (3), and (3b)? They seem like they can be individual enhancements to item (1).

If you agree, let's close this and open 3 new ones.

Thanks, Brett

jti-lanl commented 8 years ago

Done. Captured remaining sub-items in issues #69, #111, and #112.