mar-file-system / marfs

MarFS provides a scalable near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc) as a scalable data component.

open() can be misled by presence/absence of xattrs #96

Open jti-lanl opened 8 years ago

jti-lanl commented 8 years ago

When overwriting a MarFS file, libmarfs should not assume that the presence or absence of xattrs on the existing file says anything about how the new file is to be written. Our current logic says that if the file has no xattrs, then it's DIRECT, so overwrite it in place (i.e. the new version will also be DIRECT). Otherwise, if it has xattrs, move it to the trash and use the repo named in the object-ID to write the new object.

But the existing file might have been written as DIRECT by pftool only because the repo matching its size was DIRECT. Maybe the iwrite_repo is different, or maybe pftool is opening for a new size-range. [Or maybe an old DIRECT namespace has been reconfigured with a new repo.]

A similar issue would occur if a file has xattrs, but they refer to a repo that doesn't match the iwrite_repo. Currently, we would believe the xattrs, and open from fuse to the repo which was originally used to write the file.


So, the new logic should be:

(a) trash/unlink/truncate the original file based on its xattrs. (DIRECT files don't go to the trash.)

(b) If there are no xattrs, open the new file according to the namespace->iwrite_repo. Otherwise, if the open-time object-type (in the object-ID) indicates pftool, then open according to the repo in the object-ID, else (if open-time obj-type indicates fuse), use the namespace->iwrite_repo.

This works for both fuse and pftool, because pftool always initializes xattrs via batch_pre_process(), so libmarfs can find them. If it hasn't done that, then it is fuse calling.
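To make steps (a) and (b) concrete, here is a minimal sketch in C. It is not the libmarfs implementation: `Repo`, `Namespace`, `FileMeta`, `OpenObjType`, `Disposal`, `dispose_original()` and `choose_write_repo()` are all hypothetical stand-ins invented for illustration, and the sketch assumes that "no xattrs" is equivalent to "DIRECT" for the existing file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for libmarfs structures. */
typedef struct { const char *name; } Repo;

typedef struct { const Repo *iwrite_repo; } Namespace;

/* Who opened the existing object, as recorded in its object-ID. */
typedef enum { OBJ_TYPE_FUSE, OBJ_TYPE_PFTOOL } OpenObjType;

typedef struct {
    bool        has_xattrs;  /* false => existing file is treated as DIRECT */
    OpenObjType obj_type;    /* only meaningful when has_xattrs is true */
    const Repo *objid_repo;  /* repo named in the object-ID, if any */
} FileMeta;

typedef enum { DISPOSE_TRUNCATE, DISPOSE_TRASH } Disposal;

/* Step (a): decide what happens to the ORIGINAL file, based only on
 * its xattrs. DIRECT files (no xattrs) never go to the trash. */
Disposal dispose_original(const FileMeta *old)
{
    assert(old != NULL);
    return old->has_xattrs ? DISPOSE_TRASH : DISPOSE_TRUNCATE;
}

/* Step (b): decide which repo the NEW file is written to. This is
 * independent of how the original was disposed of in step (a). */
const Repo *choose_write_repo(const Namespace *ns, const FileMeta *old)
{
    assert(ns != NULL && ns->iwrite_repo != NULL);
    assert(old != NULL);

    if (!old->has_xattrs)
        return ns->iwrite_repo;      /* no xattrs: fuse is calling */

    if (old->obj_type == OBJ_TYPE_PFTOOL)
        return old->objid_repo;      /* pftool pre-initialized the xattrs */

    return ns->iwrite_repo;          /* fuse-written: ignore the old repo */
}
```

The point of splitting this into two functions is that the old xattrs only drive the cleanup of the old file. The repo for the new file comes from the namespace, unless pftool has already initialized xattrs for this open.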

This might also allow us to deal with changes in the configuration of namespaces that already have data, to use new/different repos.


[Can we ignore the issue about the delay between writing xattrs to GPFS and having them be visible to getxattr()? That seems to apply between processes. It is solved with a call to sync(), which is expensive, though pftool could do it.]

brettkettering commented 8 years ago

Does this matter if we're not doing DIRECT files (which we may never do)? Move this out of the milestone?

brettkettering commented 8 years ago

Related to #112.