mar-file-system / marfs

MarFS provides a scalable near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc) as a scalable data component.
Other
96 stars 27 forks source link

marfs_rename() should do trash_truncate on the destination #207

Open jti-lanl opened 6 years ago

jti-lanl commented 6 years ago

... otherwise, the storage for the destination is orphaned.

trash_truncate()would preserve the MD xattrs in the trashed file, so GC could potentially find and reclaim the storage.

This comes up both for rename through fuse, and for the new rename-temp-file approach of pftool.

johnbent commented 6 years ago

Just curious: isn't it the source that needs trash_truncate() not the dest? When you call rename(source, dest) then the old location is source and the new location is dest. So you need to clean up the source after the file has been renamed to the dest.

Or maybe there's something here I don't understand. That's likely!

On Mon, Sep 24, 2018 at 6:02 PM Jeff Inman notifications@github.com wrote:

... otherwise, the storage for the destination is orphaned.

trash_truncate()would preserve the MD xattrs in the trashed file, so GC could potentially find and reclaim the storage.

This comes up both for rename through fuse, and for the new rename-temp-file approach of pftool.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mar-file-system/marfs/issues/207, or mute the thread https://github.com/notifications/unsubscribe-auth/AB89Pprxp5zNV7lPsY37GYriD_54gbRtks5ueXKmgaJpZM4W3pmW .

jti-lanl commented 6 years ago

Hi John,

The issue is that object-IDs live in (hidden) xattrs on the MD file.

When we unlink a file, the old MD file is moved to the trash, so we can keep the xattrs (with object-IDs), without unlink having to wait for object cleanup. Later, a garbage-collector might come by and clean up objects corresponding to deleted files.

With rename, we currently just splat the new MD on top of the old one, losing all the xattrs on the old file, leaving orphaned objects for the old file.

In the case of Campaign, it’s possible that we don’t care much, because maybe the entire object-store will be wiped before we ever need GC.

On Sep 25, 2018, at 1:13 PM, John Bent notifications@github.com<mailto:notifications@github.com> wrote:

Just curious: isn't it the source that needs trash_truncate() not the dest? When you call rename(source, dest) then the old location is source and the new location is dest. So you need to clean up the source after the file has been renamed to the dest.

Or maybe there's something here I don't understand. That's likely!

On Mon, Sep 24, 2018 at 6:02 PM Jeff Inman notifications@github.com<mailto:notifications@github.com> wrote:

... otherwise, the storage for the destination is orphaned.

trash_truncate()would preserve the MD xattrs in the trashed file, so GC could potentially find and reclaim the storage.

This comes up both for rename through fuse, and for the new rename-temp-file approach of pftool.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mar-file-system/marfs/issues/207, or mute the thread https://github.com/notifications/unsubscribe-auth/AB89Pprxp5zNV7lPsY37GYriD_54gbRtks5ueXKmgaJpZM4W3pmW .

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/mar-file-system/marfs/issues/207#issuecomment-424466240, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AH22-dlljCUc89xUUAsJLCTX9oflKekuks5ueoBVgaJpZM4W3pmW.

gransom commented 6 years ago

See Jeff was first, but I'll post this whole thing anyhow since I've already bothered to type it up.

If I'm understanding Jeff correctly, this refers to the fact that MarFS can't just do a simple rename of its metadata info. Each MarFS metadata file exists on GPFS, but contains a reference to MarFS objects in its xattrs. If MarFS blindly performs a metadata rename (rename of the GPFS metadata file), then the destination file will get overwritten and those xattrs will simply be lost. As those xattrs contain our only reference to the corresponding data object(s), those objects will now be orphaned. Instead, we need to have MarFS first perform a 'trash' of the destination, marking its data objects for deletion, before then doing the rename of the source (as normal) which would overwrite the metadata file of the dest. Hope this helps.

There is more to be discussed here regarding the safest way to do this. We want to avoid race conditions in which a destination file can be trashed (marked for deletion) but not actually be overwritten by the rename and thus still be user accessible until a GC run. I suspect we may want to consider slapping a restart xattr on the dest before doing anything else, indicating to both the user and the system that the rename has rendered that destination unreadable (maybe trashed, maybe not). If it hasn't, and the operation aborts, no worries because the next op that tries to overwrite will trash it properly. If it has, and the operation aborts, maybe we end up with duplicate info in the trash. That shouldn't be a problem though. I think the GC may handle double deletions without issue. Definitely better in principle than orphaned objs.

jti-lanl commented 6 years ago

We currently regard trash_truncate as “atomic” in the sense that we expect an unlink to simply succeed or fail. This is accomplished by doing the transfer to trash before doing the underlying POSIX unlink on the MD file. (Can't use rename to move MD to the trash, because trash might be in a different filesystem … sigh.)

So, the race I was imagining (adding trash_truncate before the rename) would just potentially leave the target of a partially-completed rename as deleted, rather than as accessible-but-wrong.

But this would still be bad behavior.

So, the proper thing (thanks for helping me see) would be to assimilate the trash_truncate() mechanics into the rename, such that the rename becomes “atomic” with respect to trash, just like unlink already is. Since I’m allergic to duplicated code, this will involve some minor refactoring.

On Sep 25, 2018, at 1:27 PM, Garrett Ransom notifications@github.com<mailto:notifications@github.com> wrote:

[…]

There is more to be discussed here regarding the safest way to do this. We want to avoid race conditions in which a destination file can be trashed (marked for deletion) but not actually be overwritten by the rename and thus still be user accessible until a GC run. I suspect we may want to consider slapping a restart xattr on the dest before doing anything else, indicating to both the user and the system that the rename has rendered that destination unreadable (maybe trashed, maybe not). If it hasn't, and the operation aborts, no worries because the next op that tries to overwrite will trash it properly. If it has, and the operation aborts, maybe we end up with duplicate info in the trash. That shouldn't be a problem though. I think the GC may handle double deletions without issue. Definitely better in principle than orphaned objs.

gransom commented 6 years ago

If it's possible to make rename behave in an atomic fashion, that sounds like it will take care of everything. Just so long as we don't have partially complete renames leaving user accessible (but soon to be GCd) destinations and don't have interleaved renames overwriting destination MD that wasn't actually trashed, I'll be happy.

johnbent commented 6 years ago

Thanks for very nice explanations from the both of you. My confusion was that I didn't consider that the rename was clobbering an existing file. Yes, that thing needs to be cleaned up indeed. By the way, I shared the allergy to duplicated code.

On Tue, Sep 25, 2018 at 1:52 PM Garrett Ransom notifications@github.com wrote:

If it's possible to make rename behave in an atomic fashion, that sounds like it will take care of everything. Just so long as we don't have partially complete renames leaving user accessible (but soon to be GCd) destinations and don't have interleaved renames overwriting destination MD that wasn't actually trashed, I'll be happy.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/mar-file-system/marfs/issues/207#issuecomment-424478851, or mute the thread https://github.com/notifications/unsubscribe-auth/AB89Pjpz5w2xMj-sWlj-xdQNi6uVRcmdks5ueomZgaJpZM4W3pmW .

jti-lanl commented 6 years ago

Interleaved renames. Dang.

I suppose we also have an existing risk for interleaved unlinks. (No risk of corruption, I think, but one leaf might be surprised to discover that the file being trashed is suddenly ENOENT).

These can be addressed. I see two different approaches. Let's talk offline.

On Sep 25, 2018, at 1:52 PM, Garrett Ransom notifications@github.com<mailto:notifications@github.com> wrote:

If it's possible to make rename behave in an atomic fashion, that sounds like it will take care of everything. Just so long as we don't have partially complete renames leaving user accessible (but soon to be GCd) destinations and don't have interleaved renames overwriting destination MD that wasn't actually trashed, I'll be happy.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/mar-file-system/marfs/issues/207#issuecomment-424478851, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AH22-fnoFh5Hjdt3qg869i7uCZ9f2SGqks5ueomZgaJpZM4W3pmW.

gransom commented 5 years ago

We've been discussing a significant rethinking of the trash process that should be able to handle renames in a pseudo-atomic fashion. At the very least, the new approach should eliminate any race conditions that previously could have led to leaking object references.

gransom commented 5 years ago

See issue #203