whatwg / fs

File System Standard
https://fs.spec.whatwg.org/
Other
238 stars 20 forks source link

Add FileSystemHandle.move method #10

Open a-sully opened 2 years ago

a-sully commented 2 years ago

Migrated from https://github.com/WICG/file-system-access/pull/317 (see https://github.com/whatwg/fs/issues/2)

Addresses https://github.com/WICG/file-system-access/issues/64 and https://github.com/WICG/file-system-access/issues/65

(See WHATWG Working Mode: Changes for more details.)


Preview | Diff

annevk commented 2 years ago

I'd suggest we don't include the Move.md file here.

Explosion-Scratch commented 2 years ago

Any updates on this? Thanks @a-sully for working on this though. When this gets merged what exactly happens? Does this feature get added to all JavaScript engines supporting the whatwg standard?

jesup commented 2 years ago

Open questions:

jesup commented 2 years ago

Open questions:

* What happens if there are existing SyncAccessHandles to a file being moved?

* What if there are FileHandles to a file being moved?

* What if there are DirectoryHandles to a file being moved?

This comment in move.md from the patch partly answers these questions (I presume what's meant here is "All handles to entries"):

  • All entries contained within a directory which is moved no longer point to an existing file/directory. We currently have no plans to explicitly update or invalidate these handles, which would otherwise be an expensive recursive operation.

That doesn't answer about FileHandles or SyncAccessHandles to a file being moved/renamed.

* What happens if a different process/worker/etc tries to grab a FileHandle/DirectoryHandle/SyncAccess handle on a file or directory being moved while it is still in the process of being moved (promise has not resolved yet).   If this does not succeed, what error is thrown?

* Is it possible that a move() could end up in a state other than completely moved or completely not-moved?   I.e. if there's a disk or permission error during moving a directory.   This is unlikely in OPFS, but may be possible in the more general usage.

This is discussed in move.md, but there's no resolution of the discussion.

jesup commented 2 years ago

Right now the justification for the answer about moving directories is based on move.md, which isn't here. There's also no description in the spec as to what happens to any of these cases; these need to be speced. One can infer that if we move foo to bar, that the directory now holds bar instead of foo. But there may be race conditions between a worker and mainthread, or worker and worker... It also means that one needs to reverse-engineer the current implementations.

jesup commented 2 years ago

Note that the WPT tests for move() encode several things not in this spec related to what are valid names:

testing that you can't rename a file to "Lorem." (no trailing periods - this isn't specified) testing that you can't rename a file "#$23423@352^*3243" (no invalid characters - this isn't specified - none of those are separators)

The spec says: "A valid file name is a string that is not an empty string, is not equal to "." or "..", and does not contain '/' or any other character used as path separator on the underlying platform." and "Note: This means that '\' is not allowed in names on Windows, but might be allowed on other operating systems. Additionally underlying file systems might have further restrictions on what names are or aren’t allowed, so a string merely being a valid file name is not a guarantee that creating a file or directory with that name will succeed." "We should consider having further normative restrictions on file names that will never be allowed using this API, rather than leaving it entirely up to underlying file systems."

jesup commented 2 years ago

The tests also test that move() can overwrite an existing file. This is mentioned nowhere in the spec, and the spec says "Attempt to move entry to destination in the underlying file system." which does not to me indicate it should overwrite. (It also checks that having a WritableFileStream open on the destination blocks move())

jesup commented 2 years ago

Also file.move(dir, "") should fail, per the spec; the wpt tests assume it's the same as file.move(dir)

a-sully commented 2 years ago

Thanks for bringing up these questions! Apologies for the confusion, Move.md is a bit outdated since it was written before the locking mechanisms for SyncAccessHandles were created. I'm planning to update this PR once #21 lands to incorporate those locking mechanisms. For now, to respond to your questions...

Open questions:

  • What happens if there are existing SyncAccessHandles to a file being moved?

move() takes an exclusive lock on the handle. Therefore a file cannot be moved while there is an open SyncAccessHandle (a reminder to me that we really need to clear up #18)

  • What if there are FileHandles to a file being moved?
  • What if there are DirectoryHandles to a file being moved?

30 is intended to address these. Since a FileSystemHandle maps to a path, these handles will dangle once the underlying file/dir is moved.

  • What happens if a different process/worker/etc tries to grab a FileHandle/DirectoryHandle/SyncAccess handle on a file or directory being moved while it is still in the process of being moved (promise has not resolved yet). If this does not succeed, what error is thrown?

If a SyncAccessHandle is being created, presumably whichever operation is able to grab their respective lock will win and the other will reject.

As for getFileHandle() and getDirectoryHandle(), if the path has been moved (and create: true has not been specified) then I don't think this deserves any special treatment? Do you see a problem with rejecting with NotFoundError as is currently specified when the requested path does not exist?

  • Is it possible that a move() could end up in a state other than completely moved or completely not-moved? I.e. if there's a disk or permission error during moving a directory. This is unlikely in OPFS, but may be possible in the more general usage.

Yes. A couple examples:

In these cases, the promise will be rejected with an AbortError.

The tests also test that move() can overwrite an existing file. This is mentioned nowhere in the spec, and the spec says "Attempt to move entry to destination in the underlying file system." which does not to me indicate it should overwrite. (It also checks that having a WritableFileStream open on the destination blocks move())

You are correct. And I'm wondering whether this is the right behavior in the first place... I think there's a strong argument to be made that we should reject a move to a path that already exists and allow the site to delete the destination handle and try again if they truly want it overwritten.

a-sully commented 2 years ago

Also file.move(dir, "") should fail, per the spec; the wpt tests assume it's the same as file.move(dir)

Good catch. We should probably update the WPTs and not the spec? Ideally the "is this a safe file" code can be shared between move(), getFileHandle(), and getDirectoryHandle()

jesup commented 2 years ago

Also file.move(dir, "") should fail, per the spec; the wpt tests assume it's the same as file.move(dir)

Good catch. We should probably update the WPTs and not the spec? Ideally the "is this a safe file" code can be shared between move(), getFileHandle(), and getDirectoryHandle()

yes, the WPTs should be updated

jesup commented 2 years ago

Attempts to move a directory to within itself should fail. Attempts to move the root directory should fail... at least in OPFS, but that would be handled by the above restriction anyways. No idea about non-OPFS. Note that for non-OPFS if you move a directory out of OPFS you have to check all files in the directory for safe-browsing

jesup commented 2 years ago

The wpt tests check that moving a file to the same name succeeds (I.e. succeeds in doing nothing). (They don't check if moving a directory to the same name succeeds, note.) This behavior does not appear to be anywhere in the spec here; it basically says "attempts to move it in the underlying filesystem" which could fail or succeed. The operation here should be well-specified, probably by saying that it should always succeed (for both directories and files)

jesup commented 2 years ago

I wonder if the fact that move-that-doesn't-move succeeds in chrome is related to the fact that it currently silently allows a move() to overwrite a file. At this moment, we fail that WPT test because we check if the destination file exists and if so, fail.

a-sully commented 2 years ago

We've received feedback from partners that we should not reject moves that would otherwise overwrite a file. Emscripten (@tlively) mimics the posix interface, which expects move() to be able to overwrite a file. The API doesn't afford a safe solution for emulating overwriting moves, and it's not great to require application code to change to comply with this.

We could introduce this as an opt-in (as I suggested earlier https://github.com/whatwg/fs/pull/10#discussion_r1018585899) if we think the default should be not to allow overwrites. Thinking about the non-OPFS use cases, at the very least I'd like to advocate that user agents should be able to reject a rename operation if the site does not have access to the parent directory

annevk commented 2 years ago

I'd like to advocate that user agents should be able to reject a rename operation if the site does not have access to the parent directory.

That seems more like an access failure and not related to move semantics.

Do we have use cases where you want to throw for overwrites? Can they be solved through some other way?

a-sully commented 2 years ago

That seems more like an access failure and not related to move semantics.

Fair enough

Do we have use cases where you want to throw for overwrites? Can they be solved through some other way?

My expectation is that C code being ported to the web via WASM will expect to be able to overwrite (as posix allows), but for web-native applications (especially those outside the OPFS), not allowing overwriting seems like the safer thing to do, since accidental overwrites would result in data loss.

In both cases, you can somewhat mimic the opposite behavior. If you have write access to the destination directory and the default is...

annevk commented 2 years ago

TIL about TOCTOU, makes sense. So I guess being able to pass some option to move() along the lines of throwOnReplace would help, so it's all atomic.

I just noticed that @jesup wrote above that Firefox currently throws for overwrites. Is matching POSIX here not as accepted as I would have assumed?

annevk commented 2 years ago

I chatted about this with @szewai and she made the point that applications that are concerned about TOCTOU can use Web Locks. And that yeah, POSIX is expected.

a-sully commented 2 years ago

Ah, good point about Web Locks. I'll note a few things:

It seems to me we have three options:

  1. Make the existing methods disallow overwrites by default and expose new API surface to opt in to allowing overwrites
  2. Make the existing methods allow overwrites by default and expose new API surface to opt out of allowing overwrites
  3. Make the existing methods allow overwrites by default and force developers to use WebLocks to ensure files are not overwritten

If we're okay with breaking backwards-compatibility on the existing methods (which I acknowledge is not risk-free), I'd prefer (1). It may be unexpected for the API to not give a warning if a move operation results in overwriting an existing file, especially outside of the OPFS (and while we could have a different default outside of the OPFS, I think that may be confusing for developers and would make it more complicated to later support passing an option such as throwOnReplace, since the default behavior would be different depending on where the file lives). Emscripten and other applications which expect POSIX behavior could then explicitly opt in to allowing overwrites. This option both gives power users the capabilities they need, while protecting the average API user from what would otherwise be a completely silent overwrite of an existing file.

Thoughts?

annevk commented 2 years ago

My preference would be 3 (i.e., match POSIX) and then perhaps do 2 down the line if there's compelling use cases presented. (To be sure I tested and indeed mv overwrites without asking as well.)

a-sully commented 2 years ago

Fair enough. Unless @jesup has objections, I can update the WPTs and spec to assert that overwrites are allowed

jesup commented 1 year ago

From https://man7.org/linux/man-pages/man2/rename.2.html POSIX rename() atomically replaces 'newfile' on rename() if it exists, so you're correct that posix wants that behavior. I think at least some non-normative mention of WebLocks to avoid overwriting would be helpful here. While I might prefer 2 to 3, I don't think that's a big deal. I probably prefer 1 to either 2 or 3, but I don't think there's a requirement to go that way; I just think the API would be better that way; the same functionality is available with all 3 options (more or less). (Just another area that shows it's better to get agreement on APIs first, before any field lockin occurs... like with SyncAccessHandle synchronous-ness)

a-sully commented 1 year ago

Just another area that shows it's better to get agreement on APIs first, before any field lockin occurs... like with SyncAccessHandle synchronous-ness

Agreed!

On that note, we're planning to ship move() of files outside of the OPFS on Chromium browsers soon. Please feel free to leave feedback on the explainer (https://github.com/a-sully/fs/pull/2) or TAG review (https://github.com/w3ctag/design-reviews/issues/805)

Q: Should we scope down this PR to only consider file moves? Ideally this PR would have merged months (when we decided to support move() for files within the OPFS) but I worry it will be tricky to land this PR while it contains directory moves while #59 is unresolved. Thoughts?

a-sully commented 1 year ago

Hi all, I've updated this PR to make use of the various infrastructural changes that have merged recently (primarily #95 and #96). Please take a look at the updated PR. Thanks!

@jesup I've tried to answer all your questions. Apologies for not being able to answer them months ago! (We really needed the above infrastructure changes to appropriately answer many of them.) Please let me know if there's anything I've missed.

Open questions:

  • What happens if there are existing SyncAccessHandles to a file being moved?

move() takes a lock on both the source and destination file. If there is an open SyncAccessHandle, neither the file itself nor any of its parents (as per https://github.com/whatwg/fs/pull/139#pullrequestreview-1528941955) are able to be moved

  • What if there are FileHandles to a file being moved?

Only the handle which move() is called on is updated. I've added a note about this in the PR

  • What if there are DirectoryHandles to a file being moved?

Ditto

  • What happens if a different process/worker/etc tries to grab a FileHandle/DirectoryHandle/SyncAccess handle on a file or directory being moved while it is still in the process of being moved (promise has not resolved yet). If this does not succeed, what error is thrown?

We now have a file system queue!

  • Is it possible that a move() could end up in a state other than completely moved or completely not-moved? I.e. if there's a disk or permission error during moving a directory. This is unlikely in OPFS, but may be possible in the more general usage.

Yes. I've added a note about this in the PR, though since no browser has yet implemented directory moves outside of the BucketFS, that part is admittedly a bit vague for now

  • I think at least some non-normative mention of WebLocks to avoid overwriting would be helpful here
  • Attempts to move a directory to within itself should fail.
  • Attempts to move the root directory should fail... at least in OPFS

Added!

Note that for non-OPFS if you move a directory out of OPFS you have to check all files in the directory for safe-browsing

As per this, I've specified that moving across the OPFS boundary is not allowed (for now). I welcome thoughts on #114, as well as thoughts on supporting cross-BucketFS moves

  • The wpt tests check that moving a file to the same name succeeds (I.e. succeeds in doing nothing).

I agree that we shouldn't need to actually call into e.g. mv, but I don't think we should do nothing. We should definitely run permission checks, for example, and it seems useful to still e.g. get a NoModificationAllowedError if the file is locked. For now I've specified it such that we'll run most all the steps right up until mv, but exit early before actually attempting to move on disk.

a-sully commented 1 year ago

A few things to note:

JasonGrass commented 11 months ago

The behavior of renaming files is inconsistent with the native API.

When using the FileSystemHandle.move method to rename files, it modifies the "Modified time" of the file. However, this issue does not occur when manually operating or using the native API, as renaming usually does not modify a file's "Modified time". Is this behavior expected?

Should I report this issue here or somewhere else?

PS My test environment: Win10, Chrome120.0.6099.129

image

miketaylr commented 11 months ago

Is this behavior expected?

AFAICT, this PR doesn't say anything about setting a file's lastModified (or to not do so). @a-sully?