whatwg / fs

File System Standard
https://fs.spec.whatwg.org/
Other
237 stars 20 forks source link

Add Access Handles to spec #21

Closed fivedots closed 2 years ago

fivedots commented 2 years ago

This PR adds Access Handles to the spec. The Access Handles surface differs from existing ones by offering in-place and exclusive write access to a file’s content. This change, along with the ability to consistently read unflushed modifications and the availability of a synchronous variant on dedicated workers, significantly improves performance and unblocks new use cases.

There are still a few open TODOs where advice would be appreciated!

The PR also updates AccessHandles.md to reflect changes in https://github.com/WICG/file-system-access/pull/367.


Preview | Diff

fivedots commented 2 years ago

A first pass review was done by @domenic in https://github.com/WICG/file-system-access/pull/344.

fivedots commented 2 years ago

Updated examples on AccessHandles.md as proposed by @tomayac.

annevk commented 2 years ago

I don't think we should keep AccessHandle.md as a separate file. If the information in there is relevant to the standard, it should be folded into the main document. If it's not, it should probably be maintained elsewhere. At least we don't usually have such documents alongside WHATWG standards.

jesup commented 2 years ago

I notice that while the webidl indicates that SyncAccessHandle is for workers only, the commentary for webdevelopers doesn't mention workers anywhere.

I also note that there's no definition of AccessHandles (such as you see in https://github.com/WICG/file-system-access/blob/main/AccessHandle.md#exposing-accesshandles-on-all-filesystems), which means the only way to read data is on worker via SyncAccessHandles (WritableFileStreams only support writing data here). I presume this is an intentional decision made sometime since AcessHandle.md was written? And this means there's no way to read data in an async fashion, it's sync only.

There's also no mention of OPFS until the very end of the document; it's referenced by description in the intro, but not by name.

And ditto to anne's points, especially as to why truncate/getSize are async (since they're only for SyncAccessHandle).

jesup commented 2 years ago

See this doc for the justification behind the split sync/async interface.

I think you mean this

a-sully commented 2 years ago

See this doc for the justification behind the split sync/async interface.

I think you mean this

Ah yes, sorry about that. I've updated the link in my comment.

a-sully commented 2 years ago

Friendly ping @jesup @annevk

I'd like to get this PR approved and merged so we can move forward with making all the methods of the SyncAccessHandle sync :)

Thanks!

jesup commented 2 years ago

@a-sully The WPT tests seem to assume you must pass 'at', and if you don't all reads start at 0. This is ... odd, especially for implementing the equivalent to read(), which maintains an internal position in the file, and reads from there. It means for almost all reads (or writes), you must pass at, which means you must track the expected position in the file - likely in a wrapper around syncaccesshandle.read()/write(). Emscriptem certainly can track the location in the file and insert at:nnnn always, but this seems like an odd choice for an API (and not great for performance, due to the extra parsing and validation and the basically forced extra kernel call to seek -- we could avoid the seek() call if we also track the position and elide if if it's a no-op, but that's not worth it either). I imagine this is legacy decisions from the non-OPFS File System API

jesup commented 2 years ago

Though maybe not legacy -- SyncAccessHandles came later. Can (should?) we change this? If you're writing a bunch of data, it's a whole lot more (unnecessary) bookkeeping and overhead.

   position += handle.write(data1, {at: position});
   position += handle.write(data2, {at: position});

But you can't quite do that, since if there's an error and a partial write, you won't notice it. To be pedantic about catching errors, you need to do:

try {
  written = handle.write(data1, {at: position});
  if (written != data1.length()) { /* error */} else { postion += written;}
  written = handle.write(data2, {at: position});
  if (written != data2.length()) { /* error */} else { postion += written;}
} catch(e) { ... }

With classic posix-like read/write (with an assumed position update), it would be

  handle.write(data1);
  handle.write(data2);

or

try {
  if (handle.write(data1) != data1.length) { /* error */}
  if (handle.write(data2) != data2.length) { /* error */}
} catch(e) { ... }
jesup commented 2 years ago

We should probably merge this and then raise any points like the above as issues

jesup commented 2 years ago

@tomayac Thoughts on at: ?

a-sully commented 2 years ago

We should probably merge this and then raise any points like the above as issues

I agree. Would you mind opening an issue for this? I have some thoughts here and it would be nice to have all the discussion in one place

jesup commented 2 years ago

Editorial nit: in truncate(): "If the offset is smaller than offset, it remains unchanged."

jesup commented 2 years ago

https://github.com/whatwg/fs/issues/49

fivedots commented 2 years ago

I also agree merging and moving the discussion to #49 sounds good.

I've updated the description of truncate to fix the confussion mentioned in https://github.com/whatwg/fs/pull/21#issuecomment-1248844601.

@jesup would you mind approving the PR for completeness/visibility? I'll merge it in then!

jesup commented 2 years ago

I can't approve since I don't have commit access, but go ahead.