whatwg / fs

File System Standard
https://fs.spec.whatwg.org/
Other
219 stars 19 forks source link

Interactions with the BFCache #17

Open a-sully opened 2 years ago

a-sully commented 2 years ago

Migrated from https://github.com/WICG/file-system-access/issues/319 (see https://github.com/whatwg/fs/issues/2)

@ArthurSonzogni

You may want to reference this from the BFCache meta bug: https://github.com/whatwg/html/issues/5880 +CC @rakina

I was wondering what kind of interaction this feature will have with the Back-forward cache. I see the execution context can obtain a lock for a file. Then the document might be (?) able to enter the BackForwardCache without releasing it. Maybe this could cause some problems? For instance there are other features like WebLock which prevents the BFCache from being used. Maybe the same should happen here: https://source.chromium.org/chromium/chromium/src/+/main:content/browser/renderer_host/back_forward_cache_impl.cc;l=265;drc=e5616aeff413a17175a96056b3bf1fbc9ca0ade7;bpv=1;bpt=1 Maybe you did it already. I see kWebFilesystem. Not sure if this is related.

More generally, you might want to clarify if there are any BFCache specific behavior about this feature.

@rakina

On the kWebFilesystem thing, I think @hajimehoshi and @fergald did add WebFileSystem support for BFCache, but I'm not sure about the details - maybe the can comment on what Chrome is doing and if there's anything to be added on the spec side.

@fergald

My understanding is that WebFileSystem had nothing that prevented bfcaching (there are no connections to bring up/down or locks). I see that it's unblocking is still controlled by a flag but we should clean that up and leave it unconditionally unblocked.

I don't think there's any spec change (unless we want to state positively that it is compatible with bfcache) but we should probably add a WPT that checks if a page using WFS is cached. That would make it clear which browsers support it.

@mkruisselbrink

On the chrome side currently using the file system access API with local files on the file system will cause the page to be prevented from entering bfcache, although I think that is primarily since we haven't reased about interaction with usage tracking etc for our current permissions implementation (where lifetime of permissions is dependent on the presence of tabs with pages loaded).

For the newly proposed access handle API, which include file locking, I do suspect we'll need explicit spec-level integration with bfcache. We don't want pages in the bfcache to be able to keep files locked after all.

rakina commented 2 years ago

Thanks! Wondering if there's any resolution needed here. BTW, there's a guide on how to handle non-fully active documents (which includes bfcached documents) in spec.

a-sully commented 2 years ago

TLDR: this isn't currently a concern, but will become one soon

The primary concern here is that file handles which have acquired locks (these locks are internal to the File System Access API) will enter BFCache without releasing said locks.

Currently:

  1. The only meaningfully enforced locking in the API is that creating an Access Handle grabs an exclusive lock on the file,
  2. Access Handles are only available within dedicated workers, and
  3. BFCache is not supported in dedicated workers

Once any of the above conditions change, we will need to revisit this.

As for when these will change...

  1. See #19 (and also #18)
  2. Will likely change if/when we support Async Access Handles
  3. Tracked in https://github.com/whatwg/html/issues/7216. See the latest design doc

Please let me know if any of this does not match your understanding!

rakina commented 2 years ago

Thanks, actually we have started experimenting with DedicatedWorker support for BFCache ~1 month ago, so we probably should work on resolving this (cc @fergald). Is it reasonable to drop the lock on navigation away from the page? (and maybe restore them on BFCache restore?). Do we need the events proposed at https://github.com/whatwg/html/issues/7216 for that to make sense (so that the page can prepare for the lock to be dropped somehow?)

annevk commented 2 years ago

Ideally this behaves the same as Web Locks, I'd imagine?

@saschanaz could you comment on what the story is there?

saschanaz commented 2 years ago

Ideally this behaves the same as Web Locks, I'd imagine?

@saschanaz could you comment on what the story is there?

Gecko prevents BFCache when at least one lock is held or a pending request exists, so that closing the page can immediately release the lock. AFAIK Chrome prevents BFCache when at least one lock has been requested.

a-sully commented 2 years ago

Preventing BFCache would certainly work here (this is what the API currently does in Chromium), though I think it's worth exploring other options...

Is it reasonable to drop the lock on navigation away from the page? (and maybe restore them on BFCache restore?). Do we need the events proposed at whatwg/html#7216 for that to make sense (so that the page can prepare for the lock to be dropped somehow?)

We should definitely drop the lock in some way when navigating away from the page. However, re-acquisition of the lock upon restore can't be guaranteed, since the file may have been locked by another tab in the meantime.

This becomes tricky since we're not allowed to construct a sync Access Handle without first acquiring a lock. Is there precedent for destroying all lock-holding objects when the page enters the BFCache? This might be enough for this API since these locks are less generic than for Web Locks? It's pretty straightforward for a site to re-acquire an Access Handle, for example.

That being said... the API does acquire locks in other cases:

If destroying a lock-holding object is feasible, we can destroy the writable streams, just like we would the Access Handles.

Apologies for my lack of familiarity with the BFCache, but I'm not sure what would happen in the latter two cases. move() in particular can be a long-running operation if we're recursively moving a directory. What happens if the page goes into the BFCache while the operation is in progress?

rakina commented 2 years ago

I strongly advise against disabling BFCache for new APIs. For older/shipped APIs like Web Locks we're pretty much stuck because websites already use them before BFCache is a thing (at least in Chrome), while we can shape the behavior we want with this new API (not sure what stage this is in?)

We should definitely drop the lock in some way when navigating away from the page. However, re-acquisition of the lock upon restore can't be guaranteed, since the file may have been locked by another tab in the meantime.

Yes this sounds like the way to go, and don't re-acquire the lock automatically on restore. I think we should just advise developers to re-acquire the lock manually on BFCache restore. This might be tricky if we expect them to do that on DedicatedWorker... If you think it's necessary, we might need to ship https://github.com/whatwg/html/issues/7216.

Apologies for my lack of familiarity with the BFCache, but I'm not sure what would happen in the latter two cases.

Will those still be a problem if we always drop the lock on navigation? If we can guarantee that a bfcached page will never hold a lock, it doesn't sound like the bfcached page would be affected by anything outside of the page, and vice versa?

smaug---- commented 2 years ago

3. BFCache is not supported in dedicated workers

That is a Chromium implementation detail. Some other browser engines do support workers in bfcache.

I think we should just advise developers to re-acquire the lock manually on BFCache restore.

That sounds rather bad model. The whole point of bfcache is that devs don't really need to care about it.

jesup commented 2 years ago

There's one way we could allow pages using OPFS into the bfcache: If a page using OPFS is in the bfcache, and another page tries to use the same files in a way that conflicts with the bfcached page (*), the page is evicted from the BFCache.

See also https://github.com/web-platform-tests/wpt/pull/31082 (hat tip to smaug)

fergald commented 2 years ago

I think we should just advise developers to re-acquire the lock manually on BFCache restore.

That sounds rather bad model. The whole point of bfcache is that devs don't really need to care about it.

At the top of the issue, it says the locks are internal. The spec makes no mention of locks at all.

@jesup's comment sounds like one reasonable implementation and is a common pattern for BFCache but this all seems to be about implementation but how does it map to the spec?

What should happen if 2 pages call fileHandle.createWritable({ keepExistingData: true })` on the same file in different tabs? I haven't read the full spec but I can't find that addressed in it. I guess the locks discussed above mean that Chrome would make the second opener wait for the first opener to finish.

Is that something we want to spec? That would then give a pretty clear signal that you must evict a page that is in BFCache with a writeable file open.

If we're not willing to spec that writeable opens are exclusive across tabs then it's unclear to me that holding a filehandle should prevent BFCaching.

jesup commented 2 years ago

What should happen if 2 pages call fileHandle.createWritable({ keepExistingData: true })` on the same file in different tabs?

They take read locks, and so they both could have them; both createWritable's would succeed. (They would block any attempt to createSyncAccessHandle, at least until the writables are close()ed.)

If a tab with a writable is in the cache, and another tab calls createWritable for that file, that would succeed in my plan, and you would not have to evict the BFCached page. If the tab called createSyncAccessHandle, you would have to evict, since you can't have a SyncAccessHandle with any other SyncAccessHandle or WritableFileStream to the same file.

fergald commented 2 years ago

Thanks for the clarification. Sound good.

a-sully commented 2 years ago

Overall SGTM if we can figure out the edge cases. There's an open issue to better spec locking mechanisms (https://github.com/whatwg/fs/issues/18). We're planning to expand the use of locking soon (in the upcoming move() and remove() methods, as well as other possible write modes) so the language should probably be a bit more generic:

That being said, there are some edge cases here I'm not sure if I have enough knowledge to understand the full implications of. For example...

The read() and write() methods are synchronous, meaning they block the DedicatedWorker while they run. I created a demo site to test what happens when we navigate away from the page in the middle of a write: https://access-handle-navigation-test.glitch.me/. It appears that the write operation completes successfully (at least on my machine) and does not prevent navigation. I'm not sure exactly what the lifetime of a DedicatedWorker is relative to the site it serves, but it seems that even if we evict the page, the handle (and therefore lock? perhaps depending on the implementation?) may still be around? I'm also not sure whether a beforePutToBFcache event (as proposed in https://github.com/whatwg/html/issues/7216) would actually fire in this case?...

I assume navigating away from a page during long-running async operations (like recursively moving a directory) might have similar problems, even if they're issued from a window? Are there other APIs we can look towards as precedent here?

jesup commented 2 years ago

@a-sully -- please open a separate issue for speccing the interaction between navigation and aync operations

a-sully commented 2 years ago

The interaction between navigation and sync operations seems like something that's not specific to this API? Presumably other APIs which run code synchronously from workers encounter the same issues?

a-sully commented 2 years ago

I agree this is something that should be specified, but I'm not sure if this is the right repo for it. @annevk may know?

annevk commented 2 years ago

@rakina and @fergald can advice on this I think, but in general you want to define whether unloading a document ends up impacting its salvageable state. To do this you would define unloading document cleanup steps. Generally these are defined outside HTML at the moment, except for some APIs. At some point it might make sense to centralize them if we run into ordering issues.

jesup commented 2 years ago

So this case is a little different than unloading with WebSockets, etc. We (probably) want to maintain the file locks and filehandles while cached (salvagable), but if that would impact any other page's actions, we'd evict it (close all the actors/filehandles/drop locks).
How I would phrase it in a spec would be something along the lines of "A document using this API can be salvagable, but must become not salvagable if it's existence would be visible to any active document". I.e. if a page's existence in a cache would cause any observable change in state to an active page, it must be evicted. Note that this eviction does not happen at the time of unload, so the unloading cleanup steps wouldn't be the place to clean things out. You could implement it like that (keep locks/etc alive in bfcache); you could also drop them all, and on an attempt to salvage, throw the document away if all the state can't be re-acquired at that time. That might slightly increase the likelihood of a page being retrieved from the bfcache (but only slightly), and it might also be slightly more work (but this may be arguable; both solutions involve some coding).

annevk commented 2 years ago

Ah right, I don't think we have a good hook for "this action ends up destroying a bfcache entry, if any". @domenic might also be able to comment as to whether that's something that will be introduced as part of the ongoing bfcache work.

jesup commented 2 years ago

BTW, I think the idea of dropping and re-acquiring can lead to subtle deviations from what could happen if it hadn't gone into the cache, so I would avoid that, and keep the locks (and drop the page on any attempt that would conflict with the locks)

rakina commented 2 years ago

Just FYI, BFCache people are talking about adding a way to spec eviction cases like in here easily at https://github.com/whatwg/html/issues/7253#issuecomment-1157202306, so hopefully the spec changes here won't be blocked by that soon enough.

On navigation & synchronous operations, I guess most APIs only affect things within its own document, while this API have external side effects. It might be good to look at how APIs that have external effects (IndexedDB? not sure if they have sync operations) behave.

jesup commented 1 year ago

For unloading document cleanup steps: "A document using this API must become not salvagable if it's existence would be visible to any active document. This would happen if this document has a FileSystemSyncAccessHandle to an object, and another document were to attempt to access that object using this API." (and maybe list the other ways it can happen.) @annevk any wordsmithing ideas?

rakina commented 1 year ago

For unloading document cleanup steps: "A document using this API must become not salvagable if it's existence would be visible to any active document. This would happen if this document has a FileSystemSyncAccessHandle to an object, and another document were to attempt to access that object using this API." (and maybe list the other ways it can happen.)

I think you mentioned in https://github.com/whatwg/fs/issues/17#issuecomment-1149951040 that you want this to not prevent the page from being BFCached, and instead only "evict" a BFcached/non-fully active document when it can impact other pages, which I think is the right way to do this. To spec the eviction, I think you can follow https://github.com/whatwg/html/pull/8972. See also the guides at https://w3ctag.github.io/bfcache-guide/.