Allow workers & shared workers to be created within a service worker

ian97531 commented 1 year ago

Update (2022/10/10): This issue was originally making the case for workers and shared workers to be creatable by service workers to enable a Chrome extension use case. Since extensions are not part of the web platform, I'm updating the motivating use case here to reflect one pointed out by @wanderview that is relevant to the web platform:

Custom (de)compression algorithms that are CPU-intensive can use dedicated workers to avoid introducing delays in fetch event processing.

Also, any site that performs sync with the server and must support multiple tabs has a use for shared workers for managing state. Often that state will need to be accessed from the service worker as well.
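For the (de)compression case, the part of a fetch handler that would change is small. A sketch under stated assumptions: `/decoder-worker.js` is a hypothetical script that replies with decoded bytes, and the `.custom` extension is a hypothetical marker for responses that need decoding. The `new Worker(...)` call inside the guard is exactly what service workers cannot do today. `transformResponse` itself only uses `Response` and `Headers`, so it runs as-is.

```javascript
// Rebuild a response with a decoded body. `decode` is any async function
// Uint8Array -> Uint8Array; in the proposal it would call a nested worker.
async function transformResponse(response, decode) {
  if (!response.ok) return response; // leave errors (and opaque responses) untouched
  const encoded = new Uint8Array(await response.arrayBuffer());
  const decoded = await decode(encoded);
  const headers = new Headers(response.headers);
  // The body is no longer encoded and its length has changed.
  headers.delete("content-encoding");
  headers.delete("content-length");
  return new Response(decoded, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}

// Service-worker wiring: only runs inside a service worker.
if (typeof ServiceWorkerGlobalScope !== "undefined" && self instanceof ServiceWorkerGlobalScope) {
  // HYPOTHETICAL: service workers cannot construct a Worker today; this line
  // is what the issue asks for. The script URL is illustrative.
  const decoder = new Worker("/decoder-worker.js");

  const decodeInWorker = (bytes) =>
    new Promise((resolve) => {
      const { port1, port2 } = new MessageChannel();
      port1.onmessage = (event) => {
        port1.close();
        resolve(event.data); // the worker is assumed to post back a Uint8Array
      };
      // Transfer the reply port and the bytes (no copy of the buffer).
      decoder.postMessage({ bytes, reply: port2 }, [port2, bytes.buffer]);
    });

  self.addEventListener("fetch", (event) => {
    // Hypothetical convention: only ".custom" resources need decoding.
    if (!new URL(event.request.url).pathname.endsWith(".custom")) return;
    event.respondWith(
      fetch(event.request).then((response) => transformResponse(response, decodeInWorker)),
    );
  });
}
```

The fetch event only waits on the worker's reply, so the service worker's own thread stays free to handle other fetch events while decoding runs elsewhere.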

Original issue description: As Chrome extensions migrate to Manifest V3, their background scripts become service workers. Since extensions cannot reliably spawn web workers from content scripts (because of potential CSP restrictions in the host page), it would be very useful to be able to spawn web workers from the Chrome extension's service worker to execute some WASM.

This was already discussed in two other (closed) issues here and here. I figured I'd open a new issue, as requested by @annevk (here), since there's now a new use case created by the move to Manifest V3.

asutherland commented 1 year ago

Regarding the motivating use case: at least in Firefox, content scripts run in their own distinct global [1]. That global could be given its own Worker constructor that is not subject to the page's CSP, which avoids the spec complexity of having the content script use the underlying content page's global.

[1] That global does have access to the underlying content page's global and may do some prototype-chain tricks for convenience / compat.

paralin commented 1 year ago

Related: spawning Worker, SharedWorker, ServiceWorker from within a SharedWorker.

domenic commented 1 year ago

Extensions are not the web platform, and in particular extension service workers are not web platform service workers. If extension service workers want nested workers, each browser can add them; that doesn't impact the web platform workers specified in web standards.

wanderview commented 1 year ago

There are use cases for dedicated and shared workers in web platform service workers.

Custom (de)compression algorithms that are CPU-intensive can use dedicated workers to avoid introducing delays in fetch event processing.

Also, any site that performs sync with the server and must support multiple tabs has a use for shared workers for managing state. Often that state will need to be accessed from the service worker as well.
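Both use cases come down to the same plumbing: the service worker posts a job to another worker and awaits the reply. A minimal sketch of that request/response pattern over a `MessagePort`, assuming the port is one end of a channel whose other end lives in the (hypothetical) nested worker; `callOverPort` and `serveOnPort` are illustrative names, not part of any spec.

```javascript
// Send `payload` over `port` and resolve with the other side's reply.
function callOverPort(port, payload) {
  return new Promise((resolve, reject) => {
    // A fresh channel per call, so concurrent calls never mix up replies.
    const { port1: replyPort, port2: replyRemote } = new MessageChannel();
    replyPort.addEventListener("message", (event) => {
      replyPort.close(); // one reply per call; this also closes the remote end
      const { ok, value, error } = event.data;
      if (ok) resolve(value);
      else reject(new Error(error));
    });
    replyPort.start(); // needed in browsers when using addEventListener
    // Transfer the reply end along with the request.
    port.postMessage({ payload, reply: replyRemote }, [replyRemote]);
  });
}

// Answer every request arriving on `port` with `await handler(payload)`.
function serveOnPort(port, handler) {
  port.addEventListener("message", async (event) => {
    const { payload, reply } = event.data;
    try {
      reply.postMessage({ ok: true, value: await handler(payload) });
    } catch (err) {
      reply.postMessage({ ok: false, error: err instanceof Error ? err.message : String(err) });
    }
  });
  port.start();
}
```

In a service worker, `callOverPort` would be awaited inside `event.respondWith(...)`; the per-call reply channel keeps concurrent fetch events from receiving each other's results.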

ian97531 commented 1 year ago

> Extensions are not the web platform, and in particular extension service workers are not web platform service workers. If extension service workers want nested workers, each browser can add them; that doesn't impact the web platform workers specified in web standards.

Hi @domenic. Does that mean that if nested workers were implemented in the web platform and adopted by browser vendors, extension service workers still wouldn't be able to use nested workers? Thanks for the reply!

domenic commented 1 year ago

Maybe! They're separate spaces. Any of the four outcomes in the matrix { extension service workers support nested workers, extension service workers don't support nested workers } x { web platform service workers support nested workers, web platform service workers don't support nested workers } are possible.

In practice, code sharing means there's some influence. E.g., if the web platform community finds @wanderview's use cases compelling and implements them for the web platform, then maybe some engineers at some browser would do the extra work to expose the feature to extensions. But it would be extra work, not automatic, and would need its own per-browser approvals and discussions.

My main point is: if you believe extension service workers should support nested workers, then advocating for web platform service workers getting nested workers is not a good tactic. We web platform engineers are not motivated by adding things to extensions, because extensions are not part of the web platform. Instead, you should work with the engineers designing and shipping extension APIs, which is a very different group from the engineers designing and shipping web platform APIs. (And I don't know where those extension engineers hang out.)

annevk commented 1 year ago

https://github.com/w3c/webextensions seems to be somewhat active at least.

ian97531 commented 1 year ago

@annevk @domenic Thanks for the explanation. I'll file something over in the w3c/webextensions issues. In the meantime, I'll leave this issue open and update the description to include @wanderview's use cases since they seem valid for the web platform.

asutherland commented 1 year ago

@wanderview's use cases sound reasonable for the web platform. I think the main question is whether the unique lifetime of ServiceWorkers means that we should consider options like:

- Specializing the SharedWorker lifetime semantics so that the SharedWorker treats pages controlled by the ServiceWorker as members of its owner set. The SharedWorker could then remain alive even if the ServiceWorker shuts down (due to idleness, or because the browser wants to migrate it between processes or reload it for some reason), and the ServiceWorker could upgrade without shutting down the SharedWorker.
  - Since it's already possible for every page to establish a connection to the relevant SharedWorker to accomplish these lifetime goals, the argument here would primarily be to support sites establishing a clearer abstraction boundary between the page and the ServiceWorker. This could enable pages to speak a pure REST API to their ServiceWorker and potentially simplify the ServiceWorker upgrade situation by giving the ServiceWorker more control over the lifetime of the SharedWorker.
  - This also would potentially simplify interception-related edge cases for script loading.
- Only supporting SharedWorkers (not dedicated workers), to avoid sites going out of their way to keep the ServiceWorker alive just to keep its dedicated workers alive.

As somewhat referenced above, we would also need to determine the rules for interception for dedicated workers and SharedWorkers. Would they use the exact same script loading mechanism as the ServiceWorker itself and need to be offlined in the "install" phase? Would they always be intercepted by the ServiceWorker that creates them so that they work offline but can be upgraded without upgrading the root ServiceWorker script? Would they just be intercepted based on scope but with a hard requirement that they be intercepted by some ServiceWorker so that the fundamental goal of supporting offline operation is still there? It might be appropriate for some of this discussion to happen in https://github.com/w3c/ServiceWorker/issues/1529 instead.

frank-dspeed commented 1 year ago

I don't think all of that is even needed. Here is what I'm doing at present, in short pseudocode:

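// Code inside the HTML page (module script; "/sw.js" is an illustrative URL)
await navigator.serviceWorker.register("/sw.js");
const registration = await navigator.serviceWorker.ready; // has an active worker
navigator.serviceWorker.addEventListener("message", (event) => console.log("from SW:", event.data));
registration.active.postMessage({ url: "/data.json", response: '{"hello":"world"}' });

// Inside the service worker ("app-cache" is an illustrative cache name)
self.addEventListener("message", (event) => {
  const { url, response } = event.data;
  event.waitUntil(
    caches.open("app-cache")
      .then((cache) => cache.put(url, new Response(response)))
      .then(() => event.source.postMessage({ cached: url })),
  );
});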

I also implement some handler functions to interact with the cache. In fact, I never deploy new service workers once they are activated; they are self-managing, without using the version-rolling features. Since it's easier for me to use other database-like sync methods to manage the cache content, I only need the postMessage functionality between the window and the service worker.

When cache updates need to execute inside shared workers or Wasm, that works out of the box: since they use shared memory, passing messages between both to update the cache is fast.

Extra bonus: cross-context shared service workers.

If you throw WebRTC data channels into the mix, you get service workers and web workers that can be shared across contexts.

It's a core concept of my web GUI interop methods.

a-sully commented 1 year ago

To chime in with another use case here...

The SyncAccessHandle API is a fast, POSIX-like file primitive that (among other things) allows for database engines to be ported to the web via Wasm and perform at near-native speeds.

However, since the API is synchronous, it's only available from Dedicated Workers. Not being able to use the SyncAccessHandle API from a Service Worker (if it can't create a Dedicated Worker) is a potential deal-breaker for apps that would otherwise migrate to this API from IndexedDB.
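To make the constraint concrete, the synchronous file calls a Wasm database engine relies on look like the following, and `createSyncAccessHandle()` is only exposed in dedicated workers. A minimal sketch: the helper names and the file name are illustrative. The helpers take the access handle as a parameter, so they work with any object that has the same `getSize`/`read`/`write`/`flush` methods.

```javascript
// Dedicated worker only: createSyncAccessHandle() is not available in
// windows, shared workers, or service workers.
async function openSyncFile(name) {
  const root = await navigator.storage.getDirectory(); // OPFS root
  const fileHandle = await root.getFileHandle(name, { create: true });
  return fileHandle.createSyncAccessHandle(); // locks the file exclusively by default
}

// Append `bytes` at the end of the file and persist them. No awaits:
// every call below is synchronous on a FileSystemSyncAccessHandle.
function appendRecord(accessHandle, bytes) {
  const offset = accessHandle.getSize();
  const written = accessHandle.write(bytes, { at: offset });
  accessHandle.flush();
  return offset + written; // new file size
}

// Read the whole file into a new Uint8Array.
function readAll(accessHandle) {
  const buffer = new Uint8Array(accessHandle.getSize());
  accessHandle.read(buffer, { at: 0 });
  return buffer;
}

// Usage inside the dedicated worker ("app.log" is an illustrative name):
//   const handle = await openSyncFile("app.log");
//   appendRecord(handle, new TextEncoder().encode("hello\n"));
//   handle.close();
```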

Bodhizafa commented 1 year ago

I too have been tripped up by the fact that there's no real way to get sync OPFS access from serviceworkers.

frank-dspeed commented 1 year ago

@a-sully @Bodhizafa I use OPFS extensively in my awesome os project. The right way to do it is to create a SharedWorker, not a service worker and not an independent worker, since there can be only one shared worker per origin and script URL. Then you use BroadcastChannel inside that worker and in all other places to talk to it. That gives you a single sync API. Hope that helps.
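The BroadcastChannel part of this suggestion amounts to a small request/response protocol: one owner context (here, the shared worker) answers requests, and every other context tags its requests with an id so it can pick out its own reply. A sketch under assumptions: the message shape and helper names are illustrative. Because a BroadcastChannel delivers each message to every other listener on the channel, both sides filter by `kind` and `id`.

```javascript
// Answer requests on a named channel. Run this in the one "owner" context,
// e.g. the shared worker. Returns the channel so the caller can close it.
function serveOnBroadcast(name, handler) {
  const channel = new BroadcastChannel(name);
  channel.onmessage = async (event) => {
    const { kind, id, payload } = event.data;
    if (kind !== "request") return; // ignore replies meant for clients
    try {
      channel.postMessage({ kind: "reply", id, ok: true, value: await handler(payload) });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      channel.postMessage({ kind: "reply", id, ok: false, error });
    }
  };
  return channel;
}

// Send one request from any other context and wait for its reply.
// Note: if no owner is listening, the promise never settles.
function requestOnBroadcast(name, payload) {
  const channel = new BroadcastChannel(name);
  // Illustrative id scheme; anything unique per request works.
  const id = `${Date.now()}-${Math.random().toString(36).slice(2)}`;
  return new Promise((resolve, reject) => {
    channel.onmessage = (event) => {
      const msg = event.data;
      if (msg.kind !== "reply" || msg.id !== id) return; // someone else's traffic
      channel.close();
      if (msg.ok) resolve(msg.value);
      else reject(new Error(msg.error));
    };
    channel.postMessage({ kind: "request", id, payload });
  });
}
```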

Bodhizafa commented 1 year ago

@frank-dspeed A SharedWorker is not able to be used to service requests in an offline-first progressive web app. A ServiceWorker is required for that, it is a fundamental context. I cannot create a SharedWorker without a page context, and when a ServiceWorker is running in the background, a page context does not exist.

js-choi commented 11 months ago

Adding onto @a-sully’s and @Bodhizafa’s messages, a major use of allowing service workers to create workers could be allowing SWs to create workers that use OPFS.

This is a particularly important use case for offline-first web applications, particularly offline-first applications that support multiple tabs/windows (e.g., a notes app, photo-library web app, or word-processor app that allows multiple tabs/windows that read/write to the same local files).

For example, there are implementations of SQLite, compiled into WebAssembly, that use OPFS (e.g., SQLite’s new official JavaScript API, whose use of OPFS is based on @rhashimoto’s WASM SQLite). However, SQLite is limited by service workers’ inability to use FileSystemSyncAccessHandle and access OPFS, and by their inability to spawn a worker that can use FileSystemSyncAccessHandle.

In order to allow multiple same-origin tabs/windows to access the same SQLite database on OPFS, @rhashimoto has been exploring promoting the origin’s first tab to be a SQLite “provider” (rhashimoto/wa-sqlite#81 and rhashimoto/wa-sqlite#95). All subsequent same-origin tabs send messages to that first “provider” tab, via MessagePorts that are mediated by a service worker (or shared worker). Web Locks are used to watch the “provider” tab’s lifetime and, when the user closes that tab, to promote another tab to be the SQLite “provider”. A similar leader-election solution could theoretically be done for web apps that store document files on OPFS, without SQLite.
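The Web Locks part of this workaround relies on one property: a lock requested with `navigator.locks.request()` is held until the callback's promise settles or the holding tab goes away, and then the next queued request is granted. A minimal sketch of the election, assuming an illustrative lock name and an `onElected` callback that starts serving the database; the lock manager is a parameter (defaulting to `navigator.locks`) only so the logic can be exercised outside a browser.

```javascript
// Every tab calls this. The first caller is granted the lock and becomes the
// provider; later callers queue. If the provider tab closes, the browser
// releases its lock and grants it to the next tab in line.
function electProvider(onElected, { lockName = "sqlite-provider", locks = navigator.locks } = {}) {
  let resign;
  const stillLeader = new Promise((resolve) => { resign = resolve; });
  const finished = locks.request(lockName, async () => {
    await onElected(); // e.g. open the database and start answering other tabs
    await stillLeader; // keep holding the lock until resign() is called
  });
  return { resign: () => resign(), finished };
}
```

Holding the lock with a never-settling promise is what turns "this tab is alive" into "this tab owns the lock", which is why the workaround needs no heartbeat messages.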

This impressive workaround would be unnecessary if the origin’s service worker could simply spawn a worker, which in turn could access OPFS. Instead of promoting and managing a special “provider” tab with Web Locks and MessagePorts, all tabs of the web app could simply make requests to the service worker, which in turn would use a worker to access OPFS. The tabs could use the same HTTP requests that they would with the online server, with the service worker mediating whether to use OPFS or the online server.

Basically, allowing service workers to spawn workers is probably pretty important for offline-first web applications (particularly those that support multiple tabs and which save to the same offline data store, e.g., SQLite, on OPFS). Workers-from-service-workers aren’t just for browser extensions.

Edit: This article proposing a “ServerFree architecture” is a good illustration of how it can be useful to combine a service worker with another worker running SQLite on OPFS.

rhashimoto commented 11 months ago

> Instead of promoting and managing a special “provider” tab with Web Locks and MessagePorts, all tabs of the web app could simply make requests to the service worker, which in turn would use a worker to access OPFS.

There is another proposal, allowing multiple readers/writers with FileSystemSyncAccessHandle, that will address problems with multiple tabs accessing the same SQLite database in OPFS. In most cases I expect that would be a better solution than this proposal, as it should allow using the SQLite sharing mechanisms (possibly including write-ahead logging) instead of arbitrating access at the application level.

For this specific use case (not including when a service worker needs an OPFS database), I think the multiple readers/writers proposal is the one you really want.

js-choi commented 11 months ago

@rhashimoto: Thanks for chiming in, and thank you for your work on SQLite on OPFS with FileSystemSyncAccessHandle.

I agree that allowing multiple readers/writers would also be useful for multiple tabs using the same file store.

The advantages of mediating data reading/writing through a service worker instead are:

- The tabs could use the same HTTP API to read/write data in cloud storage and to read/write data in OPFS. When the service worker acts as a “transparent” mediator between cloud storage and local caching, tab scripts do not have to be concerned with whether documents are stored offline only, online only, or both.
- Webpages that are loaded from URLs typed into the address bar can be dynamically generated offline by the service worker.
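The first point can be sketched as the core of a service worker's fetch handler: the same `/api/...` URL is answered from the local store when possible and from the cloud otherwise, and the tab cannot tell which. Everything here is an illustrative assumption: the `/api/` prefix, a `localStore` with async `get`/`set` (in the proposal, backed by a worker running SQLite on OPFS), and an injectable `fetchFromCloud` that defaults to `fetch`.

```javascript
// Answer API reads locally when possible; otherwise fetch from the cloud and
// keep a local copy. Non-API and non-GET requests go straight to the cloud.
async function mediate(request, { localStore, fetchFromCloud = fetch }) {
  const { pathname } = new URL(request.url);
  if (request.method !== "GET" || !pathname.startsWith("/api/")) {
    return fetchFromCloud(request);
  }

  const local = await localStore.get(pathname);
  if (local !== undefined) {
    return new Response(JSON.stringify(local), {
      headers: { "content-type": "application/json" },
    });
  }

  const response = await fetchFromCloud(request);
  if (response.ok) {
    // Clone so the tab still receives an unread body.
    await localStore.set(pathname, await response.clone().json());
  }
  return response;
}

// In the service worker (browser only; `store` is hypothetical):
//   self.addEventListener("fetch", (event) => {
//     event.respondWith(mediate(event.request, { localStore: store }));
//   });
```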

When multiple tabs directly read/write to both OPFS and the online server, the tabs must use two different APIs for local and cloud data, and each tab must individually reconcile local data and cloud data. They also cannot dynamically generate web pages when URLs are directly entered into the browser address bar.

This is analogous to how service workers can use Cache or IndexedDB, without tabs being concerned about whether responses are served from the Cache/IndexedDB or from the online server. SQLite (and FileSystemSyncAccessHandle with OPFS in general) have the potential to serve as a more “advanced” form of Cache, for dynamically generating web pages in service workers. For example, a cloud document editor, wiki, CMS-based blog, or Pokédex web app that wished to make itself available offline could dump all of its documents’, wiki’s, or CMS’s data into a SQLite database, then use its service worker to dynamically generate the same web pages offline. (The service worker also may conveniently mediate sync of modified local data with cloud storage: a single point of sync instead of multiple points of sync from multiple tabs.) People are already using service workers with IndexedDB for these reasons, and OPFS with FileSystemSyncAccessHandle (with or without SQLite) would also be useful for the same reasons. Cheers!

js-choi commented 8 months ago

Another use case for service workers spawning workers, in addition to SQLite WASM and/or OPFS FileSystemSyncAccessHandle, is fetch-transparent data encoding/decoding:

A web application may wish to fetch data in new formats that are not natively supported by the client. By using a service worker, when the application’s tabs fetch data, the service worker can intercept the tabs’ fetches, retrieve the data in the new format, and then decode it, all transparently to the tabs.

libjxl has a WASM demo that does this for JPEG XL. By including its service-worker script, a web application can intercept its tabs’ fetches for images, request image/jxl versions of the images from the servers instead, decode any received image/jxl data, and pass the decoded results to the fetches’ responses.

Ideally, the service worker would be able to perform the decoding process in a worker, to prevent blocking the service worker from other tasks (a tab→SW→worker→SW→tab dataflow path). However, because the service worker cannot currently spawn workers, it must use a convoluted tab→SW→tab→worker→tab→SW→tab dataflow path. Quoting the libjxl WASM demo’s readme:

1. Fetch API receives a resource request from client page (e.g. when the HTML engine discovers an img tag) and asks the ServiceWorker how to proceed
2. the ServiceWorker alters the request and uses the Fetch API to obtain data
3. when data arrives, the ServiceWorker forwards it to the "client" (the page) that initiated the resource request
4. the client forwards the data to a worker (see client_worker.js) to avoid processing in the "main loop" thread
5. a worker does the actual decoding; to make it faster several additional workers are spawned (to enable multi-threading in WASM module); the decoded image is wrapped in non-compressed PNG format and sent back to client
6. the client relays image data to ServiceWorker
7. the ServiceWorker passes data to Fetch API as a response to initial resource request

This convoluted dataflow is reminiscent of rhashimoto/wa-sqlite#81 and rhashimoto/wa-sqlite#95 for SQLite WASM. Both libjxl’s use case and SQLite’s use case would be much simplified if service workers could simply spawn workers, instead of relying on tabs’ workers.

DenizUgur commented 7 months ago

If I may add to @js-choi's great explanation, passing a reference of a Worker to ServiceWorker would simplify this process as well. Not sure if that's possible though.

janfjohannes commented 1 month ago

> If I may add to @js-choi's great explanation, passing a reference of a Worker to ServiceWorker would simplify this process as well. Not sure if that's possible though.

Service workers can run without a parent tab, e.g., during background sync or when handling notification events, so we need more than just the ability to receive a reference to another worker.
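What a page can do today comes close to that suggestion: it cannot pass the `Worker` object itself, but it can give the worker and the controlling service worker the two ends of one `MessageChannel`. A sketch with illustrative message `type` strings. As noted above, this only helps while some page is around to set it up, and the service worker's stored port is lost whenever it is stopped, so it does not cover background sync or notification events.

```javascript
// Page-side helper: connect a dedicated worker and a service worker directly.
// Each argument only needs postMessage(message, transfer), so this accepts a
// Worker and navigator.serviceWorker.controller (a ServiceWorker).
function bridgeWorkerToServiceWorker(worker, serviceWorker) {
  const { port1, port2 } = new MessageChannel();
  // Transferring a port moves it; each side ends up owning one end.
  worker.postMessage({ type: "connect-service-worker", port: port1 }, [port1]);
  serviceWorker.postMessage({ type: "connect-worker", port: port2 }, [port2]);
}

// Usage in a page (browser only; the script URL is illustrative):
//   bridgeWorkerToServiceWorker(
//     new Worker("/decoder-worker.js"),
//     navigator.serviceWorker.controller,
//   );
// In the service worker, listen for { type: "connect-worker" } and keep
// event.data.port; it is gone once the service worker is stopped.
```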

whatwg / html

Allow workers & shared workers to be created within a service worker #8362
