New "setup" lifecycle for service worker

hanguokai commented 3 years ago

Summary: Current, service worker only has install and active lifecycle. I suggest adding a setup lifecycle event, like setup() method in unit testing. When SW wake up by events, it first waits for the setup method complete, then start to process events. setup method only execute once when SW wake up.

Maybe you know service worker now used to process Chrome extension's background event, e.g. browser tabs updates.

var initWaitList = [];
var start = false;
var cache;
function init() {
  return new Promise(async function(resolve, reject) {
    if(cache) {
      resolve();
    } else {
      initWaitList.push(resolve);
      if(!start) {
        start = true;
        cache = await readDataFromStorage();
        for(let r of initWaitList) {
          r();
        }
        initWaitList = [];
      }
    }
  });
}

chrome.tabs.onUpdated.addListener(async function(parameters) {
  await init();
  // then use cache data process events
});

What I understand is that service worker usually keep alive for a while(e.g. 10 seconds) when no task, so use global variable as cache can improve performance, because maybe there will be more events triggered. Cache is the same life cycle with service worker. In above example code, it is a data that async read from storage(IDB or chrome.storage). In my environment(chrome extension), service worker will wake up by multiple events in very short time. The above code(init method) is used to prevent from reading data from storage multiple times, reading multiple times is not necessary.

If service worker support a setup stage before process events, developers don't need to consider concurrent situation before it start to process events.

wanderview commented 3 years ago

I'm not sure I understand. Can't your snippet above be used today without the browser supporting an additional "setup" phase on worker launch? The top level evaluation of the script is in many ways this "setup" phase. Its just synchronous and not async today.

hanguokai commented 3 years ago

Above snippet is current implementation without "setup" phase.

If service worker support "setup" phase, the snippet will look like below, very simple:

var cache;

self.addEventListener('setup', async event => {
  cache = await readDataFromStorage();
});

chrome.tabs.onUpdated.addListener(function(parameters) {
  // use cache data process events
});

wanderview commented 3 years ago

That is more concise, but doesn't seem that much better than:

var cache;
var init_promise = async function() {
  cache = await readDataFromStorage();
}();

self.addEventListener('fetch', evt =>{
  evt.respondWith(async function() {
    await init_promise;
    // use cache data
  }());
});

wanderview commented 3 years ago

To be clear, there is an expensive cost to adding more phases to service worker startup in terms of both c++ development time, spec writing, and also runtime performance. For example, this phase could delay processing events, etc. I think we would need a fairly large benefit or problem solved to take on something like this. I hope that makes sense.

asutherland commented 3 years ago

It's also worth calling out that WebExtensions can layer whatever additional semantics and state machines it wants onto its own API surfaces. There's nothing stopping WebExtensions from, specified in its own standard, dispatching a functional event "webextSetup" which must complete before WebExtensions starts dispatching events at the tabs API.

I think the major issue is the "fetch" event, and at least in terms of Firefox's planned implementation of WebExt Service Workers, that won't be going through the normal fetch specification and "handle fetch" series of hooks. Instead WebExtensions will directly dispatch a fetch event at the ServiceWorker. The WebExtensions network layer ("channel" in Firefox/Necko) has complete control over when it dispatches this event and can internally wait for "webextSetup" or whatever it wants to happen before handing them off to the ServiceWorkers machinery.

Obviously, all of this could change with any standardization effort for WebExtensions, but @wanderview's complexity concerns would still hold in that case. (And in general I do expect there to be some semantic impedance mismatches resulting from WebExtensions' creation and evolution to use ServiceWorkers outside of any public standards efforts.)

hanguokai commented 3 years ago

My initial problem is that if multiple events come at the same time, await readDataFromStorage() will be called multiple times if I don't use my initial snippet in this post.

Your solution, in that condition, the init_promise will be awaited multiple times. await a resolved promise is Ok, although not common.

Use your method, my code will look like:

var cache;
var init_promise = async function() {
  cache = await readDataFromStorage();
}();

chrome.tabs.onUpdated.addListener(function(parameters) {
  await init_promise();
  // use cache data process events
});

Your solution is more simple than my initial solution. But if service worker support 'setup' phase, no need to use an init_promise, and everything's ready, that is more easy to understand.

I think 'setup phase' like 'top level await' in service worker, they do the same thing. For example, the code can like below

var cache = await readDataFromStorage();
// then listen events

So 'setup phase' vs 'top level await', which is better?

hanguokai commented 3 years ago

@asutherland For Chrome extensions, there are many kinds of events, not just one fetch event. And you usually need to read data from storage(e.g. user settings) to process the event.

asutherland commented 3 years ago

@hanguokai Understood and agreed.

My general point here is that there's no need for the ServiceWorkers spec to add explicit waits to its event dispatching semantics along the lines of Step 19 in Handle Fetch that says If activeWorker’s state is "activating", wait for activeWorker’s state to become "activated". That is something that can entirely be dealt with by the WebExtensions consumption of ServiceWorkers in WebExtensions space if that's something WebExtensions want.

I understand the case you are making for use case commonality for (web) Service Workers, but I share @wanderview's complexity concerns and a general performance concern, as the general feedback we have received is that ServiceWorkers need to have the lowest latency possible, and introducing an intentional pipeline stall that precludes parallelism seems counter to that (and the UX that users want, as opposed to the DX that developers want). WebExtensions operate in a somewhat different problem space where there's inherent user intent via the act of installation that justifies speculatively spinning up ServiceWorkers in response to potential actions like moving towards/hovering on a button and/or starting them at browser startup and keeping them alive for extended periods of time that would not make sense for websites.

I mentioned "fetch" because it was my impression that WebExtensions re-uses "fetch" rather than having introduced a distinct event type with its own semantics. I may be wrong about this.

hanguokai commented 3 years ago

No matter if service worker provides 'setup phase' or 'top level await', thank @wanderview for your simpler solution.

Here I post a more complete code snippet, which include updating the cache. Maybe other extension developers will refer to this solution.

// I use uppercase to identify global variables
var Cache;
function init() {
  return new Promise(function(resolve, reject) {
    chrome.storage.sync.get(null, function(data) {
      // setup cache, for example:
      Cache = data;
      resolve();
    });
  });
}
var InitPromise = init();

// listen browser events, for example:
chrome.tabs.onUpdated.addListener(async function(parameters) {
  await InitPromise;
  // now you can use cache to process the event
});

// if data changed, e.g. user changed settings, you should update cache.
// listen messages or listen storage change event. for example:
chrome.runtime.onMessage.addListener(function(request, sender, sendResponse) {
  if(request.type == "update") {
    InitPromise = init();
  }
});

jakearchibald commented 3 years ago

@hanguokai

await a resolved promise is Ok, although not common.

I think it's pretty common pattern, and it's one of the reasons promises don't expose their state. It's encouraged to "just await the promise", rather than if (resolved) doThing else awaitThingThenDoThing. Eg, the example here https://www.npmjs.com/package/idb#examples, where dbPromise will be already-settled for most of the calls.

jakearchibald commented 3 years ago

@hanguokai I think this pattern is more promise-like:

const dataPromise = new Promise((resolve, reject) => {
  chrome.storage.sync.get(null, (data) => resolve(data));
});

// listen browser events, for example:
chrome.tabs.onUpdated.addListener(async (parameters) => {
  const data = await dataPromise;
});

This also means a given function is working with the same data throughout its execution. Whereas in your model, Cache could change half-way through a function, if init() is called by some other function.

hanguokai commented 3 years ago

const dataPromise = new Promise(...) works only if you don't need to update data. But in my case, if user change data, I need to update cache. So InitPromise would be re-assigned.

jakearchibald commented 3 years ago

Yes, if it needs changing:

let dataPromise;

function updateDataPromise() {
  dataPromise = new Promise((resolve, reject) => {
    chrome.storage.sync.get(null, (data) => resolve(data));
  });
}

updateDataPromise();

// listen browser events, for example:
chrome.tabs.onUpdated.addListener(async (parameters) => {
  const data = await dataPromise;
});

hanguokai commented 3 years ago

These codes are equivalent, just the style is different. If you wait for data, then resolve(data). If you wait for the preparation to be completed, then resolve().

hanguokai commented 3 years ago

await a resolved promise is pretty common pattern

Perhaps you're right. This is simpler than maintaining a waiting queue (like my initial solution).

hanguokai commented 3 years ago

These codes are equivalent, just the style is different. If you wait for data, then resolve(data). If you wait for the preparation to be completed, then resolve().

Let me explain more. In my real code, I parse the whole data, and set two global variables. So I think of the init method as a setup phrase, not a single data. If it is a single data, your code is more reasonable.

jakearchibald commented 3 years ago

In your example the init method has side effects that might land half-way through a function, which could cause unexpected bugs. If that's intended, I'd refactor a bit to make it obvious that it's the expected behaviour.

hanguokai commented 3 years ago

Right, data may be inconsistent when updating. It depends on how the data is used. If use cache data synchronously at once after await the promise, no more async process before using it, it won't have a problem. Use old data before update, use updated data after update, this is acceptable as cache. If that code as a general example for others to copy, that may not be appropriate.

jakearchibald commented 3 years ago

In that case I'd have addToCache and getFromCache functions that do the promise juggling so it doesn't need to be done in each event.

tophf commented 3 years ago

When ES modules will be enabled for service workers by default, which can happen pretty soon, it'll be possible to use top-level await to read the storage at the beginning of the script in a "blocking" fashion so it completes before the subsequent code in the script runs.

jakearchibald commented 3 years ago

We don't allow top-level await in a service worker scripts because that's usually an anti-pattern.

tophf commented 3 years ago

We don't allow top-level await in a service worker scripts because that's usually an anti-pattern.

Is it spec-compliant? What will happen when a service worker module statically imports a module that uses a top-level await inside?

jakearchibald commented 3 years ago

It will throw. Top level await in dynamically imported scripts is fine.

jakearchibald commented 3 years ago

See https://github.com/w3c/ServiceWorker/issues/1407

tophf commented 3 years ago

Thanks. Looks like it's still in the process of standardizing. I hope it'll be allowed in an extension service worker because those concerns don't seem to apply to a ManifestV3 browser extension - it doesn't have any blocking events like fetch, moreover an extension is a fully local and privileged thing, explicitly installed by the user to extend the browser not like a hidden service worker installed in drive by mode by a random site.

jakearchibald commented 3 years ago

Is there a spec for extension service worker?

tophf commented 3 years ago

I've never seen it published but maybe it exists privately, try asking someone from chromium. All I know is that it's just a standard service worker (without lifecycle events) and custom code for extension-specific stuff.

tomayac commented 3 years ago

Is there a spec for extension service worker?

Extension service workers are supposed to be just "regular" service workers according to the docs.

jakearchibald commented 3 years ago

But they don't have events that are performance sensitive?

tophf commented 3 years ago

ManifestV3 has removed all synchronous (blocking) events. All events are asynchronous now so there's no difference (performance-wise) where the time will be spent to "set up" the state. Although I don't know if it'll be technically possible for extension API listeners to support the top-level await as currently they must be re-registered in the initial event loop task... Well, we'll see soon.

jakearchibald commented 3 years ago

@tophf huh, so there's no way to intercept and modify things like requests in ManifestV3?

tophf commented 3 years ago

Only via declarativeNetRequest API. Processing is performed inside the browser so the capabilities are quite limited.

jakearchibald commented 3 years ago

Seems like the extension service worker is different enough that it could have its own rules, if it wanted. Or, if it wanted to stick with service worker rules, this format works in both:

const setup = Promise.all([
  asyncThing1(),
  asyncThing2(),
]);

addEventListener('fetch', (event) => {
  event.respondWith((async () => {
    const [thing1, thing2] = await setup;
    // …
  })());
});

The above pattern has advantages: Particular events wouldn't need to wait for setup if they don't need it, or setup requirements could be more granular, where some events don't need to await everything in the setup.

hanguokai commented 3 years ago

Seems like the extension service worker is different enough

Although service worker was originally inspired by the extension event page(background page), the problems they face and the context they run differ greatly.

The conclusion is that use a 'setup promise' implement the 'setup phase' in service worker currently. It seems this trick would be used to resolve 'top level await' problem too.

dotproto commented 3 years ago

Seems like the extension service worker is different enough that it could have its own rules, if it wanted.

In general the Chrome extensions platform aims to be 'the web plus additional capabilities.' I don't think we'd want to do something like change top level await semantics in extension service workers as that could introduce a surprising set incompatibilities between these environments.

While I haven't personally tested it, extension service workers should be able to intercept fetch events for their own origin using standard service worker fetch interception just as you would on the open web.

hanguokai commented 3 years ago

For normal website service worker, it only focuses on three events: self.addEventListener( install | activate | fetch, handler). But they have no meaning for extension service worker. Extensions focus on (lots of) extension events.

w3c / ServiceWorker

New "setup" lifecycle for service worker #1576