wheresrhys opened 8 years ago
I think the answer in both cases is that the HTTP cache will help avoid actually issuing double requests over the network and that this simplifies things for everyone.
Cache API-wise, it's my understanding from discussion about the Cache transaction API https://github.com/slightlyoff/ServiceWorker/issues/823 that the Cache API will be changed so that an in-flight put/add will only become visible for match() purposes once its body has been retrieved and it is atomically inserted into the Cache. And that the existing language around fetching records and incumbent records will be simplified to reflect this. In other words, an explicit decision to not expose a pending requests API.
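Roughly, under those semantics a concurrent match() would not see an add() that is still in flight (the URL here is just an example):

caches.open('v1').then(cache => {
  // Start an add(); under the semantics above, the entry only becomes
  // visible to match() once the body has been fully retrieved and the
  // record is atomically inserted.
  const adding = cache.add('/styles.css');

  cache.match('/styles.css').then(res => {
    console.log(res); // likely undefined: the add() hasn't committed yet
  });

  adding
    .then(() => cache.match('/styles.css'))
    .then(res => {
      console.log(res); // a Response: the insert has completed
    });
});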
I guess the author meant that some request, say to styles.css, might be in flight via link=preload, and right after that a normal request could be made to the same URL. The question is: how do we re-use the existing network request?
I guess the mention of the Cache API just confused things.
What's the benefit of doing this manually rather than <link rel="prefetch"> or <link rel="prerender">?
I guess something simple like this could be used:
self.addEventListener('fetch', event => {
  const request = self.requests.match(event.request).then(inflight => {
    return inflight || fetch(event.request);
  });

  event.respondWith(request);
});
This could also potentially solve the issue in https://github.com/slightlyoff/ServiceWorker/issues/920
@NekR hm, I'm not so sure. #920 wants a preflight request along with the fetch event. The problem with getting this from some kind of async store is that if you don't fetch it before the request is complete, you've missed your chance.
@jakearchibald I see what you mean. For the case in #920 though, the SW is spawned exactly for the navigation request, so, knowing this, the browser could keep that pre-flight request in the requests store until the SW is destroyed.
For the case of link=preload, yeah, requests could potentially be missed this way, but since the SW already controls the scope, it could be handled like this (cache matching added: assuming the SW cached the request, if it's not in flight, try to get it from the cache):
self.addEventListener('fetch', event => {
  const request = self.requests.match(event.request).then(inflight => {
    if (inflight) return inflight;
    // Fall back to the cache, then to the network on a cache miss
    return self.caches.match(event.request).then(cached => {
      return cached || fetch(event.request);
    });
  });
  event.respondWith(request);
});
Also, there could be another solution for link=preload right now:
let stylesRequest;

self.addEventListener('fetch', event => {
  const url = new URL(event.request.url);

  if (url.pathname === '/styles.css') {
    if (!stylesRequest) {
      stylesRequest = fetch(event.request).then(res => {
        stylesRequest = null;
        return res;
      });
    }

    event.respondWith(stylesRequest);
  }
});
But this relies on global state, which is obviously bad and shouldn't be used, especially if the SW team decides to go with multiple SW instances.
Just trying to generate some ideas.
P.S. I think the stylesRequest case should fail, though, because a Response body can only be used once (it would need res.clone()), but I'm not sure.
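A variation that avoids the single-use body problem by always responding with a clone could look roughly like this (just a sketch, not part of the proposal above):

let stylesRequest;

self.addEventListener('fetch', event => {
  const url = new URL(event.request.url);

  if (url.pathname === '/styles.css') {
    if (!stylesRequest) {
      stylesRequest = fetch(event.request).then(res => {
        stylesRequest = null;
        return res;
      });
    }

    // Respond with a clone so the shared Response body itself is never
    // consumed and can back several concurrent fetch events.
    event.respondWith(stylesRequest.then(res => res.clone()));
  }
});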
I think the answer in both cases is that the HTTP cache will help avoid actually issuing double requests over the network and that this simplifies things for everyone.
Would this necessarily be true if the first request hasn't completed yet? At what stage does an object get put in the http cache?
What's the benefit of doing this manually rather than <link rel="prefetch"> or <link rel="prerender">?
I didn't realise they were more widely supported than preload, and they could be a good option (although prerender is still missing in firefox, and who knows in which order safari might implement sw & pre* link headers).
If I fetch /page1.html and it has a link header </page2.html>; rel=prefetch, and page2.html has a link header </main.css>; rel=preload, will main.css get downloaded?
If not, it'd be useful to have something between the two - prefetch-with-linked-resources. Prerender is a bit extreme sometimes, e.g. one potential use case is to prefetch all stories featured on the home page - I'd want to fetch all the HTML files and their styles and scripts, but wouldn't want to expend resources on a background render for each of them.
One other reason not to simply use link headers is that some of these prefetches may be based on user recommendations/interactions on the page, so information about them won't be carried in the page's link headers, e.g. a user clicks a 'make this article available offline' button and then navigates to the page fairly quickly.
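A rough sketch of that kind of user-driven prefetch, with the page messaging the SW and the SW filling a cache (element IDs, URLs, and cache names here are just illustrative):

// In the page: the 'make this article available offline' button
document.querySelector('#save-offline').addEventListener('click', () => {
  navigator.serviceWorker.controller.postMessage({
    type: 'prefetch',
    urls: ['/articles/some-story.html', '/main.css', '/main.js']
  });
});

// In the service worker: fetch and cache the requested URLs
self.addEventListener('message', event => {
  if (event.data && event.data.type === 'prefetch') {
    event.waitUntil(
      caches.open('prefetched').then(cache => cache.addAll(event.data.urls))
    );
  }
});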
Would this necessarily be true if the first request hasn't completed yet? At what stage does an object get put in the http cache?
Yes. Speaking at least for Firefox/Gecko, before thinking about talking to the network, the HTTP channel consults the cache at the same time it creates the entry. Requests will be coalesced as long as the fetch/request is allowed to be served from cache. (For example, a fetch cache mode of "no-store" would not be allowed to be served from cache and so could not be coalesced.)
Source-delving, if you look at Gecko's cache2 CacheStorageService::AddStorageEntry, that code looks for an existing cache entry and creates a new handle to it if an entry already exists. Otherwise, it creates the entry. (And this is called by CacheStorage::AsyncOpenURI which is used by nsHttpChannel::OpenCacheEntry, which you can follow back to its callers, etc.)
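In fetch() terms, whether coalescing can happen hangs off the request's cache mode; it isn't something the page can observe directly, but roughly (the URL is just an example):

// Two concurrent requests for the same URL with the default cache mode;
// the HTTP cache layer may coalesce them into a single network request.
const a = fetch('/styles.css');
const b = fetch('/styles.css');

// A request that is not allowed to be served from the HTTP cache
// cannot be coalesced this way.
const c = fetch('/styles.css', { cache: 'no-store' });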
Some other brainstorming on the self.requests API (possible bugs in code):
self.addEventListener('fetch', event => {
  const request = event.request;
  const url = new URL(request.url);

  if (url.pathname === '/') {
    const result = self.caches.match(request).then(res => {
      if (!res) {
        return loadIndex(request);
      }
      return res;
    });

    event.respondWith(result);
    return;
  }

  const result = self.requests.match(request).then(inflight => {
    return inflight || self.caches.match(request);
  }).then(res => {
    if (res) return res;

    return fetch(request).then(res => {
      if (res && res.ok) {
        putCache(request, res.clone());
      }
      return res;
    });
  });

  event.respondWith(result);
});

function loadIndex(request) {
  // Preload the index page's assets alongside it
  [
    '/main.js',
    '/main.css',
    '/logo.png'
  ].forEach(asset => {
    const req = new Request(asset);
    const fetching = fetch(req).then(res => {
      if (res && res.ok) {
        return putCache(req, res.clone()).then(() => res);
      }
      return res;
    });

    // Puts the request / request + response into a memory store
    // until `fetching` is settled
    self.requests.putUntil(req, fetching);
  });

  return fetch(request).then(res => {
    if (res && res.ok) {
      putCache(request, res.clone());
    }
    return res;
  });
}

function putCache(req, res) {
  return caches.open('cache').then(cache => {
    return cache.put(req, res);
  });
}
Here the main page's assets are requested alongside it and are picked up inside the fetch event via the self.requests API if they are still in flight, or via self.caches if they are already cached (UPD: this works both for preloads via link and for normal requests from the page).
self.requests.putUntil() puts a request into the store until the request finishes. There could also be other methods or options to keep a request in the store indefinitely (until all clients of the scope are closed) or for a given time.
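For example, those variants might look something like this (all of these method names, including putUntil, are hypothetical, not an existing API):

// Keep the entry only while `fetching` is pending (as used above)
self.requests.putUntil(req, fetching);

// Hypothetical: keep the entry until every client of the scope is closed
self.requests.put(req, fetching);

// Hypothetical: keep the entry for a fixed amount of time (ms)
self.requests.putFor(req, fetching, 30 * 1000);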
For the case of the pre-flight navigate request and PlzNavigate, the request could be kept there until the fetch event for the navigation has been responded to.
IMHO, handling a second request to the same resource should be transparent to the dev. An ideal network would respond with no lag and your problem would vanish, and this is precisely what the HTTP cache achieves.
There are a couple of race conditions arising in the SW I'm working on relating to prefetch/preload:
It'd be useful to have a pending requests API, with similar request-matching rules to the Cache API. Could it even be built into the Cache API? e.g.
If the promise rejects, or resolves with anything other than a response, then the response would not be put in the cache.
cache.get(request) would resolve/reject when the promise resolves/rejects.
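A minimal sketch of how that might read from a service worker, assuming a hypothetical put() that accepts a promise for a response and a hypothetical cache.get() with the semantics described (neither is part of the current Cache API):

self.addEventListener('fetch', event => {
  event.respondWith(
    caches.open('pages').then(cache => {
      // Hypothetical: put() accepting a promise for a Response. The entry
      // is only committed if the promise resolves with a Response;
      // otherwise nothing is stored.
      cache.put(event.request, fetch(event.request));

      // Hypothetical: get() resolves/rejects along with the pending
      // promise, or with the stored Response if it has already settled.
      return cache.get(event.request);
    })
  );
});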