kornelski / http-cache-semantics

RFC 7234 in JavaScript. Parses HTTP headers to correctly compute cacheability of responses, even in complex cases
http://httpwg.org/specs/rfc7234.html
BSD 2-Clause "Simplified" License
244 stars 27 forks source link

Better support stale-while-revalidate option #43

Open Embraser01 opened 10 months ago

Embraser01 commented 10 months ago

Currently, there was an issue with the stale-while-revalidate options. The satisfiesWithoutRevalidation method didn't allow stale response to be returned when swr was set.

I changed the logic so that the stale response is returned in that case. I also added tests on this and on some edge case with the max-stale request option (I'm not 100% sure it the right way but didn't find any specs on this).

Related issues: #41, #27

kornelski commented 10 months ago

Thanks for the PR. This functionality is quite desirable.

However, my concern is how to communicate to library users that they still need to do the revalidation. If someone simply wrote code like recommended so far:

if (cached.satisfiesWithoutRevalidation(req)) {
    return recycle(cached);
    // gone! no request made
}

then they won't know that they still need to perform revalidation in the background. I wouldn't want them to have to check headers themselves separately.

Any ideas how the interface could be changed to be more robust in this case?

I'm thinking that perhaps satisfiesWithoutRevalidation could continue to return false, and users would use some other method in the "cache miss" code path to check if they can return stale in the meantime. I'd rather require caches supporting stale-while-revalidate to have extra code, than to have a pitfall of existing users not revalidating when needed.

Embraser01 commented 10 months ago

I agree that this is not the best user interface. I tried to limit the impact of my changes so that it could be usable (for now it isn't really usable). It would be nice for the user to know if the response given by satisfiesWithoutRevalidation is staled or fresh (given that max-stale can also return stale data).

I think having a boolean as a return value prevent complexity and is simple to use and understand however, there is some cases (swr, max-stale) where this boolean doesn't give enough info on what to do. For now with max stale, it's easy to know if the res is staled content or not with the canStale() function (if it satisfies and can stale => staled content), but it isn't straight forward and doesn't work with stale while revalidate.

I'm not sure on what interface would work better, maybe instead of returning a boolean, it should return an enum or maybe something more flexible like an object like:

const { satisfied, revalidate } = cached.satisfiesWithoutRevalidation(req);

Where satisfied is true as long as the cached content can be returned ASAP revalidate is true if cache is expired and should be refreshed

Most cases would be satisfied(true), revalidate(false) or satisfied(false); revalidate(true). When using stale-while-revalidate you could have satisfied(true), revalidate(true)

That could be a new method so that would break anything

kornelski commented 10 months ago

Perhaps a small change could be to keep satisfiesWithoutRevalidation return false for s-w-r, and repurpose useStaleWhileRevalidate(req) to be equivalent of satisfiesWithoutRevalidation that supports s-w-r.

I also maintain a Rust version of the same package, and I'll have to add it there too. Thanks to Rust's support for enums the API ended up looking completely different:

match policy.before_request(req) {
     Fresh(response) => return response,
     Stale(request) => return policy.after_response(send(request)),
     // SWF(request, response) => { update_cache(request)); return response }
}

So maybe in JS this could be generalized to something like this:

let { request, response } = policy.evaluate(request, response);
if (request) { update_cache(request) }
if (response) { return response } else { wait for the cache here…? }
Embraser01 commented 10 months ago

Perhaps a small change could be to keep satisfiesWithoutRevalidation return false for s-w-r, and repurpose useStaleWhileRevalidate(req) to be equivalent of satisfiesWithoutRevalidation that supports s-w-r.

I think it would be better to keep a single API for all the cases. To not create a breaking change, it can be a new recommended API that would handle the swr case and maybe other cases like in your evaluate solution.

So maybe in JS this could be generalized to something like this:

let { request, response } = policy.evaluate(request, response);
if (request) { update_cache(request) }
if (response) { return response } else { wait for the cache here…? }

I'm a bit confused on the request & response naming, it doesn't give any semantics information. I think the evaluate function could return (naming TBD):

That system would allow users to only implement simple case (no swr)

const { revalidate, asyncRevalidate, cacheHit } = policy.evaluate(req, res);

if (revalidate) {
    const newRes = await update_cache(req);
    return newRes;
}
if (cacheHit) {
    res.headers = policy.responseHeaders(res);
    return res;
}

but with easy support of swr

const { revalidate, asyncRevalidate, cacheHit } = policy.evaluate(req, res);

if (asyncRevalidate) {
    update_cache(req).catch(err => logError(err));
} else if (revalidate) {
    const newRes = await update_cache(req);
    return newRes;
}
if (cacheHit) {
    res.headers = policy.responseHeaders(res); // Could directly come from evaluate
    return res;
}
kornelski commented 10 months ago

I've jumped a few steps ahead in my explanation.

The updated response headers (only if applicable and not too computing intensive 🤔)

Update of headers is not optional. You're simply not allowed to skip that step, even if it took whole day to compute (but it doesn't). Therefore, I want to change the API to always give you the updated headers.

Similarly with response revalidation. If the cached version is unusable, you'll have to make a request. If it's s-w-r, you have to make the request anyway. So I want to provide up-to-date request headers for fetch/revalidate whenever it's needed.

So if you have something to return (it's fresh or s-w-r), you'll get response headers to use. If you have to make a request to the origin (because it's a miss or s-w-r), you'll get a set of headers for making that request. The observation here is that it doesn't really matter if it's a request due to cache miss or a revalidation request, because you're going to send it to the origin anyway, and it'll go back to the cache to update it.

So you don't really need to know the details of what happened. You only have two possible actions: return a response or make an upstream request. S-w-r is a case where you do both, but doesn't change what inputs you need — it's still a response to return and a request to make. That's why I suggested { request, response } as a returned object, but I take this may be unclear naming.

How about this:

Embraser01 commented 10 months ago

Thanks for the explanation, much clearer now! I like the new naming, but after trying to write some of this to test a bit how that would work in real life, I found that it wasn't really easy to use and still let some tricky thing up to the user.

I tried again but with some inspiration from your rust example. I think I found something that would solve this:

The idea would be to have the evaluate function look like this:


type EvaluateRes<T> = {
  cacheResponse: Response;
  asyncRevalidationPromise?: Promise<T>;
};

async evaluate<T>(
  req: Request,
  res: Response,
  doRequest: (req: Request) => Promise<T>,
): Promise<EvaluateRes<T>> {
  // If cache is fresh
  //    return updated response
  // If cache is stale and stale-while-revalidate is valid
  //    call doRequest(req).catch(logSomewhere)
  //    return updated response
  // Else
  //    call await doRequest(req);
  //    send updated response

  return {} as EvaluateRes<T>;
}

And then, use it like this:

async function doRequest<T>(request: Request): Promise<T> {
  const response = await callHTTPEndpoint(request);

  const policy = new CachePolicy(request, response, {});
  if (policy.storable()) {
    await cache.save(request, { policy, response }, policy.timeToLive());
  }
  return response;
}

export async function test() {
  // And later, when you receive a new request:
  const newReq: Request = { headers: {} };

  const cacheRes = await cache.get(newReq);
  if (!cacheRes) {
    return await doRequest(newReq);
  }
  const { policy, response } = cacheRes;

  const { cacheResponse, asyncRevalidationPromise } = await policy.evaluate(
    newReq,
    response,
    doRequest,
  );

  // (optional) Let the user have some control on stale while revalidate
  if (asyncRevalidationPromise) {
    asyncRevalidationPromise.catch(() =>
      console.error('SWR request was not successful'),
    );
  }
  return cacheResponse;
}

This system remove the need for the user to know when to do the request and if they should await or not the response. It always gives back an updated response.

Embraser01 commented 10 months ago

@kornelski friendly ping on this discussion 🙂

kornelski commented 10 months ago

I'm not a fan of inversion of control approach. While it does make it a bit more foolproof, it also makes the API more opaque, and I'm worried it could be too restrictive for some advanced uses.

I think there can be two ways forward:

  1. Keep satisfiesWithoutRevalidation return false for s-w-r, and repurpose useStaleWhileRevalidate(req) to be equivalent of satisfiesWithoutRevalidation that supports s-w-r.

  2. Add beforeRequest and afterResponse like this: https://docs.rs/http-cache-semantics/latest/http_cache_semantics/struct.CachePolicy.html#method.before_request https://docs.rs/http-cache-semantics/latest/http_cache_semantics/struct.CachePolicy.html#method.after_response mapped to something like {freshResponse|staleResponse|originRequest} mentioned in my previous comment.