nodejs / undici

An HTTP/1.1 client, written from scratch for Node.js
https://nodejs.github.io/undici
MIT License

Support all the `cache` options in `fetch()` #3847

Open mcollina opened 3 days ago

mcollina commented 3 days ago

Ref https://developer.mozilla.org/en-US/docs/Web/API/Request/cache.

I think we could add some support for this.
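For reference, these are the cache modes that the `cache` option accepts per MDN; the URL in the usage comment is a placeholder:

```javascript
// The cache modes defined for Request.cache in the fetch spec (per MDN):
const cacheModes = [
  'default',        // consult the HTTP cache normally
  'no-store',       // bypass the cache entirely; do not update it
  'reload',         // always go to the network, but update the cache
  'no-cache',       // revalidate with the server before using a cached response
  'force-cache',    // use any cached response, fresh or stale
  'only-if-cached'  // only use the cache; fail if there is no match
]

// Usage with Node's global fetch (which is backed by undici):
// await fetch('https://example.com/data', { cache: 'no-store' })
```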

mcollina commented 3 days ago

cc @flakey5 @KhafraDev

KhafraDev commented 3 days ago

+1 to adding it, but it will not work with the current caching interceptor logic. Pasting part of my comment from here:

In the fetch spec, the http cache is built a layer above fetch, whereas in undici it's a layer below [in the interceptor api]. For example, in multiple parts of the spec we are meant to query the cache directly (see http-network-or-cache-fetch). There are multiple situations where headers may get added to requests/responses, or the internal request is modified in some way that will cause bugs or deviate from browser behavior.

mcollina commented 3 days ago

https://fetch.spec.whatwg.org/#concept-request-cache-mode doesn't imply it's implemented on top. I'm specifically not talking about CacheStorage.

There are multiple situations where headers may get added to requests/responses, or the internal request is modified in some way that will cause bugs or deviate from browser behavior.

Can you clarify this? Why can't we go update the underlying cache?

KhafraDev commented 3 days ago

doesn't imply it's implemented on top. I'm specifically not talking about CacheStorage.

I know, but think about it from this perspective - currently the cache is built into the interceptor itself; it accepts an array of buffers as headers and a node stream for its body. To cache fetch responses, you have to store the headers (a HeadersList), a body (web readable), and the internal response state.

The fetch spec calls on implementations to update, insert, and delete responses from the cache within the spec itself. If we leave this to the interceptor, there are a number of places where internal response state may be modified, headers get added, etc., that will change how the response is received.

Can you clarify this? Why can't we go update the underlying cache?

The interceptor cache does not take internal response state into account, and that internal state is also modified between where we are meant to update/insert/delete from the cache and where we are currently doing it (the dispatch). Without taking this into account, things like referer headers, etc. will subtly break. AFAIK the cache is tied to the interceptor API currently and isn't shared outside of it (and even if it could be, it still leads to the issue of checking the cache in fetch, and then having to re-check it again in the interceptors).
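The layering point above can be sketched roughly as follows. This is a loose, illustrative simplification of the spec's http-network-or-cache-fetch step, not undici code; `cacheStore` (get/put by URL) and `networkFetch` are hypothetical stand-ins, and revalidation is not shown:

```javascript
// Loose sketch of how http-network-or-cache-fetch in the fetch spec
// consults the cache directly, based on the request's cache mode.
async function networkOrCacheFetch (request, cacheStore, networkFetch) {
  if (request.cache === 'no-store') {
    // Never read from or write to the cache.
    return networkFetch(request)
  }
  const cached = await cacheStore.get(request.url)
  if (cached && request.cache !== 'reload' && request.cache !== 'no-cache') {
    // The spec reads the stored response directly at this step.
    return cached
  }
  // 'reload' and 'no-cache' (simplified: revalidation not shown) hit the
  // network, and the spec then updates the cache directly with the result.
  const response = await networkFetch(request)
  await cacheStore.put(request.url, response)
  return response
}
```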

mcollina commented 3 days ago

To cache fetch responses, you have to store the headers (a HeadersList), a body (web readable), and the internal response state.

This is not how I read the spec in https://fetch.spec.whatwg.org/#concept-request-cache-mode. There is no real connection to HeadersList and Readable, but every object is freshly created.

KhafraDev commented 3 days ago

There is no real connection to HeadersList and Readable

Of course you can convert the HeadersList to and from an array of buffers and web readables to node readables (and vice versa), but what I'm getting at is that fetch has different requirements to cache than the interceptor api.
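A minimal sketch of those conversions, assuming the interceptor's raw headers are a flat array of Buffers; the helper names are illustrative, not undici internals:

```javascript
import { Readable } from 'node:stream'

// Raw headers: flat array of Buffers [name, value, name, value, ...].
function rawToHeaders (raw) {
  const headers = new Headers()
  for (let i = 0; i < raw.length; i += 2) {
    headers.append(raw[i].toString('latin1'), raw[i + 1].toString('latin1'))
  }
  return headers
}

function headersToRaw (headers) {
  const raw = []
  for (const [name, value] of headers) {
    raw.push(Buffer.from(name), Buffer.from(value))
  }
  return raw
}

// Node stream <-> web stream bridging is built into node:stream:
const webBody = Readable.toWeb(Readable.from(['cached body']))
const nodeBody = Readable.fromWeb(webBody)
```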

mcollina commented 2 days ago

...but what I'm getting at is that fetch has different requirements to cache than the interceptor api.

I can only see fetch having additional requirements, not different ones. If they were different, it would mean that the fetch() spec does not follow RFC 9111, which sounds highly unlikely.

metcoder95 commented 2 days ago

Do you have something specific in mind, @KhafraDev? Or what are you proposing to achieve it?

KhafraDev commented 2 days ago

I can only see fetch having additional requirements, not different.

Maybe my phrasing wasn't the best 😄.

Or what are you proposing to achieve it?

We need the following:

metcoder95 commented 1 day ago

an interface to insert, update, and delete cache entries directly (not through an interceptor -- similar to MemoryCacheStore). The issue is that it currently only works for node streams and undici headers (arrays of buffers). Currently, MemoryCacheStore has no way to update; `.get(...)` returns a new object.

Reusability was one of my concerns when designing the API of the current cache store, but we can definitely revisit it (maybe create a separate one if we do the work after the cutoff for v7).

do we use a shared cache store for fetch and the interceptor?

Good question; I'd imagine they will be separate.

we need the header parsing logic to be done outside of the interceptor

We can make an issue for that

For the remaining points, a discussion will definitely be worthwhile.
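The kind of store interface the first quoted point asks for might look like this; a hypothetical sketch only, not the shipped MemoryCacheStore API:

```javascript
// Hypothetical shared store with direct insert/update/delete, usable by
// both fetch and the interceptor. All names here are illustrative.
class SharedCacheStore {
  #entries = new Map()

  get (key) {
    return this.#entries.get(key)
  }

  set (key, entry) {
    // Inserts a new entry or updates an existing one in place.
    this.#entries.set(key, entry)
  }

  delete (key) {
    return this.#entries.delete(key)
  }
}
```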

KhafraDev commented 1 day ago

The reusability was one of my concerns when designing the API of the current cache store, but definitely can revisit (maybe create a separate one if we made the work after the cut off for v7).

I think it's 95% fine and just missing a little extra that we need. We can convert fetch Headers to raw headers and web streams to node ones easily, so not much of a concern there. Honestly someone would need to start implementing it to see exactly what's missing/lacking. I don't have the time or motivation to do so currently. :(

An example of one of my bullet points above, does this cache the response? Which takes precedence? Will we allow this? My other points are straightforward IMO so examples aren't needed.


import { Agent, fetch, interceptors } from 'undici'

const agent = new Agent().compose(interceptors.cache())

await fetch('...', {
  cache: 'no-store',
  dispatcher: agent
})

metcoder95 commented 19 hours ago

I think it's 95% fine and just missing a little extra that we need. We can convert fetch Headers to raw headers and web streams to node ones easily, so not much of a concern there. Honestly someone would need to start implementing it to see exactly what's missing/lacking. I don't have the time or motivation to do so currently. :(

If you can file an issue with the requirements, we can see who is willing to give it a shot (I can give it a try after the H2 work), and it will help to record progress.

An example of one of my bullet points above, does this cache the response? Which takes precedence? Will we allow this? My other points are straightforward IMO so examples aren't needed.

Nope, it won't; but I see where you're going. I'm pretty sure the composed one will take precedence, as for fetch the composed dispatcher is just a dispatcher; and with the current state we are not able to identify whether the dispatcher is using the cache interceptor or not.

We either build a mechanism to identify when a dispatcher is a composed one (and which interceptors are in use), or document well the possible scenarios users can fall into when both caches are set.
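One possible shape for such a mechanism, purely hypothetical (none of these names exist in undici):

```javascript
// Tag dispatchers that have the cache interceptor composed in, so that
// fetch() could later detect it. Entirely illustrative; not undici API.
const kHasCacheInterceptor = Symbol('hasCacheInterceptor')

function markCacheComposed (dispatcher) {
  dispatcher[kHasCacheInterceptor] = true
  return dispatcher
}

function hasCacheInterceptor (dispatcher) {
  return dispatcher?.[kHasCacheInterceptor] === true
}
```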

mcollina commented 17 hours ago

@metcoder95 I don't think you meant to close this.