nodejs / undici

An HTTP/1.1 client, written from scratch for Node.js
https://nodejs.github.io/undici
MIT License
6.19k stars 540 forks

implement CacheStorage? #2071

Closed jimmywarting closed 1 year ago

jimmywarting commented 1 year ago

I was wondering if you would be willing to look into implementing the Cache Storage API.

It was something I looked into in the past... but it got abandoned. It is pretty tightly coupled with the Request/Response classes,

and I would like to use it to avoid having to make extra requests when I have the key (Request) cached locally.

It's also in fact tightly coupled with the cache option in Request. It would be neat if this could just work with some default storage where Requests/Responses were saved locally on disk somewhere.

fetch(url, {
  cache: "no-cache", // *default, no-cache, reload, force-cache, only-if-cached
})

manually working with the caches API works too...

    caches.match(request).then((response) => {
      // caches.match() always resolves,
      // but only on a cache hit will response have a value
      if (response !== undefined) {
        return response;
      } else {
        return fetch(request)
          .then((response) => {
            // response may be used only once;
            // we need to save a clone to put one copy in the cache
            // and serve the second one
            let responseClone = response.clone();

            caches.open("v1").then((cache) => {
              cache.put(request, responseClone);
            });
            return response;
          })
          .catch(() => caches.match("/gallery/myLittleVader.jpg"));
      }
    })

(taken from mdn)

maybe caches.open('node:undici') could store the default request / responses from fetch when using the cache option?

ronag commented 1 year ago

I don't have time to do it myself but I'd be happy to see something like this added.

KhafraDev commented 1 year ago

I'll look into it.

KhafraDev commented 1 year ago

I believe we would need sqlite or some alternative to implement this correctly, which makes it a non-starter: "Furthermore, because CacheStorage requires file-system access...". Deno also uses sqlite for this.

jimmywarting commented 1 year ago

Oh, I didn't know Deno had cache storage, that is cool :) then I think Node.js should definitely have it too!

KhafraDev commented 1 year ago

If node were to implement some form of sqlite database this would be actionable, but we can't really do anything better than an in-memory cache (which is worse than a userland alternative).

ronag commented 1 year ago

RocksDB with the BlobDB variant would be a perfect fit IMHO

ronag commented 1 year ago

@mcollina Any thoughts? Would vendoring a db into node core even be an option?

jimmywarting commented 1 year ago

I think it would be good if Node's http loader could utilize fetch + CacheStorage. But it would also be useful for other things, such as caching static API responses like some config, etc.

KhafraDev commented 1 year ago

I think it would be useful, but better as another library. Nothing requires fetch internals afaict so it should be possible. Adding a postinstall script and a million build dependencies doesn't seem like a good idea for undici.

jimmywarting commented 1 year ago

> ... but better as another library. Nothing requires fetch internals afaict so it should be possible

What if you want to use: fetch(url, { cache: 'only-if-cached' })? fetch would have to do something...

KhafraDev commented 1 year ago

I was talking about the CacheStorage implementation itself, not really integrating it into fetch, which might need fetch internals. There are a number of fetch options that are either ignored or have no effect, so setting cache won't do anything. It is also possible for another library to wrap undici's fetch and implement the cache option themselves, similar as to what's been done with cookie jars and whatnot.
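The wrapper idea mentioned here could be sketched roughly as follows. This is a hypothetical userland example: `createCachingFetch` and the Map-based store are illustrative names, not undici API, and real `cache` semantics (freshness, `Vary`, revalidation) are far more involved.

```javascript
// Hypothetical sketch: wrap any fetch-compatible function and implement a
// simplified version of the `cache` option in userland.
function createCachingFetch(baseFetch, cache = new Map()) {
  return async function cachingFetch(url, init = {}) {
    const mode = init.cache ?? 'default';
    const hit = cache.get(url);

    if (mode === 'only-if-cached') {
      // per spec this would be a network error when nothing is cached
      if (hit === undefined) throw new TypeError('only-if-cached: no cached response');
      return hit.clone();
    }
    if (mode === 'force-cache' && hit !== undefined) return hit.clone();

    const response = await baseFetch(url, init);
    // store a clone so the caller can still consume the original body
    if (mode !== 'no-store') cache.set(url, response.clone());
    return response;
  };
}
```

A cookie-jar-style wrapper along these lines would keep undici itself free of storage concerns.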

KhafraDev commented 1 year ago

Here's what I came up with before I got tired of working on it: https://github.com/KhafraDev/node-cachestorage

It implements CacheStorage and Cache. None of Cache is implemented other than matching so you can't really test it out. If node ever got a database of some sort I'd probably copy this implementation into undici so feel free to contribute 😄.

mcollina commented 1 year ago

To be honest, I don't think a naive implementation needs a database for this. We could just build it on top of a temporary folder and hashed URLs.
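A minimal sketch of that suggestion, assuming hypothetical `putBody`/`matchBody` helpers, with headers, expiry, and eviction ignored entirely:

```javascript
// Naive file-backed cache: bodies stored in a temp directory under a
// hash of the request URL. Names here are illustrative, not a real API.
import { createHash } from 'node:crypto';
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

const cacheDir = mkdtempSync(join(tmpdir(), 'undici-cache-'));

// sha256 of the URL gives a stable, filesystem-safe filename
const keyFor = (url) => createHash('sha256').update(url).digest('hex');

function putBody(url, body) {
  writeFileSync(join(cacheDir, keyFor(url)), body);
}

function matchBody(url) {
  const file = join(cacheDir, keyFor(url));
  return existsSync(file) ? readFileSync(file, 'utf8') : undefined;
}
```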

KhafraDev commented 1 year ago

yeah, but then it's a worse version of something that can be done elsewhere. It also adds a lot more complexity to an already complicated (badly written) spec.

jimmywarting commented 1 year ago

Having an IndexedDB would be something useful. I know it's not the absolute loveliest API, with callbacks and whatnot, but it sure can save a lot of the things structuredClone is able to clone, including: circular refs, object literals, BigInts, Map, Set, ArrayBuffer, TypedArray, Blob, File, Error, RegExp, FileSystem handles, some crypto stuff, etc.

Other databases are not so good at saving JavaScript objects that are structured-cloneable. It goes very much hand in hand with the v8.Serializer to make it searchable.

I could definitely see myself building a CacheStorage on top of IndexedDB. There are quite a bunch of npm packages that expect it to exist so they can store certain things without really having to care about the environment they're operating in. It's almost like how every Android app has a built-in NoSQL database, so it is much easier to build polyfills for this kind of stuff.
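For reference, Node's `v8` serializer already round-trips most structured-cloneable values (though not host objects like Blob or File), which is the property being described here:

```javascript
// v8.serialize/deserialize handle the structured-clone value space:
// BigInt, Map, Set, TypedArray, RegExp, even circular references.
import v8 from 'node:v8';

const original = {
  big: 123n,                     // BigInt
  map: new Map([['a', 1]]),      // Map
  set: new Set([1, 2]),          // Set
  bytes: new Uint8Array([1, 2]), // TypedArray
  re: /cat/gi,                   // RegExp
};
original.self = original;        // circular reference survives too

const copy = v8.deserialize(v8.serialize(original));
```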

ronag commented 1 year ago

@benjamingr does implementing a db make us more web compliant?

jimmywarting commented 1 year ago

Now that I think about it... @mcollina has a good argument about using the file system. Responses can be very large, so being able to pipe the body to/from disk without allocating much memory would also be useful. Some people download pretty large stuff.

and doing things such as:

const responseClone = response.clone()
cache.put(request, responseClone)
return response

lets you operate on it much earlier, in parallel, than if you had to put everything in a database and wait for it to be copied over as a single ArrayBuffer or something.

The cache storage pretty much lets you iterate over all Requests/Responses and see them. So if the Responses could be constructed with a lazy stream, or heck, even a disk-based blob, that would be a plus.

So a file-based storage would not actually be that bad if it's more stream/memory friendly.

KhafraDev commented 1 year ago

I mentioned it previously but the CacheStorage/Cache spec is not written as concisely/cleanly as the fetch spec. It's actually much worse than the websocket spec, which is impressive. None of the steps are clear and it feels like steps are skipped over/not implemented (look at Cache.addAll, the spec doesn't mention caching the responses but it expects you to). Adding another layer of complexity here makes it even harder to implement.

Referencing Deno: their implementation is massively outdated and doesn't include many methods, either on Cache or CacheStorage. They're missing Cache.prototype.add/Cache.prototype.addAll, CacheStorage.prototype.matchAll, and probably a few others. It's also behind a flag I suspect, because logging globalThis.caches returns undefined.

Not to mention that the only way to open file-backed blobs is through fs.openAsBlob which was added in v19.8.0 and is still experimental.

However if you want to implement it (could be a fun weekend project), I have a branch that implements the outlines of both Cache and CacheStorage, with webidl conversions and everything needed.

ronag commented 1 year ago

I think using e.g. the chromium implementation is the best spec 🤷‍♂️

jimmywarting commented 1 year ago

Maybe it doesn't necessarily have to be just a disk-based blob. openAsBlob would take a long time until everybody could actually start using it... Response/fetch probably just sees blobs in the init and converts the body to streams anyway? body = blob.stream() or something, and Response also supports streams and async iterables too. So it could maybe be as simple as doing something like new Response(fs.createReadStream(path)).

KhafraDev commented 1 year ago

However we also need to cache the body (and request headers and method) because each cache match is meant to return a new Response.

I took a look at the chromium implementation and it looks like each cache entry creates 3 temporary files (1 for the body, 1 for headers, and 1 for other request stuff). But since it's in c++ it's kinda hard to translate that to js.

jimmywarting commented 1 year ago

The Request body isn't saved, right? That's how I remembered it... only the request headers & request URL are saved? Would it not be possible to save just 2 files: one for the Response body, and then a JSON with the request and response info...

// 1.json
{
  url: "https://example.com/cat.png",
  request: {
    headers,
    url
  },
  response: {
    headers,
    status,
    statusText,
    url: response.url // the final response url after redirects
  },
}
// 1.bin
the whole response body...

KhafraDev commented 1 year ago

Yep, request body doesn't need to be saved, I was referring to the response body there. I'm not sure why they use 3 files, maybe I just misread because I can't find the comment again.

Their implementation creates a directory for each cache (i.e. await caches.open('v1') will create a v1 directory, which is where the files are then saved). But we also don't want to name the files 1.json, 1.bin, etc., because it wouldn't be very performant to do a cache lookup.

Some limitations/design details/questions:

benjamingr commented 1 year ago

@ronag

@benjamingr does implementing a db make us more web compliant?

Not as far as I'm aware but it would make it easier to write code that works both in browsers/node which is the point of web compliance in the first place.