Closed jimmywarting closed 1 year ago
I don't have time to do it myself but I'd be happy to see something like this added.
I'll look into it.
I believe we would need sqlite or some alternative to implement this correctly, which makes it a non-starter: "Furthermore, because CacheStorage requires file-system access...". Deno also uses sqlite for this.
Oh, I didn't know Deno had cache storage, that is cool :) then I think Node.js should definitely have it too!
If node was to implement some form of sqlite database this would be actionable, but we can't really do anything better than an in-memory cache (which is worse than a userland alternative).
RocksDB with the BlobDB variant would be a perfect fit IMHO
@mcollina Any thoughts? Would vendoring a db into node core even be an option?
I think it would be good if Node's HTTP loader could utilize fetch + CacheStorage. But also for other things, such as caching static API responses (some config, etc.).
I think it would be useful, but better as another library. Nothing requires fetch internals afaict so it should be possible. Adding a postinstall script and a million build dependencies doesn't seem like a good idea for undici.
> ... but better as another library. Nothing requires fetch internals afaict so it should be possible
What if you want to use `fetch(url, { cache: 'only-if-cached' })`?
fetch would have to do something...
I was talking about the CacheStorage implementation itself, not really integrating it into fetch, which might need fetch internals. There are a number of fetch options that are either ignored or have no effect, so setting `cache` won't do anything. It is also possible for another library to wrap undici's fetch and implement the `cache` option themselves, similar to what's been done with cookie jars and whatnot.
Here's what I came up with before I got tired of working on it: https://github.com/KhafraDev/node-cachestorage
It implements the CacheStorage and Cache classes, but none of Cache is implemented other than matching, so you can't really test it out. If node ever got a database of some sort I'd probably copy this implementation into undici, so feel free to contribute 😄.
To be honest, I don't think a naive implementation needs a database for this. We could just build it on top of a temporary folder and hashed URLs.
yeah, but then it's a worse version of something that can be done elsewhere. It also adds a lot more complexity to an already complicated (badly written) spec.
Having an IndexedDB would be something useful.
I know it's not the absolute loveliest API \w callbacks and whatnot, but it sure can save a lot of the things structuredClone is able to clone. That includes: circular references, object literals, BigInts, Map, Set, ArrayBuffer, TypedArray, Blob, File, Error, regex, FileSystem handles, some crypto stuff, etc.
Other databases are not so good at saving JavaScript objects that are structured-cloneable. It goes very much hand in hand with the v8.Serializer, to make it searchable.
I could definitely see myself building a CacheStorage on top of IndexedDB. There are quite a bunch of npm packages that expect it to exist so they can store certain things without really having to care about the environment they're operating in. It's almost like how every Android app has a built-in NoSQL database, so it is much easier to build polyfills for this kind of stuff.
@benjamingr does implementing a db make us more web compliant?
Now when I think about it... @mcollina has a good argument about using the file system. Responses could be very large, so being able to pipe the body to/from disk without allocating much memory would also be useful. Some people download pretty large stuff.
and doing things such as:

```js
const responseClone = response.clone()
cache.put(request, responseClone)
return response
```

lets you operate on it much earlier, in parallel, than if you had to put everything in a database and wait for it to be copied over with a single ArrayBuffer or something.
The cache storage pretty much lets you iterate over all Requests/Responses and see them. So if the Responses could be constructed with a lazy stream, or heck, even a disk-based blob, that would be a plus.
So a file-based storage would not actually be that bad if it's stream/memory-friendlier.
I mentioned it previously but the CacheStorage/Cache spec is not written as concisely/cleanly as the fetch spec. It's actually much worse than the websocket spec, which is impressive. None of the steps are clear and it feels like steps are skipped over/not implemented (look at Cache.addAll, the spec doesn't mention caching the responses but it expects you to). Adding another layer of complexity here makes it even harder to implement.
Referencing Deno: their implementation is massively outdated and doesn't include many methods, either on Cache or CacheStorage. They're missing `Cache.prototype.add`/`Cache.prototype.addAll`, `CacheStorage.prototype.matchAll`, and probably a few others. It's also behind a flag I suspect, because logging `globalThis.caches` returns undefined.
Not to mention that the only way to open file-backed blobs is through fs.openAsBlob which was added in v19.8.0 and is still experimental.
However if you want to implement it (could be a fun weekend project), I have a branch that implements the outlines of both Cache and CacheStorage, with webidl conversions and everything needed.
I think using e.g. the Chromium implementation is the best spec 🤷‍♂️
Maybe it doesn't necessarily have to be just a disk-based blob. `openAsBlob` would take a long time until everybody could actually start using it... Response/fetch probably just see blobs in the init and convert the body to streams anyway (`body = blob.stream()` or something), and Response also supports streams and async iterables too. So it could maybe be as simple as doing something like `new Response(fs.createReadStream(path))`.
However we also need to cache the body (and request headers and method) because each cache match is meant to return a new Response.
I took a look at the Chromium implementation and it looks like each cache entry creates 3 temporary files (1 for the body, 1 for headers, and 1 for other request stuff). But since it's in C++ it's kinda hard to translate that to JS.
The Request body isn't saved, right? That's how I remembered it... only the request headers & request URL are saved? Would it not be possible to save just 2 files: one for the Response body, and then a JSON with the request and response info...
```js
// 1.json
{
  url: "https://example.com/cat.png",
  request: {
    headers,
    url
  },
  response: {
    headers,
    status,
    statusText,
    url: response.url // the final response url after redirects
  }
}

// 1.bin
// the whole response body...
```
Yep, request body doesn't need to be saved, I was referring to the response body there. I'm not sure why they use 3 files, maybe I just misread because I can't find the comment again.
Their implementation creates a directory for each cache (i.e. `await caches.open('v1')` will create a `v1` directory), which is where the files are then saved. But we also don't want to name the files `1.json`, `1.bin`, etc. because it wouldn't be very performant to do a cache lookup.
Some limitations/design details/questions:

- the `vary` header, so we should create new folders for each url (or move to a separate folder when 1 url is cached multiple times?)

@ronag
> @benjamingr does implementing a db make us more web compliant?
Not as far as I'm aware but it would make it easier to write code that works both in browsers/node which is the point of web compliance in the first place.
I was wondering if you would be willing to look into implementing the CacheStorage API.
It was something I was looking into in the past... but it got abandoned. It is pretty much tightly coupled with the Request/Response classes,
and I would like to use it to avoid having to make extra requests if I have the key (Request) cached locally.
It's also in fact tightly coupled with the `cache` option in Request too. It would be neat if this could just work with some default storage, and Requests/Responses were saved locally on disk somewhere. Manually working with caches works too...
(taken from mdn)
Maybe `caches.open('node:undici')` could store the default requests/responses from fetch when using the `cache` option?