ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Meta: HTTP Gateway cache control improvements #8717

Open lidel opened 2 years ago

lidel commented 2 years ago

This is a meta issue for HTTP cache improvements that we should prioritize in go-ipfs:

(cc @thattommyhall & @mathew-cf if there are more asks/ideas here)

thattommyhall commented 2 years ago

The only thing I would add is that a strategy of expiring the DirIndex pages at the "top of the hour" or "end of the day" makes it a bit nicer to deal with the ETag eventually changing (you won't see different things on different servers for as long). That nudges me to prefer Expires to Cache-Control (but it's not that much extra to calculate it anyway).

mathew-cf commented 2 years ago

An idea in addition to the ones proposed: A public resolution endpoint for gateways could be useful as an inexpensive, cacheable call. There is no public endpoint (AFAIK) to verify resolutions for IPNS public keys or IPFS subpaths without fetching the whole file from a gateway, so an endpoint or query param exposing resolution could be helpful.

thattommyhall commented 2 years ago

@mathew-cf that's useful, but you'd have to rewrite the request to fetch /ipfs/<cid>/<path>, or you couldn't be certain that the content you fetched matched. For example, if I checked what the DNSLink was for blog.ipfs.io and then fetched /ipns/blog.ipfs.io, it might have changed in the meantime.

mathew-cf commented 2 years ago

By cacheable, I meant for a short TTL (~1 min). This resolution response can still be supplemented by DNS resolution for faster cache verification/invalidation. If you have the x-ipfs-roots header, you can check that the resolution and response match and choose whether to cache the response.

I'm thinking of this in the context of Cloudflare Workers, if that helps clarify.

lidel commented 2 years ago

@mathew-cf Thoughts on leveraging existing HTTP HEAD responses for these quick checks?

mathew-cf commented 2 years ago

@lidel That works for me!

thattommyhall commented 2 years ago

I tested something like your dir-index-html strategy with Lua in nginx, but I used X-Accel-Expires to keep it within our gateway. Nginx will honor it; I think Varnish might too, but I can't find a clear reference (though I am near-certain there is an equivalent).

So the concrete ask here is that making the name of the header configurable might be useful, but someone who uses another cache might be able to chime in.

lidel commented 2 years ago

If we fix https://github.com/ipfs/go-ipfs/issues/1818#issuecomment-1015849462 (set a proper Cache-Control header on /ipns/ and dir-index-html responses), that would be a universal hint for all HTTP caching tools/solutions. In cases where a gateway operator wants to cache things for longer, the minimal max-age could be raised via config, decreasing the need for custom headers like X-Accel-Expires.

thattommyhall commented 2 years ago

@lidel did fix/dir-index-html-max-age land somewhere?

lidel commented 2 years ago

@thattommyhall kinda: https://github.com/ipfs/go-ipfs/pull/8758 fixed a bug, and the gateway now adds Cache-Control when a directory has index.html and returns it instead of a dir listing response.

Generated dir listings don't have a Cache-Control header, but they have a deterministic ETag and will return HTTP 304 Not Modified if the client sends a matching ETag in the If-None-Match header.
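A minimal sketch of that ETag / If-None-Match exchange, using only Go's standard library. This is not the gateway's actual handler: the ETag value, the path, and the names serveListing and statusFor are made up for illustration, and the comparison is a simple exact match (real If-None-Match headers can carry several values or `*`).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// listingETag is a hypothetical value standing in for the deterministic
// ETag the gateway derives for a generated directory listing.
const listingETag = `"DirIndex-example"`

// serveListing answers 304 Not Modified when the client already holds the
// listing identified by listingETag, and a full 200 response otherwise.
func serveListing(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Etag", listingETag)
	// Simplification: exact string match against a single ETag.
	if r.Header.Get("If-None-Match") == listingETag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	fmt.Fprint(w, "<html>directory listing</html>")
}

// statusFor runs one in-memory request and returns the response status code.
func statusFor(ifNoneMatch string) int {
	req := httptest.NewRequest(http.MethodGet, "/ipfs/example-cid/", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	serveListing(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(""))          // first visit: full response
	fmt.Println(statusFor(listingETag)) // revalidation with a matching ETag
}
```

The revalidation still costs a round trip to the backend, but the 304 response has no body, so the expensive listing does not need to be sent again.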

thattommyhall commented 2 years ago

I'd like to advocate for something like top of the hour Expires or something in the DirIndex case too. It's nice not to have to re-ask the backend at least for a short while

lidel commented 2 years ago

@thattommyhall I am leaning towards setting Cache-Control that asks for caching forever (immutable), because dir listing is costly to generate, and we don't change them that often.

Given how people deploy gateway infra, we would still want to revalidate on CDN/caching proxies, so how about:

Cache-Control: public, max-age=31536000, s-maxage=604800, stale-while-revalidate=86400, stale-if-error=86400, immutable

(https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control)

My understanding is that the caching proxy will always return a cached version of the dir listing, but will try to revalidate if the cached copy is older than a week.

Thoughts?