ipfs / boxo

A set of reference libraries for building IPFS applications and implementations in Go.
https://github.com/ipfs/boxo#readme
Other
214 stars 96 forks source link

Gateway: Cache-Control, max-age=ttl, ETag for /ipns requests #329

Open ion1 opened 9 years ago

ion1 commented 9 years ago

The gateway does not seem to provide a Cache-Control/max-age header or an ETag for /ipns URLs.

It might be useful to assume some reasonable non-zero max-age for content on IPNS. A minute? Fifteen seconds?

It would also be good to serve an ETag corresponding to whatever IPFS hash the IPNS path resolved to.

jbenet commented 9 years ago

It might be useful to assume some reasonable non-zero max-age for content on IPNS. A minute? Fifteen seconds?

It could be determined by the IPNS record -- some will have a calculable TTL.

ghost commented 9 years ago

Until then, we could set must-revalidate and a quasi infinite max-age

ion1 commented 9 years ago

I’m not sure if I have understood this correctly, but I’m under the impression that must-revalidate only applies if the content has already expired. http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4

ghost commented 9 years ago

You're right, thanks -- so that means must-revalidate + max-age=0 = cache-indefinitely-unless-updated

ghost commented 9 years ago

Note that for /ipfs/, we can still set a quasi-infinite max-age.

ion1 commented 9 years ago

It seems to me that must-revalidate means that given an object that expired according to max-age, a cache must return an error instead of a stale copy if it fails to connect to the origin server. I’m not sure must-revalidate is of any use here. Unless I’m misinterpreting it, of course.

ion1 commented 9 years ago

DNS lets you say explicitly for how long a record can be cached. Perhaps we should do that with IPNS names? ipfs name publish --ttl=3h which would be taken advantage of by IPFS nodes who have resolved the name as well as by the gateway in the Cache-Control header.

ion1 commented 9 years ago

It was mentioned on IRC that resolve does not provide TTL information to be used for max-age.

resolveOnce could be changed to return a TTL value. resolve could start at infinity-ish and compute the minimum of the previous TTL and what resolveOnce returned at each iteration.

Given that the DNS lookup functions in the net package for Go do not seem to provide TTL information at the moment, the DNS resolveOnce could return a reasonable guess like one minute. If a DNS record points to /ipns/<keyhash>, that IPNS record can still decrement the TTL according to the iteration above if a low cache time is desired.

If resolve returned a TTL, the if strings.HasPrefix(urlPath, ipfsPathPrefix) check in gateway_handler.go could go away and the header be set unconditionally.

ion1 commented 9 years ago

Would this be a reasonable starting point?

diff --git a/namesys/interface.go b/namesys/interface.go
index 09c296c..b0b740d 100644
--- a/namesys/interface.go
+++ b/namesys/interface.go
@@ -47,6 +47,14 @@ const (
    // trust resolution to eventually complete and can't put an upper
    // limit on how many steps it will take.
    UnlimitedDepth = 0
+
+   // ImmutableTTL is the time-to-live value to be reported for immutable
+   // objects.
+   ImmutableTTL = 10 * 365 * 24 * time.Hour
+
+   // UnknownTTL is the time-to-live value to be reported for mutable
+   // paths when accurate information is not available.
+   UnknownTTL = time.Minute
 )

 // ErrResolveFailed signals an error when attempting to resolve.
@@ -88,7 +96,10 @@ type Resolver interface {
    // There is a default depth-limit to avoid infinite recursion.  Most
    // users will be fine with this default limit, but if you need to
    // adjust the limit you can use ResolveN.
-   Resolve(ctx context.Context, name string) (value path.Path, err error)
+   //
+   // Resolve also returns a time-to-live value which indicates the
+   // maximum amount of time the result may be cached.
+   Resolve(ctx context.Context, name string) (value path.Path, ttl time.Duration, err error)

    // ResolveN performs a recursive lookup, returning the dereferenced
    // path.  The only difference from Resolve is that the depth limit
@@ -97,7 +108,7 @@ type Resolver interface {
    //
    // Most users should use Resolve, since the default limit works well
    // in most real-world situations.
-   ResolveN(ctx context.Context, name string, depth int) (value path.Path, err error)
+   ResolveN(ctx context.Context, name string, depth int) (value path.Path, ttl time.Duration, err error)
 }

 // Publisher is an object capable of publishing particular names.
diff --git a/namesys/base.go b/namesys/base.go
index e552fce..29ef997 100644
--- a/namesys/base.go
+++ b/namesys/base.go
@@ -2,6 +2,7 @@ package namesys

 import (
    "strings"
+   "time"

    context "github.com/ipfs/go-ipfs/Godeps/_workspace/src/golang.org/x/net/context"

@@ -9,27 +10,36 @@ import (
 )

 type resolver interface {
-   // resolveOnce looks up a name once (without recursion).
-   resolveOnce(ctx context.Context, name string) (value path.Path, err error)
+   // resolveOnce looks up a name once (without recursion).  It also
+   // returns a time-to-live value which indicates the maximum amount of
+   // time the result may be cached.
+   resolveOnce(ctx context.Context, name string) (value path.Path, ttl time.Duration, err error)
 }

 // resolve is a helper for implementing Resolver.ResolveN using resolveOnce.
-func resolve(ctx context.Context, r resolver, name string, depth int, prefixes ...string) (path.Path, error) {
+func resolve(ctx context.Context, r resolver, name string, depth int, prefixes ...string) (path.Path, time.Duration, error) {
+   // Start with a long TTL.
+   ttl := ImmutableTTL
+
    for {
-       p, err := r.resolveOnce(ctx, name)
+       p, ttlOnce, err := r.resolveOnce(ctx, name)
+       // Use the lowest TTL reported by the resolveOnce invocations.
+       if ttlOnce < ttl {
+           ttl = ttlOnce
+       }
        if err != nil {
            log.Warningf("Could not resolve %s", name)
-           return "", err
+           return "", ttl, err
        }
-       log.Debugf("Resolved %s to %s", name, p.String())
+       log.Debugf("Resolved %s to %s (TTL %v -> %v)", name, p.String(), ttlOnce, ttl)

        if strings.HasPrefix(p.String(), "/ipfs/") {
            // we've bottomed out with an IPFS path
-           return p, nil
+           return p, ttl, nil
        }

        if depth == 1 {
-           return p, ErrResolveRecursion
+           return p, ttl, ErrResolveRecursion
        }

        matched := false
@@ -44,7 +54,7 @@ func resolve(ctx context.Context, r resolver, name string, depth int, prefixes .
        }

        if !matched {
-           return p, nil
+           return p, ttl, nil
        }

        if depth > 1 {
whyrusleeping commented 9 years ago

@ion1 yeah, we could return a ttl with every resolution... I'm generally against more return types than 1+error, but this is likely a cleaner approach than others

ion1 commented 9 years ago
<jbenet> […] i'm overall -1 on returning the TTL on every Resolve call. i think that should be a separate function call.
<jbenet> most uses of Resolve do not care.
<jbenet> and its important to keep the interface as clean as we can possible have it.
<ion> jbenet: Fair enough. So it would be better to add a new function to the Resolver interface?
<jbenet> ion: yeah a new function SGTM.

I’m going to do that then.

lidel commented 5 years ago

Just for the record, this is still an open issue for /ipns/ responses returned by go-ipfs v0.4.18.

2019-01-02--14-08-56

Etag is present, but we are missing an explicit IPNS expiration header.

AFAIK it could be either:

Why? There is a Last-Modified header so most browsers will use "Heuristic Freshness" to determine how long to cache that asset for. This is a weak cache hint, as RFC7234 suggests something like (current time - last modified time) / 10, however, there are subtle differences between Firefox and Chrome.

Due to this we should provide explicit cache expiration headers for /ipns/ to unify experience across all vendors.

magik6k commented 5 years ago

This will sort-of depend on https://github.com/ipfs/go-ipfs/issues/5884, as ipns supports dns resolution too. 'Real' IPNS records support TTL too, but AFAIK it's currently not used (basically it's set to 0).

lidel commented 2 years ago

actionable task extracted from https://github.com/ipfs/go-ipfs/pull/8074#pullrequestreview-645196768, seems to be a natural continuation fo this old issue here

TLDR: we are not leveraging TTLs in Gateway responses, which means things are not cached for as long as they could be. This is getting critical for our gateway infra, without this we are unable to cache things on Gateways as efficiently as we could.

What we need

Implementation notes

cc https://github.com/ipfs/go-ipfs/issues/5884, https://github.com/ipfs/go-ipfs/issues/7564

lidel commented 8 months ago

Reopening. We have spec at https://specs.ipfs.tech/http-gateways/path-gateway/#cache-control-response-header, but boxo/gateway does not implement it fully.

This is not fixed fully yet.

IPNS Records are handled correctly thanks to https://github.com/ipfs/boxo/pull/459 but we don't set Cache-Control with max-age based on TTL as noted in https://github.com/ipfs/boxo/pull/459#issuecomment-1705462860. I remember we did not have TTL there because Golang standard library is not exposing access to original TTL of DNS record when resolver from operating system is used.

Remaining work here

2color commented 8 months ago

port Cache-Control implementation from https://github.com/ipfs/boxo/pull/584, where value is public, max-age=, stale-while-revalidate=, stale-if-error=

For DNSLink, what would you recommend the stale-while-revalidate be, since it has nothing to do with the DHT?

lidel commented 8 months ago

Good question. Unless there is a better value, I suggest using Amino DHT expiration window (48h) in cases like this, where we need a sensible hard ceiling on caching mutable things.

Rationale: DNSLink resolves to some CID. Returning a cached HTTP response body is ok as long the the data behind that CID could be (in theory) retrievable from the public swarm provider. So if we count from the moment we resolved dnslink and successfully retrieved it, 48h is the max window we can assume the retrieval of the same payload would be successful.