ipfs / rainbow

A specialized IPFS HTTP gateway
https://docs.ipfs.tech/reference/http/gateway/

Check block cache across multiple rainbow instances #109

Closed: lidel closed this issue 5 months ago

lidel commented 6 months ago

Problem

At inbrowser.dev (backed by rainbow from the ipfs.io gateway, so this is a general problem in our infra), we see inconsistent page load times across regions, and sometimes across requests within the same region.

A user can get an instant response from one instance, and then on a subsequent page load or request get a stalled page load and a timeout, even though the data exists in the cache of one of the other rainbows in the global cluster. We also see inconsistency across subresources on a single page.

Scope

Solutions

A: Add HTTP Retrieval Client to Rainbow, leverage Cache-Control: only-if-cached

We know we need an HTTP retrieval client for Kubo to enable HTTP Gateway over Libp2p by default, and to make direct HTTP retrieval from service providers more feasible. We can't do that without a client and end-to-end tests. Prototyping one in Rainbow sounds like a good plan, improving multiple work streams at the same time.

The idea here is to introduce an HTTP client which runs in addition, or in parallel, to bitswap retrieval. Keep it simple, don't mix abstractions, and do opportunistic block retrieval like bitswap, but over HTTP.

Using application/vnd.ipld.raw and the trustless gateway protocol is a good match here: it allows us to benefit from HTTP caching and middleware, making it more flexible than bitswap.

Rainbow could:

This way, once a block lands in any of our rainbow caches, we will discover it, and requests won't time out after 1m in unlucky scenarios.

Open questions:

B: Set up reverse proxy (nginx, lb) to try rainbows with Cache-Control: only-if-cached first

Writing this down just to have something other than (A), I don't personally believe (B) is feasible.

The idea here is to update the way our infrastructure proxies gateway requests to rainbow instances: first ask all upstream instances within the region for the resource with Cache-Control: only-if-cached, and if none of them has it, retry with a normal request that triggers p2p retrieval.

The downside is that this feels like an antipattern:

C: Reuse Bitswap client and server we already have

Right now, Rainbow runs Bitswap in read-only mode: it always says it does not have the data when asked over bitswap.

What we could do is add a permissioned version of peering:

D: ?

Ideas welcome.

lidel commented 6 months ago

cc @aschmahmann: (A) is a brain dump of the idea for how we could logically share the block caches I mentioned earlier this week. A sanity check would be appreciated.

lidel commented 6 months ago

@hacdias fysa: after discussing with @aschmahmann, it seems that option (C) is the easiest to wire up and get working today (we already have bitswap), while still allowing us to leverage HTTP in the future, once a client exists.

I imagine the end user would only need to set up a single list: RAINBOW_PEERING_ADDRS=/dns4/peer1.example.com/tcp/4001/p2p/{peerid1},/dns4/peer2.example.com/tcp/4001/p2p/{peerid2}

This will both:
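A stdlib-only sketch of how a RAINBOW_PEERING_ADDRS-style value could be split into multiaddr strings keyed by peer ID (a real implementation would use go-multiaddr / go-libp2p types instead of string handling; the function name is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// parsePeeringAddrs splits a comma-separated list of p2p multiaddrs and
// returns a map of peer ID -> full multiaddr string. Each entry must end
// in a /p2p/{peerid} component.
func parsePeeringAddrs(v string) (map[string]string, error) {
	out := map[string]string{}
	for _, addr := range strings.Split(v, ",") {
		addr = strings.TrimSpace(addr)
		i := strings.LastIndex(addr, "/p2p/")
		if i < 0 {
			return nil, fmt.Errorf("missing /p2p/ peer ID in %q", addr)
		}
		out[addr[i+len("/p2p/"):]] = addr
	}
	return out, nil
}

func main() {
	addrs, err := parsePeeringAddrs(
		"/dns4/peer1.example.com/tcp/4001/p2p/12D3KooWexample1," +
			"/dns4/peer2.example.com/tcp/4001/p2p/12D3KooWexample2")
	fmt.Println(len(addrs), err) // 2 <nil>
}
```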

hsanjuan commented 6 months ago

So one premise of rainbow vs. the older gateways was to avoid hosting: if data is not retrievable from somewhere in the ipfs network, then that is not rainbow's problem. This moves in the direction of actually using rainbow to host things, by relying on other rainbow peers' caches, which in turn assumes that rainbow peers cache things for non-negligible times. Big baggage.

Regarding RAINBOW_PEERING_ADDRS: the idea of --seed and --seed-index was that each peer can autogenerate the other rainbow peers' addresses, look them up in the DHT, and auto-protect connections to them without any ad-hoc configuration such as providing a list of peers, which is always a pain when rolling things out and scaling up and down. Maybe it's useful now.

lidel commented 6 months ago

The idea is to limit the baggage by enabling "hosting of cached things" only for safelisted peerids. This "cache sharing" requires mutual agreement and is opportunistic: there is no SLA for how long things are cached, and the default bitswap behavior for non-safelisted peers remains to always respond "I don't have it".

Perhaps we should rename this feature and move away from "peering" to "cache sharing" to set expectations closer to reality and avoid feature creep?

In the case of ipfs.io, "cache sharing" will be with other rainbow instances, but we have use cases where people self-host their own datasets and want to use rainbow as a dedicated gateway in front of kubo or ipfs-cluster. I hoped to create a config option that works for them too; that is why the explicit RAINBOW_PEERING_ADDRS was proposed, but it might be too flexible.


Reusing --seed and --seed-index + having peer routing announcements would allow us to do peering and cache sharing without having to configure peerids/multiaddrs. I agree, it feels "safer" for the ecosystem, and easier to maintain. By limiting cache sharing to sibling rainbow instances only, we don't bring baggage or allow for anti-patterns: it is only for "rainbow cache sharing" and still forces everyone to use regular / delegated routing for discovering "real" providers.

If we go with --seed, we could enable cache sharing via opt-in configuration. I guess we need to limit the number of peerids we generate for safelisting, so perhaps RAINBOW_CACHE_SHARE_ALLOW_INDEXES=a,b-c, where a is the seed-index of a specific rainbow instance we allow cache sharing with and b-c is a range? The ipfs.io infra would simply have RAINBOW_CACHE_SHARE_ALLOW_INDEXES=0-n
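Parsing the proposed a,b-c syntax into a set of allowed seed indexes could look roughly like this (the function name is illustrative, and note the later comment in this thread argues against ranges in favor of a default 0-100 window):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseAllowIndexes expands a value like "0,2-4" into the set of allowed
// seed indexes: single numbers and inclusive low-high ranges.
func parseAllowIndexes(v string) (map[int]bool, error) {
	allowed := map[int]bool{}
	for _, part := range strings.Split(v, ",") {
		part = strings.TrimSpace(part)
		if lo, hi, isRange := strings.Cut(part, "-"); isRange {
			start, err1 := strconv.Atoi(lo)
			end, err2 := strconv.Atoi(hi)
			if err1 != nil || err2 != nil || start > end {
				return nil, fmt.Errorf("bad range %q", part)
			}
			for i := start; i <= end; i++ {
				allowed[i] = true
			}
		} else {
			i, err := strconv.Atoi(part)
			if err != nil {
				return nil, fmt.Errorf("bad index %q", part)
			}
			allowed[i] = true
		}
	}
	return allowed, nil
}

func main() {
	allowed, err := parseAllowIndexes("0,2-4")
	fmt.Println(len(allowed), err) // 4 <nil>
}
```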

hsanjuan commented 6 months ago

> so perhaps RAINBOW_CACHE_SHARE_ALLOW_INDEXES=a,b-c where a is seed-index of specific rainbow instance we allow cache sharing with and b-c is range? ipfs.io infra would have simple RAINBOW_CACHE_SHARE_ALLOW_INDEXES=0-n

I would separate the feature where rainbow peers autodiscover and connect to each other as something of its own... perhaps call it RAINBOW_SEEDS_PEERING, or RAINBOW_SEEDS_SWARM, or RAINBOW_SEEDS_NETWORK. It's not only useful for caches. My original thought was to mount diverse functionality on top, in particular a distributed rate-limiting/quota system across a swarm of rainbows.

So after implementing that, on the side you could have RAINBOW_SEEDS_BITSWAP, or RAINBOW_SHARED_CACHE or whatever, which relies on the rainbow swarm.

hsanjuan commented 6 months ago

Regarding indexes: I would index 0-100 by default, then maybe have a MAX_INDEX option to go higher/lower. I fear having ranges may just be one more way for users to configure things wrong.

lidel commented 6 months ago

Sgtm. We can add opt-in config:

Then the infra at ipfs.io would run with the same SEED, RAINBOW_SEEDS_PEERING=true, and RAINBOW_PEERING_SHARED_CACHE=true

hsanjuan commented 6 months ago

> We can add opt-in config:

Sounds good. Only nitpick: should RAINBOW_PEERING_SHARED_CACHE automatically enable RAINBOW_SEEDS_PEERING and require its preconditions, or should it fail loudly if peering is false? I lean towards the former.

Also, I understand these become flags too, like everything else.
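The implied-flag behavior favored above (shared cache silently enabling peering, rather than erroring out) can be sketched as a tiny config-resolution step; the env var names are the ones proposed in this thread, not shipped flags:

```go
package main

import "fmt"

// resolvePeeringConfig applies the implied-flag rule: enabling the shared
// cache (RAINBOW_PEERING_SHARED_CACHE) automatically enables seeds peering
// (RAINBOW_SEEDS_PEERING) instead of failing loudly when peering is off.
func resolvePeeringConfig(sharedCache, seedsPeering bool) (cache, peering bool) {
	if sharedCache {
		seedsPeering = true
	}
	return sharedCache, seedsPeering
}

func main() {
	cache, peering := resolvePeeringConfig(true, false)
	fmt.Println(cache, peering) // true true
}
```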