(Closed: lidel closed this issue 5 months ago)
cc @aschmahmann: (A) is a brain dump of the idea for how we could logically share the block caches I mentioned earlier this week. A sanity check would be appreciated.
@hacdias fysa: after discussing with @aschmahmann, it seems that option (C) is the easiest to wire up and get working today (we already have bitswap), but it still allows us to leverage HTTP in the future, once a client exists.
I imagine the end user would only need to set up a single list:
RAINBOW_PEERING_ADDRS=/dns4/peer1.example.com/tcp/4001/p2p/{peerid1},/dns4/peer2.example.com/tcp/4001/p2p/{peerid2}
This will both protect connections to those peers and enable cache sharing with them.
So one premise of rainbow vs. the older gateways was to avoid hosting: if data is not retrievable from somewhere in the ipfs network, then that is not rainbow's problem. This moves in the direction of actually using rainbow to host things, by relying on other rainbow peers' caches, which in turn assumes that rainbow peers do cache things for non-negligible times. Big baggage.
Reg `RAINBOW_PEERING_ADDRS`: the idea of `--seed` and `--seed-index` was that each peer can autogenerate other rainbow peers' addresses, look them up in the DHT, and auto-protect connections to them without needing any ad-hoc configuration like providing a list of peers, which is always a pain when rolling out and scaling things up and down. Maybe it's useful now.
The idea is to limit the baggage by enabling "hosting of cached things" only for safelisted peerids. This "cache sharing" requires mutual agreement, is opportunistic, has no SLA for how long things are cached, and the default bitswap behavior for non-safelisted peers remains to always respond "I don't have it".
Perhaps we should rename this feature and move away from "peering" to "cache sharing" to set expectations closer to reality and avoid feature creep?
In the case of ipfs.io, "cache sharing" will be with other rainbow instances, but we have use cases where people self-host their own datasets and want to use rainbow as a dedicated gateway in front of kubo or ipfs-cluster. I hoped to create a config option which works for them too; that is why an explicit `RAINBOW_PEERING_ADDRS` was proposed, but it might be too flexible.
Reusing `--seed` and `--seed-index`, plus having peer routing announcements, would allow us to do peering and cache sharing without having to configure peerids/multiaddrs. I agree, it feels "safer" for the ecosystem, and easier to maintain. By limiting cache sharing to sibling rainbow instances, we don't bring baggage or allow for anti-patterns: it is only for "rainbow cache sharing" and still forces everyone to use regular / delegated routing for discovering "real" providers.
If we go with `--seed`, we could enable cache sharing via opt-in configuration. I guess we need to limit the number of peerids we generate for safelisting, so perhaps `RAINBOW_CACHE_SHARE_ALLOW_INDEXES=a,b-c`, where `a` is the `seed-index` of a specific rainbow instance we allow cache sharing with and `b-c` is a range? ipfs.io infra would have a simple `RAINBOW_CACHE_SHARE_ALLOW_INDEXES=0-n`.
I would separate the feature where rainbow peers autodiscover and connect to each other as something of its own... perhaps call it RAINBOW_SEEDS_PEERING, RAINBOW_SEEDS_SWARM, or RAINBOW_SEEDS_NETWORK. It's not only useful for caches: my original thought was to mount diverse functionality on top, in particular a distributed rate-limiting/quota system across a swarm of rainbows.
So after implementing that, on the side you could have RAINBOW_SEEDS_BITSWAP, RAINBOW_SHARED_CACHE, or whatever, all relying on the rainbow swarm.
Regarding indexes: I would index 0-100 by default, then maybe have a MAX_INDEX option to go higher/lower. I fear having ranges may just be one more way for users to configure things wrong.
Sgtm. We can add opt-in config:

- `RAINBOW_SEEDS_PEERING=true|false`, `false` by default
- `RAINBOW_SEEDS_PEERING_MAX_INDEX`, if not set, uses 100 as implicit default
- `RAINBOW_PEERING_SHARED_CACHE=true|false`, `false` by default; when `true`, respond to bitswap queries from peers returned by `peering.ListPeers()`

Then the infra at ipfs.io would run with the same `SEED`, `RAINBOW_SEEDS_PEERING=true` and `RAINBOW_PEERING_SHARED_CACHE=true`.
Sounds good. Only nitpick is that `RAINBOW_PEERING_SHARED_CACHE` should automatically enable `RAINBOW_SEEDS_PEERING` and require its preconditions, right? Or should it fail loudly if peering is false? I lean towards the former.
Also, I understand these become flags too, like everything else.
Problem
At inbrowser.dev (backed by rainbow from the ipfs.io gateway, so this is a general problem in our infra), we see inconsistent page load times across regions, and sometimes across requests within the same region.
A user can get an instant response from one instance, and then on a subsequent page load or request get a stalled page load and a timeout, even though the data exists in the cache of one of the other rainbows in the global cluster. We also see inconsistency across subresources on a single page.
Scope
Solutions
A: Add HTTP Retrieval Client to Rainbow, leverage `Cache-Control: only-if-cached`
We know we need an HTTP retrieval client for Kubo to enable HTTP Gateway over Libp2p by default, and to make direct HTTP retrieval from service providers more feasible. We can't do that without a client and end-to-end tests. Prototyping one in Rainbow sounds like a good plan, improving multiple work streams at the same time.
The idea here is to introduce an HTTP client which runs in addition to, or in parallel with, bitswap retrieval. Keep it simple, don't mix abstractions, do opportunistic block retrieval like bitswap, but over HTTP.
Using `application/vnd.ipld.raw` and the trustless gateway protocol is a good match here: it allows us to benefit from HTTP caching and middleware, making it more flexible than bitswap.

Rainbow could ask sibling gateways for blocks with `Cache-Control: only-if-cached`, going over the list in sequence. This way, once a block lands in any of our rainbow caches, we will discover it, and requests won't time out after 1m in unlucky scenarios.
Open questions:
B: Set up reverse proxy (nginx, lb) to try rainbows with `Cache-Control: only-if-cached` first

Writing this down just to have something other than (A); I don't personally believe (B) is feasible.
The idea here is to update the way our infrastructure proxies gateway requests to rainbow instances: first ask all upstream instances within the region for the resource with `Cache-Control: only-if-cached`, and if none of them has it, retry with a normal request that triggers p2p retrieval.

The downside here is that this feels like an antipattern built around `Cache-Control`.
C: Reuse Bitswap client and server we already have
Right now, Rainbow runs Bitswap in read-only mode. It always says it does not have data when asked over bitswap.
What we could do is a permissioned version of peering:

- require peering addresses that include an explicit peerid (`/ip|dns*/.../p2p/peerid`), otherwise we can't safelist the peer
- start with `/p2p/` multiaddrs (quick and easy), leverage existing peering config / libraries where possible (https://github.com/ipfs/rainbow/pull/35)
- leave the door open for HTTP-based peers later (`/http`)

D: ?
Ideas welcome.