aschmahmann closed this issue 1 year ago
Flagging https://github.com/filecoin-saturn/L1-node/issues/289 so we don't make the same mistake here. Block requests should not be converted to car requests.
Update: Work is taking place in #61
Closing this issue out requires dealing with the high level tasks indicated above. However, the blockers for testing this for usage in Rhea are:
ECD 2023-03-29
Date for date/plan: 2023-03-30
BlocksGateway implementation kubo uses but with DAG prefetching of blocks happening underneath)
Thanks @aschmahmann. I inlined this information into the issue description.
@aschmahmann fysa I've moved the description of Implementation Phases from #61 to this meta-issue and marked the first two as done.
Mind clarifying which phase covers CAR-based resumes (?format=car&depth=1) instead of block-by-block (or add one?)
Why? It will have a very positive impact on website loads because website assets share a common parent, and we will be able to avoid over-fetching. And if we use CAR instead of block-by-block, we'll avoid round-trips at the same time.
Example:
/ipfs/cid/sub/index.html
/ipfs/cid/sub/assets/a.jpg
/ipfs/cid/sub/assets/b.css
/ipfs/cid/sub/assets/c.js
When we load index.html, we learn about the contents of /sub and learn what the cid-of-assets is.
When we load a.jpg, we also retrieve blocks for the parent dir, and can enumerate it to learn the CIDs of the other files in the assets dir.
Ideally, opening /ipfs/cid/sub/ would fetch parents only once, and only fetch specific sub-graphs.
I imagine it would translate to below requests:
/ipfs/cid/sub/index.html?format=car&depth=1 # learn cid-of-assets
/ipfs/cid-of-assets/a.jpg?format=car&depth=1 # learn cids in /assets
/ipfs/cid-of-b?format=car&depth=1 #direct fetch of a file
/ipfs/cid-of-c?format=car&depth=1 #direct fetch of a file
Lmk how feasible this is, and whether we should add this as (3.5) or something else.
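To make the proposed request sequence concrete, here is a minimal sketch that builds those URLs. The helper name carRequestURL is made up for illustration, and the ?format=car&depth=N query shape is the one proposed in this thread, not a confirmed gateway API.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// carRequestURL is a hypothetical helper: it builds a path-style request for
// root plus optional path segments, asking for a CAR limited to depth.
// The depth parameter follows the proposal in this thread (assumption).
func carRequestURL(root string, segments []string, depth int) string {
	parts := append([]string{"ipfs", root}, segments...)
	u := url.URL{Path: "/" + strings.Join(parts, "/")}
	return fmt.Sprintf("%s?format=car&depth=%d", u.EscapedPath(), depth)
}

func main() {
	// The four requests from the example: two path-based, two direct fetches.
	fmt.Println(carRequestURL("cid", []string{"sub", "index.html"}, 1))
	fmt.Println(carRequestURL("cid-of-assets", []string{"a.jpg"}, 1))
	fmt.Println(carRequestURL("cid-of-b", nil, 1))
	fmt.Println(carRequestURL("cid-of-c", nil, 1))
}
```

The point of the sketch is only that once cid-of-assets and the sibling CIDs are known, later requests can start from the deepest known CID instead of re-walking from the root.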
Mind clarifying which phase covers CAR-based resumes (?format=car&depth=1) instead of block-by-block (or add one?)
I should actually reword it (I'll change it above) but this is phase 3.
(3) Start doing the walk locally and then if a path segment is incomplete send a request for blocks and upon every received block try to continue
"send a request for blocks" should be "send a request for a CAR/blocks" (i.e. it's the same ask for a CAR, if it fails just use blocks) as above.
Note: In the case listed above you're likely actually asking for the directory and then implicitly getting index.html and it might look like this:
/ipfs/cid/sub?format=car&depth=0 (or bytes=0:0) # learn cid-of-sub and if it's a directory or file
/ipfs/cid-of-sub/index.html?format=car&depth=1 # get-the-index.html
/ipfs/cid-of-sub/assets/a.jpg?format=car&depth=1 # learn cid of assets (this might or might not already be known based on if sub is a sharded directory)
/ipfs/cid-of-b?format=car&depth=1 #direct fetch of a file
/ipfs/cid-of-c?format=car&depth=1 #direct fetch of a file
Note: the latter two might also be /ipfs/cid-of-sub/assets/(b.css|c.js)?format=car&depth=1, as it's a race condition with the client: it depends on when bifrost-gateway has received the blocks from the first request and whether they're still in cache, since all three of those assets might be requested by the browser simultaneously. bifrost-gateway could notice that all requests are coming from the same user for the same path and slow down some of the requests a little to save on wasted bandwidth, but that's something we can evaluate later.
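Phase 3 as reworded earlier in this comment (walk locally, and when a path segment is incomplete ask for a CAR/blocks and continue as blocks arrive) could look roughly like the sketch below. Everything in it is a toy assumption: node stands in for a decoded directory block, plain strings stand in for CIDs, and fetchFunc stands in for whatever CAR or block request the real backend would send.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a toy stand-in for a directory block: child name -> child ID.
// Real blocks are addressed by CID and decoded from bytes.
type node struct {
	links map[string]string
}

// fetchFunc is a hypothetical hook for "send a request for a CAR/blocks".
// It receives the ID we are stuck on plus the remaining path, and returns
// whatever blocks came back.
type fetchFunc func(id string, remaining []string) (map[string]node, error)

// resolve walks segments from root using local blocks only, and calls fetch
// when the block for the current ID is missing locally.
func resolve(local map[string]node, root string, segments []string, fetch fetchFunc) (string, error) {
	cur := root
	for i, seg := range segments {
		n, ok := local[cur]
		if !ok {
			got, err := fetch(cur, segments[i:])
			if err != nil {
				return "", err
			}
			for id, blk := range got {
				local[id] = blk // each received block may unblock later steps
			}
			if n, ok = local[cur]; !ok {
				return "", errors.New("fetch did not return block " + cur)
			}
		}
		next, ok := n.links[seg]
		if !ok {
			return "", fmt.Errorf("no link %q under %s", seg, cur)
		}
		cur = next
	}
	return cur, nil
}

func main() {
	// Only the root directory block is cached locally.
	local := map[string]node{
		"cid": {links: map[string]string{"sub": "cid-of-sub"}},
	}
	// remote plays the role of the CAR/block provider.
	remote := map[string]node{
		"cid-of-sub":    {links: map[string]string{"assets": "cid-of-assets"}},
		"cid-of-assets": {links: map[string]string{"a.jpg": "cid-of-a"}},
	}
	fetch := func(id string, remaining []string) (map[string]node, error) {
		// Like a CAR response, return every block along the remaining path.
		out := map[string]node{}
		for _, seg := range remaining {
			n, ok := remote[id]
			if !ok {
				break
			}
			out[id] = n
			id = n.links[seg]
		}
		return out, nil
	}
	fmt.Println(resolve(local, "cid", []string{"sub", "assets", "a.jpg"}, fetch))
}
```

In this toy run the first segment resolves from the local cache, and a single fetch covers the rest of the path, which is the round-trip saving a CAR request is meant to give over block-by-block fetching.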
This is closed by #160. bifrost-gateway has largely handled the concerns in Completion tasks to mark this done-done-done.
However, we now use backpressured processing and incremental verification of CAR responses rather than buffering all the data in memory or in an on-disk cache with block-request fallbacks.
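For readers unfamiliar with the terms, here is a minimal sketch of the general idea, not the code from #160. Assumptions: a block's identifier is just the hex SHA-256 of its data (real CIDs are multihash-based), and the names block, produce and consume are made up. Each block is verified as it arrives, and the bounded channel makes the producer wait while the consumer is busy, so nothing is buffered wholesale.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// block pairs data with a toy identifier (hex SHA-256), standing in for a CID.
type block struct {
	id   string
	data []byte
}

func idOf(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

var errMismatch = errors.New("block data does not match its identifier")

// produce streams blocks into a bounded channel. A send blocks while the
// consumer is busy (the backpressure); closing stop unblocks it on abort.
func produce(blocks []block, buf int, stop <-chan struct{}) <-chan block {
	out := make(chan block, buf)
	go func() {
		defer close(out)
		for _, b := range blocks {
			select {
			case out <- b:
			case <-stop:
				return
			}
		}
	}()
	return out
}

// consume verifies each block on arrival and hands it to sink right away.
// It returns the number of verified blocks and stops at the first bad one.
func consume(in <-chan block, sink func(block)) (int, error) {
	n := 0
	for b := range in {
		if idOf(b.data) != b.id {
			return n, errMismatch
		}
		sink(b)
		n++
	}
	return n, nil
}

func main() {
	stop := make(chan struct{})
	defer close(stop) // releases the producer if we bail out early
	blocks := []block{
		{id: idOf([]byte("root")), data: []byte("root")},
		{id: idOf([]byte("leaf")), data: []byte("tampered")},
	}
	n, err := consume(produce(blocks, 1, stop), func(b block) {
		fmt.Printf("verified %d bytes\n", len(b.data))
	})
	fmt.Println(n, err)
}
```

The trade-off against the earlier buffering approach is that a bad block is detected mid-stream, after earlier blocks may already have been passed on, so the caller has to be able to abort a partially served response.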
Done Criteria
There is an implementation of gateway.IPFSBackend that can leverage retrievals of CAR files with the relevant data in them.
It should implement the proposed version of the API here, which shouldn't have major changes before the above PR lands.
Implementation stages
Why Important
Implementation Phases
Details and Dependencies
ECD: 2023-03-27
Blockers for mirroring traffic for Rhea
ECD: 2023-03-29
The work is happening in #61. See there for more details
Blockers for production traffic for Rhea
ECD: TBD - Date for a date/plan: 2023-03-30
We need to have sufficient testing of the bifrost-gateway code given we aren't able to run Kubo's battery of sharness tests against it (per https://github.com/ipfs/bifrost-gateway/issues/58 ).
Options being considered:
BlocksGateway implementation kubo uses but with DAG prefetching of blocks happening underneath)
Completion tasks to mark this done-done-done
gateway.IPFSBackend request into a CAR request (should be relatively straightforward)
Additional Notes
There already is an implementation of gateway.IPFSBackend that uses the existing tooling for block-based storage/retrieval here (and related to https://github.com/ipfs/bifrost-gateway/pull/57).
Some details related to Caboose:
If we need to make some compromises in the implementation here in order to start collecting some data that's doable, but if so they should be explicitly called out and issues filed. Additionally, it should continue to be possible to use a blocks gateway implementation here via config.
cc @Jorropo @aarshkshah1992