storacha / w3link

The IPFS gateway for web3.storage is not "another gateway", but a caching layer that sits on top of existing IPFS public gateways.

Concatenated upload index of all CAR indexes for a root CID #49

Open olizilla opened 1 year ago

olizilla commented 1 year ago

We hit issues when users send us a DAG split over more than 1000 CARs, because we have to load a CAR index for each CAR before we can figure out where to fetch blocks from. If we create an upload index file for the root CID, as the concatenation of each CAR index, we only have to fetch a single file before we can start responding. I think this would solve the issue we're seeing in #46.

There remain edge cases where a single file is split over > 1000 CARs, but either the user is sending us CAR shards that are too small, or the file is massive. For example if users stick with the 100MiB CAR shard size we provide, they'd upload a 32GiB DAG in 328 CARs, so we could tackle that as a lower priority issue.
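A quick sanity check of that shard count, as a minimal sketch. It assumes the shard size is exactly 100 MiB and ignores CAR header and framing overhead; `shard_count` is a hypothetical helper, not part of any existing API.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def shard_count(dag_bytes: int, shard_bytes: int = 100 * MIB) -> int:
    """Number of CAR shards needed for a DAG of dag_bytes.

    Assumption: every shard is filled to shard_bytes and CAR
    header/framing overhead is ignored.
    """
    # Integer ceiling division, avoids float rounding on large sizes.
    return -(-dag_bytes // shard_bytes)


print(shard_count(32 * GIB))  # 328
```

By the same arithmetic, a DAG would have to exceed roughly 97.7 GiB (1000 x 100 MiB) before it crossed 1000 shards at that size.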

olizilla commented 1 year ago

As a low risk first pass, we could create the concatenated upload index after each call to upload/add in the upload api. It's fine if the user calls it multiple times and adds CARs incrementally. We can rebuild it from scratch each time, or read any existing one and make it smarter if we need to.

olizilla commented 1 year ago

could be one for https://github.com/mikeal/multiblock

alanshaw commented 1 year ago

Flagging that we can't simply concat them together, because we'd lose the information about which CAR file each block is in (we'd effectively be creating a single index for multiple CARs). So the rollup would just need to include the CID of the CAR each block can be found in.
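To illustrate the point, here is a rough sketch of a rollup that keeps the CAR CID next to each block entry. The shapes are assumptions for illustration only: a per-CAR index is modelled as a plain mapping of block CID to byte offset, CIDs are plain strings, and `build_rollup` and `BlockLocation` are hypothetical names, not the real index format.

```python
from typing import Dict, NamedTuple


class BlockLocation(NamedTuple):
    car_cid: str  # which CAR shard holds the block
    offset: int   # byte offset of the block within that CAR


# Assumed shape of one CAR index: block CID -> offset in that CAR.
CarIndex = Dict[str, int]


def build_rollup(car_indexes: Dict[str, CarIndex]) -> Dict[str, BlockLocation]:
    """Merge per-CAR indexes into one lookup keyed by block CID.

    A naive concat would keep only (block CID, offset) and lose the CAR;
    here every entry carries the CAR CID it came from.
    """
    rollup: Dict[str, BlockLocation] = {}
    for car_cid, index in car_indexes.items():
        for block_cid, offset in index.items():
            # If a block appears in several CARs, keep the first location seen.
            rollup.setdefault(block_cid, BlockLocation(car_cid, offset))
    return rollup


rollup = build_rollup({
    "car-a": {"block-1": 0, "block-2": 512},
    "car-b": {"block-3": 0},
})
print(rollup["block-3"])  # BlockLocation(car_cid='car-b', offset=0)
```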

mikeal commented 1 year ago

+1 if "concatenate" means "write each index as a block in a CAR along with an object that maps the CAR CIDs to each Index CID" ;)

alanshaw commented 1 year ago

There's a new index in town: https://github.com/alanshaw/cardex#multi-index-index

TL;DR: it's a CARv2 index which is a list of (car-cid, carv2-index) pairs.

I'm going to try out rollups using this index.
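As a sketch of the logical shape only (not the cardex binary encoding, which is not reproduced here): a list of (CAR CID, index) pairs can be searched pair by pair to find which CAR holds a block. The in-memory types and the `locate` function are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed in-memory stand-in for one CARv2 index: block CID -> offset.
CarIndex = Dict[str, int]

# The multi-index as described: a list of (car-cid, index) pairs.
MultiIndex = List[Tuple[str, CarIndex]]


def locate(multi_index: MultiIndex, block_cid: str) -> Optional[Tuple[str, int]]:
    """Return (car_cid, offset) for block_cid, or None if no index has it."""
    for car_cid, index in multi_index:
        offset = index.get(block_cid)
        if offset is not None:
            return car_cid, offset
    return None


multi_index: MultiIndex = [
    ("car-a", {"block-1": 0, "block-2": 512}),
    ("car-b", {"block-3": 0}),
]
print(locate(multi_index, "block-2"))  # ('car-a', 512)
```

Because each index stays paired with its CAR CID, a single fetch of the rollup is enough to know both where a block lives and which CAR to read it from.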