Open olizilla opened 1 year ago
As a low-risk first pass, we could create the concatenated upload index after each call to `upload/add` in the upload API. It's fine if the user calls it multiple times and adds CARs incrementally. We can rebuild it from scratch each time, or read any existing one and make it smarter if we need to.
could be one for https://github.com/mikeal/multiblock
Flagging that we can't simply concat the indexes together, because we'd lose the information about which CAR file each block is in (we'd effectively be creating a single index for multiple CARs). So the rollup would need to include the CID of the CAR each block can be found in.
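To make the concern concrete, here is a minimal in-memory sketch of a rollup that keeps the CAR CID next to every block entry. The `Location` type, the `build_rollup` helper, and the `block CID -> (offset, length)` index shape are all assumptions for illustration; they are not the real CARv2 index format or any existing API.

```python
from typing import Dict, List, NamedTuple, Tuple


class Location(NamedTuple):
    """Where a block lives: which CAR, and the byte range inside it."""
    car_cid: str
    offset: int
    length: int


# Hypothetical per-CAR index shape: block CID -> (offset, length).
CarIndex = Dict[str, Tuple[int, int]]


def build_rollup(indexes: Dict[str, CarIndex]) -> Dict[str, List[Location]]:
    """Merge per-CAR indexes into one lookup, tagging each entry with its CAR CID.

    A plain concatenation would keep only (offset, length) and so could not
    say which CAR to read from. A block may appear in more than one CAR,
    so every block CID maps to a list of locations.
    """
    rollup: Dict[str, List[Location]] = {}
    for car_cid, index in indexes.items():
        for block_cid, (offset, length) in index.items():
            rollup.setdefault(block_cid, []).append(
                Location(car_cid, offset, length)
            )
    return rollup


# Placeholder CIDs, purely illustrative.
indexes = {
    "car-a": {"block-1": (0, 100), "block-2": (100, 50)},
    "car-b": {"block-3": (0, 70), "block-1": (70, 100)},
}
rollup = build_rollup(indexes)
```

With this shape, a single lookup on `rollup` answers both "which CAR?" and "where in that CAR?" for a block.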
+1 if "concatenate" means "write each index as a block in a CAR along with an object that maps the CAR CIDs to each Index CID" ;)
There's a new index in town: https://github.com/alanshaw/cardex#multi-index-index
TL;DR: it's a CARv2 index which is a list of `car-cid`,`carv2-index` pairs.
I'm going to try out rollups using this index.
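As a rough sketch of the "list of pairs" idea, the following models a multi-index in memory and shows how it could be updated on each `upload/add` call. It does not implement the cardex binary format; the `MultiIndex` class, its methods, and the `block CID -> offset` index shape are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-CAR index shape: block CID -> byte offset within the CAR.
CarIndex = Dict[str, int]


class MultiIndex:
    """In-memory model of a list of (CAR CID, CAR index) pairs."""

    def __init__(self) -> None:
        self.pairs: List[Tuple[str, CarIndex]] = []

    def add(self, car_cid: str, index: CarIndex) -> None:
        """Add or replace the index for one CAR.

        Replacing keeps the call idempotent, so repeated upload/add calls
        for the same CAR do not create duplicate pairs.
        """
        self.pairs = [(c, i) for c, i in self.pairs if c != car_cid]
        self.pairs.append((car_cid, index))

    def find(self, block_cid: str) -> Optional[Tuple[str, int]]:
        """Return (CAR CID, offset) for the first CAR containing the block."""
        for car_cid, index in self.pairs:
            if block_cid in index:
                return car_cid, index[block_cid]
        return None


multi = MultiIndex()
multi.add("car-a", {"block-1": 0, "block-2": 100})
multi.add("car-b", {"block-3": 0})
multi.add("car-a", {"block-1": 0, "block-2": 100})  # repeated call, no duplicate
```

Because each pair carries its CAR CID, the structure keeps the "which CAR is this block in" information that a flat concatenation would drop.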
We hit issues where users send us a DAG split over > 1000 CARs, as we have to load a CAR index for each CAR before we can figure out where to fetch blocks from. If we create an upload index file for the root CID, as the concatenation of each CAR index, we only have to fetch a single file before we can start responding. I think this would solve the issue we're seeing in #46.
There remain edge cases where a single file is split over > 1000 CARs, but in those cases either the user is sending us CAR shards that are too small, or the file is massive. For example, if users stick with the 100 MiB CAR shard size we provide, they'd upload a 32 GiB DAG in 328 CARs, so we could tackle that as a lower-priority issue.
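The shard arithmetic can be checked with a few lines. This is a back-of-the-envelope sketch: `shard_count` is a hypothetical helper, and it ignores CAR header and per-block framing overhead.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def shard_count(dag_bytes: int, shard_bytes: int = 100 * MIB) -> int:
    """Number of CAR shards needed for a DAG, rounding up (integer ceiling)."""
    return -(-dag_bytes // shard_bytes)


# 32 GiB / 100 MiB = 327.68, which rounds up to 328 shards.
shards_for_32_gib = shard_count(32 * GIB)

# Size at which a DAG would fill exactly 1000 shards of 100 MiB each.
bytes_at_1000_shards = 1000 * 100 * MIB
```

At the default shard size, a DAG has to exceed roughly 97.7 GiB (`bytes_at_1000_shards / GIB`) before it spills past 1000 CARs.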