Open Jorropo opened 1 year ago
2023-10-05 conversation:
@Jorropo something we'd very much like is a fast recursive `ls` to help figure out the structure of a unixFS DAG, basically `tree`. Right now, we do O(dirs) `ipfs-unixfs.ls` requests to a kubo node to be able to figure out a file structure, without actually fetching the files themselves. This is of course ridiculously inefficient and takes ~60 seconds even with our own node within AWS. We can't get the entire DAG, including the leaves, because it's generally way too large.
So, my questions:
- […] `ls` something that would be in scope for `boxo`, or is it the wrong abstraction layer?
- […] `boxo` to implement a stand-alone service to do it?
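
To make the cost in the question concrete, here is a minimal sketch of a recursive listing that issues one request per directory and never reads file contents. The `Lister` interface, the `Entry` type and the in-memory `memLister` are hypothetical names invented for this illustration; they are not part of boxo, kubo or `ipfs-unixfs`.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is one link in a directory listing (hypothetical type).
type Entry struct {
	Name  string
	IsDir bool
}

// Lister stands in for a single remote ls request on one directory.
type Lister interface {
	Ls(path string) ([]Entry, error)
}

// memLister is an in-memory stand-in that counts the requests it serves.
type memLister struct {
	dirs     map[string][]Entry
	requests int
}

func (m *memLister) Ls(path string) ([]Entry, error) {
	m.requests++
	entries, ok := m.dirs[path]
	if !ok {
		return nil, fmt.Errorf("no such directory: %q", path)
	}
	return entries, nil
}

// Tree walks the structure depth-first and returns every path, sorted.
// It makes exactly one Ls call per directory, so the cost is O(dirs) requests.
func Tree(l Lister, root string) ([]string, error) {
	var out []string
	var walk func(dir string) error
	walk = func(dir string) error {
		entries, err := l.Ls(dir)
		if err != nil {
			return err
		}
		for _, e := range entries {
			p := dir + "/" + e.Name
			out = append(out, p)
			if e.IsDir {
				if err := walk(p); err != nil {
					return err
				}
			}
		}
		return nil
	}
	if err := walk(root); err != nil {
		return nil, err
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	l := &memLister{dirs: map[string][]Entry{
		"root":     {{Name: "a", IsDir: true}, {Name: "x.txt"}},
		"root/a":   {{Name: "b", IsDir: true}, {Name: "y.txt"}},
		"root/a/b": {{Name: "z.txt"}},
	}}
	paths, err := Tree(l, "root")
	if err != nil {
		panic(err)
	}
	fmt.Println(paths, "requests:", l.requests)
}
```

When each `Ls` is a network round trip, the request count is what dominates, which is why doing the same walk next to the block store is attractive.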
Dedicated unixfs implementations are often much more performant and efficient. See for example two of mine:
- […] `ipfs add` implementation that exists)
- […] `io.Reader` file with incremental verification and data streaming)

I think it is more sustainable to maintain a handful of dedicated implementations that individually do one job well (something like feather and something that would maintain state to support unordered DAGs are really different).
Unified middle layer
The way I have been writing my implementations is by juggling bytes, `unixfs.proto` and `dag-pb.proto` by hand everywhere. This is kinda meh looking because of all the little details that have to be checked.

I really think we need some efficient middle layer that takes in `blocks.Block`, parses them, does sanity checks and returns some representation that looks like internal unixfs (so one node per block, and full support for all the shenanigans you would want to do, but with error checking).

This would not include helper functions for more advanced stuff like HAMT, chunking, ... we will most likely need those too, but the goal here is to provide a thin wrapper around the protobuf that adds the repetitive input validation.
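
As a rough illustration of what such a thin layer could look like, the sketch below validates one already-decoded file node and returns a checked view of it. Everything here is an assumption made for the example: `RawFile`, `File`, `Child` and `ParseFile` are hypothetical names, protobuf decoding is left out, CIDs are plain strings, and the two checks (one blocksize per link, declared file size equal to inline data plus the blocksizes) are my reading of the UnixFS file node layout rather than an exhaustive list.

```go
package main

import (
	"errors"
	"fmt"
)

// RawFile holds the fields of a UnixFS file node after protobuf decoding and
// before any checks (hypothetical type; decoding is out of scope here).
type RawFile struct {
	Data       []byte   // inline data carried by this node
	FileSize   uint64   // total size declared by this node
	BlockSizes []uint64 // declared size covered by each child
	LinkCids   []string // child CIDs in order (strings for brevity)
}

// Child pairs a link with the number of file bytes it is declared to cover.
type Child struct {
	Cid  string
	Size uint64
}

// File is the validated view: one value per block.
type File struct {
	Data     []byte
	Children []Child
	Size     uint64
}

// ErrInvalid wraps every validation failure so callers can test for it.
var ErrInvalid = errors.New("invalid unixfs file node")

// ParseFile runs the repetitive sanity checks once, in one place.
func ParseFile(r RawFile) (File, error) {
	if len(r.BlockSizes) != len(r.LinkCids) {
		return File{}, fmt.Errorf("%w: %d blocksizes for %d links",
			ErrInvalid, len(r.BlockSizes), len(r.LinkCids))
	}
	sum := uint64(len(r.Data))
	children := make([]Child, len(r.LinkCids))
	for i, c := range r.LinkCids {
		if c == "" {
			return File{}, fmt.Errorf("%w: link %d has no CID", ErrInvalid, i)
		}
		next := sum + r.BlockSizes[i]
		if next < sum { // unsigned wrap-around
			return File{}, fmt.Errorf("%w: blocksizes overflow", ErrInvalid)
		}
		sum = next
		children[i] = Child{Cid: c, Size: r.BlockSizes[i]}
	}
	if sum != r.FileSize {
		return File{}, fmt.Errorf("%w: declared size %d, computed %d",
			ErrInvalid, r.FileSize, sum)
	}
	return File{Data: r.Data, Children: children, Size: sum}, nil
}

func main() {
	f, err := ParseFile(RawFile{
		FileSize:   10,
		BlockSizes: []uint64{4, 6},
		LinkCids:   []string{"cid-a", "cid-b"},
	})
	fmt.Println(f.Size, len(f.Children), err)

	_, err = ParseFile(RawFile{
		FileSize:   10,
		BlockSizes: []uint64{4},
		LinkCids:   []string{"cid-a", "cid-b"},
	})
	fmt.Println(err)
}
```

The point of the shape is that callers only ever see a `File` that already passed the checks, so each dedicated implementation does not have to repeat them.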
Impls
List of implementations we need:
- […] `.car` files from saturn on the cheap resource-wise (and other ordered block sources).
- […] `io.Reader`) incrementally (while streaming the result out).
- […] `io.Reader` for reading from files without any background goroutine (efficiency).
- […] `.WriteAt` to write blocks in any order as long as we receive roots before leaves (or cache leaves but then incremental verification is not possible).
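
The last item can be sketched as follows, under heavy simplifying assumptions: a single root that lists its leaves with their sizes, plain strings standing in for CIDs, and no hash verification. `Placer`, `Child` and `memFile` are hypothetical names for this example only. Once the root is known, every leaf offset is fixed, so leaves can be written with `WriteAt` in whatever order they arrive.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Child is one leaf as declared by the root: an identifier and a byte length.
// Plain strings stand in for CIDs in this sketch.
type Child struct {
	ID   string
	Size int64
}

// Placer remembers where each declared leaf belongs in the output.
type Placer struct {
	w       io.WriterAt
	offsets map[string][]int64 // the same leaf may appear at several positions
	sizes   map[string]int64
}

// NewPlacer derives every leaf offset from the root's ordered child list.
func NewPlacer(w io.WriterAt, children []Child) *Placer {
	p := &Placer{w: w, offsets: map[string][]int64{}, sizes: map[string]int64{}}
	var off int64
	for _, c := range children {
		p.offsets[c.ID] = append(p.offsets[c.ID], off)
		p.sizes[c.ID] = c.Size
		off += c.Size
	}
	return p
}

// Put writes one leaf at its final position(s), whatever the arrival order.
// A leaf that no root announced is rejected, which is the "roots before
// leaves" constraint.
func (p *Placer) Put(id string, data []byte) error {
	offs, ok := p.offsets[id]
	if !ok {
		return fmt.Errorf("leaf %q was not announced by a root (or already written)", id)
	}
	if int64(len(data)) != p.sizes[id] {
		return fmt.Errorf("leaf %q: got %d bytes, root declared %d", id, len(data), p.sizes[id])
	}
	for _, off := range offs {
		if _, err := p.w.WriteAt(data, off); err != nil {
			return err
		}
	}
	delete(p.offsets, id)
	return nil
}

// memFile is a growable in-memory io.WriterAt used for the demo.
type memFile struct{ buf []byte }

func (m *memFile) WriteAt(b []byte, off int64) (int, error) {
	if off < 0 {
		return 0, errors.New("negative offset")
	}
	if end := int(off) + len(b); end > len(m.buf) {
		m.buf = append(m.buf, make([]byte, end-len(m.buf))...)
	}
	copy(m.buf[off:], b)
	return len(b), nil
}

func main() {
	f := &memFile{}
	p := NewPlacer(f, []Child{{"a", 3}, {"b", 2}, {"c", 4}})
	// Leaves arrive out of order.
	for _, leaf := range []struct{ id, data string }{{"c", "cccc"}, {"a", "aaa"}, {"b", "bb"}} {
		if err := p.Put(leaf.id, []byte(leaf.data)); err != nil {
			panic(err)
		}
	}
	fmt.Printf("%s\n", f.buf)
}
```

A real implementation would verify each block against its CID before writing and would handle nested intermediate nodes; an `*os.File` could replace `memFile` since it also provides `WriteAt`.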