Open vasco-santos opened 3 years ago
Additional note on an ideal here: we should not hide yet another "ordered DAG walk" implementation in here. I think it probably belongs in js-multiformats, it's in the same family of concerns as the Block
functionality already there. It's just complicated by the need to have multiple codecs available. Such a walk function could be provided with:
https://github.com/ipfs/js-ipfs/blob/6a2c710e4b66e76184320769ff9789f1fbabe0d8/packages/ipfs-core/src/components/dag/export.js#L82-L107 has an implementation that's a little like this that we did for dag export. It would be good to implement something shared so we could even remove code from there.
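To make the idea of a shared, codec-aware walk a bit more concrete, here is a rough sketch. It is not the js-multiformats API: the `walk` name and the `root`, `load` and `decoders` parameters are all hypothetical, CIDs are plain strings, and each "codec" is reduced to a function that returns the links of a block. It also assumes the ordering is depth-first pre-order with de-duplication.

```javascript
// Hypothetical sketch, NOT the js-multiformats API.
// CIDs are plain strings; a block is { codec, bytes }.

/**
 * Ordered depth-first (pre-order) walk that yields every block once.
 * `load(cid)` fetches a block; `decoders[codec](bytes)` returns its links.
 */
async function * walk ({ root, load, decoders }) {
  const seen = new Set()
  const stack = [root]
  while (stack.length > 0) {
    const cid = stack.pop()
    if (seen.has(cid)) continue
    seen.add(cid)

    const block = await load(cid)
    if (!block) throw new Error(`missing block ${cid}`)
    const linksOf = decoders[block.codec]
    if (!linksOf) throw new Error(`no decoder for codec ${block.codec}`)

    yield { cid, bytes: block.bytes }

    // Push links in reverse so the first link is visited first.
    const links = linksOf(block.bytes)
    for (let i = links.length - 1; i >= 0; i--) stack.push(links[i])
  }
}

// Toy data: a JSON "codec" that stores links, and a raw codec without links.
const enc = new TextEncoder()
const dec = new TextDecoder()
const decoders = {
  raw: () => [],
  json: (bytes) => JSON.parse(dec.decode(bytes)).links
}
const blocks = new Map([
  ['leaf-b', { codec: 'raw', bytes: enc.encode('b') }],
  ['root', { codec: 'json', bytes: enc.encode(JSON.stringify({ links: ['leaf-a', 'leaf-b'] })) }],
  ['leaf-a', { codec: 'raw', bytes: enc.encode('a') }]
])

// Collects the CIDs in the order the walk yields them.
async function collect () {
  const order = []
  const load = async (cid) => blocks.get(cid)
  for await (const { cid } of walk({ root: 'root', load, decoders })) order.push(cid)
  return order
}
```

Passing the decoders in, rather than importing them inside the walk, is one way to deal with the "multiple codecs" complication: callers only pay for the codecs they actually use.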
One requirement I’d like to surface here.
Users with large amounts of data are writing custom tooling to get their file data “into IPFS” so that they can then write out a CAR file suitable for Filecoin (which really needs to be deterministic).
There are obvious perf issues with moving this much data and suffering excessive copying in memory and on disc.
For these users:
Given that you can retrieve the full DAG fine with a non-deterministic CAR file, this probably isn't the highest priority.
Hola, it would be great if the code maintainers or project managers could give this more priority. I'm working on a decentralized application and looking to migrate the content from IPFS to WEB3Storage, but I want to do it in a deterministic way.
@AugustoL can you say more about what you need? For many use cases, the CAR itself won't need to be deterministically packed. You can import an identical DAG from it with ipfs dag import.
Current state
The current implementation of ipfs-car does not write the CAR file blocks in any specific order, as follows:
This means that we currently produce a different output for the same file than go-ipfs and js-ipfs, which do an ordered walk.
Motivation
Supporting deterministic outputs will enable ipfs-car to produce the same output CAR as the core ipfs implementation and move us towards supporting other use cases, like interacting directly with Filecoin (and perhaps offline deals).
Implementation
Given we currently have two iterations (unixfs importer + blockstore iteration), we can support a deterministic output by getting the root and traversing the graph from it, as in https://github.com/ipld/js-datastore-car/blob/master/car.js#L198-L221
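As a small illustration of why walking from the root helps, the sketch below compares plain blockstore iteration with a root-first traversal. Everything in it is a stand-in: the blockstore is a Map, CIDs are strings, blocks already expose their `links` (a real blockstore holds bytes and needs a codec to find them), and `orderedCids` is a hypothetical helper, not part of ipfs-car. It also assumes the importer stores leaves before the root.

```javascript
// Hypothetical sketch: a Map stands in for the blockstore and
// each block is just { links: string[] } keyed by a string CID.

// Returns CIDs in depth-first pre-order from the root, each one once.
function orderedCids (blockstore, root) {
  const seen = new Set()
  const out = []
  const visit = (cid) => {
    if (seen.has(cid)) return
    seen.add(cid)
    out.push(cid)
    for (const link of blockstore.get(cid).links) visit(link)
  }
  visit(root)
  return out
}

// Same blocks, stored in two different orders.
const importerOrder = new Map([
  ['chunk-1', { links: [] }],
  ['chunk-2', { links: [] }],
  ['file', { links: ['chunk-1', 'chunk-2'] }]
])
const otherOrder = new Map([...importerOrder].reverse())

// Plain iteration depends on how the blocks were stored...
const current = [...importerOrder.keys()]
// ...while a walk from the root depends only on the DAG itself.
const fixedA = orderedCids(importerOrder, 'file')
const fixedB = orderedCids(otherOrder, 'file')
```

Here `current` follows insertion order, whereas `fixedA` and `fixedB` come out identical, which is the property a deterministic CAR writer would rely on.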
We should make this optional and pluggable, given that we will need to add codecs and hashers, which would increase the dependency footprint for users who do not need deterministic CAR files.
Alternatively, we can support a different function that skips the two iterations and keeps everything in memory. This would be faster, and some users could be OK with the extra memory consumption. But I would say the write performance of creating the CAR file is not the biggest concern, and we have been focusing more on the efficiency of reads than of writes.
cc @rvagg @olizilla @mikeal @alanshaw