Closed achingbrain closed 4 years ago
... with type 'file' or 'raw' depending on ...
Archeological question: what is the rationale for using the raw PB type? The encoding size is the same, but you can always seamlessly compose "files" into "larger files", while a raw
node is not usable standalone.
I'm not sure, it predates my time on the project. My gut feeling is that it's from a time before ipld-raw
and CIDv1. Maybe @Stebalien knows more?
I believe the raw dag-pb unixfs type was supposed to be for file leaves. And yes, that came before IPLD.
We chose not to make this the default in go-ipfs to avoid changing CIDs. That is, users expect that ipfs add -r ...
will always produce the same CID when run on the same directory.
go-ipfs does default to raw leaves when --cid-version=1
is passed as that changes CIDs anyways.
The plan is to switch all the defaults (cidv1, raw leaves, etc.) all at once, ideally when we get UnixFSv2. Until then, doing this piece-wise is likely to cause quite a bit of noise and annoying issues.
One solution would be to introduce an "importer version" flag. Each time we introduce a fancy new importer feature, we'd define a new set of import options and assign an "importer version" to it. Users would be able to pass --importer=latest
to get the latest and greatest features (cidv1, raw leaves, better chunker, inline CIDs, etc.).
Just for clarity, does this default actually persist into js-ipfs
or is this being set every time by js-ipfs
‘s use of the importer?
If the defaults here are always being overwritten by js-ipfs
I think it makes the discussion about defaults for this library a bit clearer and easier.
In theory js-IPFS
sets the option every time, but some may lack defaults so get passed as undefined
which merge-options
will ignore if a default has been set.
Anyway, @Stebalien, @alanshaw, @lidel and I had a catch up about this and the way forward is this:
Builds on #49. A separate PR as it's slightly contentious.
Sets default DAG construction to be a
dag-pb
root node,dag-pb
intermediate nodes andipld-raw
nodes for leaves. This will make parsing ever so slightly faster and DAG sizes ever so slightly smaller as there is no protobuf wrapper for the actual file data.Currently you may end up with
ipld-raw
leaves ordag-pb
leaves that contain UnixFS entries with type 'file' or 'raw' depending on where the importer is invoked from.E.g. to generate the same CIDs as go-IPFS,
ipfs.add
will result in a balanced DAG with UnixFS leaf nodes with a type 'file',ipfs.files.write
will result in a trickle DAG with UnixFS leaf nodes of a typeraw
, and specifying CID version 1 will get youipld-raw
leaf nodes and whatever tree strategy you specifed, default balanced. I believe the reason for this is largely historical, go-IPFS has different importers for different subsystems written by different people at different times when different bits of IPLD were available.I think this is chaos and we should go a step further than this PR - we should use
ipld-raw
leaf types everywhere and only offer options to change the DAG structure, not leaf types.BREAKING CHANGE:
options.rawLeaves
now defaults totrue