ipfs-inactive / js-ipfs-unixfs-importer

[ARCHIVED] JavaScript implementation of the UnixFs importer used by IPFS
MIT License
5 stars 4 forks source link

Default raw leaves to true #50

Closed achingbrain closed 4 years ago

achingbrain commented 4 years ago

Builds on #49. A separate PR as it's slightly contentious.

Sets default DAG construction to be a dag-pb root node, dag-pb intermediate nodes and ipld-raw nodes for leaves. This will make parsing ever so slightly faster and DAG sizes ever so slightly smaller as there is no protobuf wrapper for the actual file data.

Currently you may end up with ipld-raw leaves or dag-pb leaves that contain UnixFS entries with type 'file' or 'raw' depending on where the importer is invoked from.

E.g. to generate the same CIDs as go-IPFS, ipfs.add will result in a balanced DAG with UnixFS leaf nodes with a type 'file', ipfs.files.write will result in a trickle DAG with UnixFS leaf nodes of a type raw, and specifying CID version 1 will get you ipld-raw leaf nodes and whatever tree strategy you specifed, default balanced. I believe the reason for this is largely historical, go-IPFS has different importers for different subsystems written by different people at different times when different bits of IPLD were available.

I think this is chaos and we should go a step further than this PR - we should use ipld-raw leaf types everywhere and only offer options to change the DAG structure, not leaf types.

BREAKING CHANGE:

ribasushi commented 4 years ago

... with type 'file' or 'raw' depending on ...

Archeological question: what is the rationale for using the raw PB type? The encoding size is the same, but you can always seamlessly compose "files" into "larger files", while a raw node is not usable standalone.

achingbrain commented 4 years ago

I'm not sure, it predates my time on the project. My gut feeling is that it's from a time before ipld-raw and CIDv1. Maybe @Stebalien knows more?

Stebalien commented 4 years ago

I believe the raw dag-pb unixfs type was supposed to be for file leaves. And yes, that came before IPLD.

Stebalien commented 4 years ago

We chose not to make this the default in go-ipfs to avoid changing CIDs. That is, users expect that ipfs add -r ... will always produce the same CID when run on the same directory.

go-ipfs does default to raw leaves when --cid-version=1 is passed as that changes CIDs anyways.

The plan is to switch all the defaults (cidv1, raw leaves, etc.) all at once, ideally when we get UnixFSv2. Until then, doing this piece-wise is likely to cause quite a bit of noise and annoying issues.

One solution would be to introduce an "importer version" flag. Each time we introduce a fancy new importer feature, we'd define a new set of import options and assign an "importer version" to it. Users would be able to pass --importer=latest to get the latest and greatest features (cidv1, raw leaves, better chunker, inline CIDs, etc.).

mikeal commented 4 years ago

Just for clarity, does this default actually persist into js-ipfs or is this being set every time by js-ipfs‘s use of the importer?

If the defaults here are always being overwritten by js-ipfs I think it makes the discussion about defaults for this library a bit clearer and easier.

achingbrain commented 4 years ago

In theory js-IPFS sets the option every time, but some may lack defaults so get passed as undefined which merge-options will ignore if a default has been set.

Anyway, @Stebalien, @alanshaw, @lidel and I had a catch up about this and the way forward is this: