ipfs-inactive / js-ipfs-unixfs-importer

[ARCHIVED] JavaScript implementation of the UnixFs importer used by IPFS
MIT License
5 stars 4 forks source link

rawLeaves may be broken #44

Closed mikeal closed 4 years ago

mikeal commented 4 years ago

I don’t think rawLeaves works as intended.

Here’s the output of jsipfs:

echo 'asdf' | jsipfs add --raw-leaves
added bafkreigrxsgtxjfpy7qqsyjmw45mxxo2ybjmsmbfvipyffbo3k5x324cue bafkreigrxsgtxjfpy7qqsyjmw45mxxo2ybjmsmbfvipyffbo3k5x324cue

Note that it prints the same CID twice, both of which are just the raw block created from the input. I believe the intended output is the raw block and a UnixFSv1 dag-pb node.

mikeal commented 4 years ago

I think I figured it out. When I was writing the fromParts code I realized that files that are using rawLeaves also need to disable reduceSingleLeafToSelf https://github.com/ipfs/js-ipfs-unixfs-importer/pull/46/files#diff-f3029438ce98a6fe360f16e5abce50d5R36 .

Not sure where the best place to fix this would be, but if someone wants to point me to the right place I can send another patch.

achingbrain commented 4 years ago

If you have rawLeaves: true and reduceSingleLeafToSelf: true and your file would fit in a single node, the final CID will point to a ipld-raw node and not a dag-pb node.

If you have rawLeaves: true and reduceSingleLeafToSelf: false you'll get a dag-pb node as the root, with one leaf that's a ipld-raw node.

This behaviour is consistent with go-IPFS but I'm not sure the first instance is correct for two reasons:

  1. The output from a unixfs-importer should be UnixFS things, not ipld-raw nodes
  2. You can't add metadata to a ipld-raw node as it has no dag-pb wrapper to store the metadata on

My suggestion would be to ignore reduceSingleLeafToSelf when rawLeaves is true. Or maybe ignore rawLeaves when the file would fit into a single node?

What do you think @Stebalien @alanshaw ?

Stebalien commented 4 years ago

For now, I'd ignore raw leaves when we have metadata and everything fits into a single file.

This also brings back the question of when should we optimize for deduplication (have a single file block pointing to a single raw leaf) versus number of blocks (have a single file block with the data+metadata inlined).

mikeal commented 4 years ago

One thing that makes that override odd is that reduceSingleLeafToSelf defaults to true and rawLeaves defaults to false. If a user goes out of their way to override a default, it’s very odd to have another default option override their stated option. If reduceSingleLeafToSelf wasn’t defaulted to true I’d totally agree with its behavior overwriting rawLeaves when also set by the user to true, but in this case it feels wrong.

achingbrain commented 4 years ago

You're right, the ergonomics are a little wonky but reduceSingleLeafToSelf isn't exposed to the user via ipfs.add so it'll only affect people using the importer directly.

In this case I think we should override rawLeaves because a unixfs-importer should produce unixfs things. It not doing that, I think, is more wonky.

Also rawLeaves should really default to true.

mikeal commented 4 years ago

With rawLeaves defaulting to true the behavior described feels correct.

achingbrain commented 4 years ago

Fixed by #49