ipfs / js-ipfs

IPFS implementation in JavaScript
https://js.ipfs.tech
Other
7.44k stars 1.25k forks source link

`ipfs.dag.import` should allow single blocks #4173

Closed inverted-capital closed 2 years ago

inverted-capital commented 2 years ago
welcome[bot] commented 2 years ago

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

Finally, remember to use https://discuss.ipfs.io if you just need general support.

tabcat commented 2 years ago

This would mean that ipfs.dag.export should also change since import/export methods usually can use each other's returns as parameters.

Export functions usually return something that can be written to disk easily like a string or byte array and this would not do that.


What you could do to 'import' an array of blocks to the repo, instead of changing the import/export methods, is iteration + ipfs.dag.put api. And getting an array of blocks from a root cid can be done using multiformats/traversal + ipfs.dag.get.

inverted-capital commented 2 years ago

Yes you're right export must be consumable by import. The problem is that ipfs.dag.put takes a plain object, not a block. There is mention of a test that shows this was sort of considered but no implementation in the code.

Also when importing a CAR file, should it not go the same path as adding any other data to ipfs, such as preload and possibly pinning ? In the current dag implementation, it just writes straight to the repo.

How is the interface of js-ipfs governed ? Is there something higher than this repo, like the golang repo that defined these apis, or are they open to be extended directly by js-ipfs ?

Should I make a pull request that simply sniffs the input to ipfs.dag.put to see if the supplied data was already a Block instance, and if so, avoid any of the hashing ?

tabcat commented 2 years ago

The problem is that ipfs.dag.put takes a plain object, not a block.

I like this but I'd like the block api to take and return blocks. iirc it used to work that way? Since starting to use the js-multiformats library I haven't used the dag api. Just took another look at the block.put documentation, guess I've been forgetting to specify the format option which may be affecting the cid for the data.

Also when importing a CAR file, should it not go the same path as adding any other data to ipfs, such as preload and possibly pinning ? In the current dag implementation, it just writes straight to the repo.

I'd expect it to take a similar path to the dag.put and block.put methods but just started working with CAR files so don't know enough.

Should I make a pull request that simply sniffs the input to ipfs.dag.put to see if the supplied data was already a Block instance, and if so, avoid any of the hashing ?

this is one of the reasons I wish the block api took blocks.

inverted-capital commented 2 years ago

Yes I suppose the blocks api is a more appropriate place than the dag api. Dag api seems to be "js objects in, js objects out", however blocks api seems to be "Uint8Arrays in, Uint8Arrays out" and all I want is "multiformat/block in, multiformat/block out". I just want to remove the double overhead of hashing, since I've already hashed the block myself, for my own calculation purposes, and now I want to store it as is.

What to do ? Wait for whoever is in charge of the API to say what they prefer and then make a PR in ? So far making a CAR file and importing it using the dag api is the closest thing I've got to avoid double hashing.

lidel commented 2 years ago

In Kubo, the convention / separation of concerns is more or less this (could be bit different in js-ipfs, but not much):

all I want is "multiformat/block in, multiformat/block out". I just want to remove the double overhead of hashing, since I've already hashed the block myself, for my own calculation purposes, and now I want to store it as is.

Sounds like you already have a raw block, and already calculated CID for it outside js-ipfs, and now want to do block put but without calculating CID for the second time.

@achingbrain any concerns around adding optional cid parameter that would disable hasher in https://github.com/ipfs/js-ipfs/blob/7a7e091c5d7110542ca7ab6eca1c0c9abb19e54b/packages/ipfs-core/src/components/block/put.js#L24-L30 ?

achingbrain commented 2 years ago

any concerns around adding optional cid parameter that would disable hasher in

We had this before. But we removed it because the only way to verify that the block you are adding is, in fact, related to the CID you want to store it under is to hash it.

tabcat commented 2 years ago

this makes sense to do when the data source is remote and hasn't been verified yet; it seems like this is protecting against accidental misuse by the user and/or malicious actions from a compromised part of a program. the block api accepting multiformats/block here would protect the user similarly. which makes me think the main concern is a malicious actor that can access the block api but not the storage api the repo is using.

inverted-capital commented 2 years ago

Thanks @lidel - you have captured my intent. @achingbrain this stance is attempting to protect the node against the developer that has full control of the node, at the computational expense of said developer. If I want to ruin my node by storing mismatched data, that should be detected by other nodes during bitswap, and (possibly) on rehydration from the data store. This feels congruent to @tabcat's view.

I know much less about this API than you three, but based on @lidel's summary of Kubo's distinctions, dag put|get sounds most like my use case.

For our specific use case, we have moved on to writing to the repo directly now, but it was surprising to us that we could not manage our own multiformats/block creation without incurring a double hash penalty. This is a minor inconvenience when compared to the huge assistance provided to us by having the ipfs and libp2p libraries available, and we are especially grateful for the esm and typescript conversions - that must have been a nightmare to pull off 🙏

lidel commented 2 years ago

Notes from today's JS triage call:

tabcat commented 2 years ago

I dont think requiring a multiformats Block be used with the block api would create a footgun, but I'll just look to write to the repo in the future.