Batch gets

ianopolous commented 6 years ago

In a high-latency network, e.g., a truly interplanetary network or, closer to home, Tor, it would be beneficial for apps to be able to request multiple blocks in a single request.

Essentially, I'm proposing that block.get accept a list of CIDs rather than just one.
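To make the motivation concrete, here is a rough back-of-the-envelope model. It assumes the client fetches blocks strictly one after another, ignores transfer time and server-side work, and uses a made-up round-trip time, so treat the numbers as purely illustrative.

```python
def sequential_latency(num_blocks, rtt_seconds):
    """One request per block, each waiting for the previous one to finish."""
    return num_blocks * rtt_seconds


def batched_latency(rtt_seconds):
    """A single request carrying the whole list of CIDs."""
    return rtt_seconds


# Hypothetical values, not measurements.
rtt = 0.5         # seconds per round trip
num_blocks = 100  # blocks the app needs

print("sequential:", sequential_latency(num_blocks, rtt), "s")
print("batched:   ", batched_latency(rtt), "s")
```

Under these assumptions the cost of one-at-a-time gets grows linearly with the number of blocks, while a batch get pays the round trip once.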

An interesting implementation question is whether the resulting blocks should be returned in the same order as requested. I can see it being helpful for the client to have blocks sent as soon as they are retrieved, but then you need a way to tell the client which block is which. Technically you don't, since the client could just hash every block, but with multiple hash algorithms in play that gets unwieldy quickly.
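A minimal sketch of the "client hashes every block" approach, to show where it gets unwieldy. It is a simplification: requested identifiers are modelled as hypothetical (algorithm, hex digest) pairs rather than real CIDs, and `match_blocks` is an illustrative name, not part of any IPFS API.

```python
import hashlib


def match_blocks(wanted, blocks):
    """Map blocks received in arbitrary order back to the requested ids.

    wanted: list of (algorithm_name, hex_digest) pairs (a stand-in for CIDs).
    blocks: iterable of raw block bytes, in whatever order they arrived.
    Returns (matched, remaining): a dict of id -> bytes, and the unmatched ids.
    """
    # Each block may have to be hashed once per algorithm in the request,
    # which is the cost that grows as more hash algorithms are mixed.
    algos = {algo for algo, _ in wanted}
    remaining = set(wanted)
    matched = {}
    for data in blocks:
        for algo in algos:
            key = (algo, hashlib.new(algo, data).hexdigest())
            if key in remaining:
                matched[key] = data
                remaining.discard(key)
                break
    return matched, remaining


wanted = [
    ("sha256", hashlib.sha256(b"block one").hexdigest()),
    ("sha512", hashlib.sha512(b"block two").hexdigest()),
]
# Blocks arrive in the opposite order from the request.
matched, remaining = match_blocks(wanted, [b"block two", b"block one"])
print(len(matched), "matched,", len(remaining), "missing")
```

Returning blocks in request order, or tagging each block with its identifier, would avoid this extra hashing on the client.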

Note that HTTP pipelining doesn't help here.

whyrusleeping commented 6 years ago

I think the primary difficulty is figuring out how to send back the blocks. We will probably have to define some length-delimited format. The question of order is interesting, though we probably want to return things in the same order they are requested.
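One possible shape for such a length-delimited format, sketched under assumptions: each block is prefixed with its byte length as an unsigned varint, and blocks are written in request order. The thread does not settle on a format, so the function names and the choice of varint prefix here are illustrative only.

```python
import io


def write_uvarint(n):
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_uvarint(stream):
    """Decode one unsigned varint; return None on a clean end of stream."""
    shift = 0
    result = 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def encode_blocks(blocks):
    """Concatenate blocks, each prefixed by its length."""
    return b"".join(write_uvarint(len(b)) + b for b in blocks)


def decode_blocks(stream):
    """Yield blocks from a length-delimited stream, in the order written."""
    while True:
        length = read_uvarint(stream)
        if length is None:
            return
        data = stream.read(length)
        if len(data) != length:
            raise EOFError("truncated block")
        yield data


payload = encode_blocks([b"first block", b"", b"third block"])
for block in decode_blocks(io.BytesIO(payload)):
    print(len(block), block)
```

Because the framing carries no identifiers, this sketch only works if the response order matches the request order; an out-of-order variant would need each frame to carry its CID as well.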

Stebalien commented 6 years ago

This sounds like an implementation issue. Which implementation are you referring to? go-ipfs's bitswap implementation batches requests when sending out wantlist updates, and we've just added a batch blockstore get method.

Are you referring to the HTTP API? That's definitely not intended to be used over an interplanetary network (or even over a planetary network).

ianopolous commented 6 years ago

@Stebalien Indeed, I am referring to the HTTP API (which is already planetary via ipfs.io). How else would an application access an IPFS instance behind a Tor or I2P hidden service?

Stebalien commented 6 years ago

The API/gateway at ipfs.io is meant as a stop-gap (it's entirely centralized). Ideally, every user would run their own node.

In terms of I2P and Tor, we have, in the past, worked on a libp2p transport that ran over Tor. We've put it on the shelf for now, as we'd like to battle-test IPFS a bit first, but that would be the ideal way to do this. That is, if you want to use IPFS with Tor, you'd spin up a new IPFS node per Tor pseudo-identity and use the Tor libp2p transport.

ianopolous commented 6 years ago

Many applications only have access to the HTTP API, e.g. self-hosted web apps. The same is true of the application I'm writing, Peergos. The interface to IPFS is the HTTP API. Normally that's on localhost, but on Tor it will be remote. And even now, https://demo.peergos.net interacts with IPFS over the net via the HTTP API. This works and performs fine without Tor.

Stebalien commented 6 years ago

> Normally that's on localhost, but on Tor it will be remote.

On Tor that should still be local, with the local peer connecting to other peers over Tor. This isn't currently possible on our version of IPFS but, e.g., OpenBazaar has an IPFS fork that does just that.

Note: IPFS apps running over Tor in a browser will probably need to use js-ipfs, for now at least, as a properly configured Tor Browser won't let web apps talk to local services.

I harp on this because the gateway is centralized and fragile. It has already been blocked in China and really is just a stop-gap measure.

However, disregarding intended use, you're right. A batch get would be useful, even for local applications. HTTP request overhead is non-trivial. (Although, on Unix at least, I'd like a Unix domain socket API, but that's a different issue.)

ianopolous commented 6 years ago

I'm aware of the OpenBazaar fork and hope to use it.

We cannot use js-ipfs because it is too insecure. We have a very small, auditable codebase, and our JS makes calls directly back to the hosting server (which can be localhost, peergos.net or a Tor hidden service) and runs fine in Tor Browser.

ipfs / notes

Batch gets #285