ipfs / kubo

canonical method for checking locality of a block via coreAPI #6726

Open b5 opened 4 years ago

b5 commented 4 years ago

Location

:wave: 👨‍👧‍👧! I'm looking for the best way to ask "hey, is this block local?"

Given that we're in IPFS userland, we should be using the core API for this. That means we're trying to do something through coreapi that is most easily accomplished with a blockstore.

Blockstore method that works exactly as we'd like:

https://github.com/ipfs/go-ipfs-blockstore/blob/3eee0dff760dea94c59381e9a18b9aa290361b28/blockstore.go#L36

Relevant coreAPI subsystem:

https://github.com/ipfs/interface-go-ipfs-core/blob/1c94e6217184ed07abab840143aa1f417e314c3d/block.go#L20-L37

Description

We also have the same question about complete DAGs, happy to file a separate issue if you'd prefer. I'd love to be pointed to the canonical methods of determining if a block or dag is local via core API.

Thanks!

Stebalien commented 4 years ago

In theory, you should be able to call api.WithOptions(apiopts.Offline(true)).Block().Stat(ctx, "/some/path"). That shouldn't need to fetch the block itself (and getting the size of the block almost always has the same cost as learning if we have it).

Unfortunately, it doesn't look like we currently do it this way. We currently fetch the block anyway because the blockservice interfaces don't support GetSize or Has. However, adding a Stat method directly to the BlockService (returning the size of the block) should be pretty straightforward.

(note: if you specify "offline mode", we won't fetch the block from the network, just from disk)
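To make the proposed Stat concrete, here is a minimal sketch of a block service that answers "is it local, and how big is it?" from a size lookup alone. Everything in it is a stand-in: plain strings replace cid.Cid, memStore replaces the on-disk blockstore, and blockService.Stat is the hypothetical method discussed above, not something go-blockservice provides today. Only the idea of a GetSize lookup on the blockstore comes from the real interface.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound stands in for the "block not found" error of a real blockstore.
var ErrNotFound = errors.New("block not found")

// sizer is the only blockstore capability this sketch needs.
// Assumption: string keys stand in for cid.Cid.
type sizer interface {
	GetSize(key string) (int, error)
}

// memStore is a hypothetical in-memory stand-in for the on-disk blockstore.
type memStore map[string][]byte

func (m memStore) GetSize(key string) (int, error) {
	b, ok := m[key]
	if !ok {
		return 0, ErrNotFound
	}
	return len(b), nil
}

// blockService is a hypothetical offline block service with the proposed Stat.
type blockService struct{ store sizer }

// Stat reports the block size and whether the block is local.
// It only asks the store for a size and never goes to the network.
func (s blockService) Stat(key string) (size int, local bool, err error) {
	size, err = s.store.GetSize(key)
	if errors.Is(err, ErrNotFound) {
		return 0, false, nil // missing locally is an answer, not a failure
	}
	if err != nil {
		return 0, false, err
	}
	return size, true, nil
}

func main() {
	svc := blockService{store: memStore{"cid-a": []byte("hello")}}
	for _, key := range []string{"cid-a", "cid-b"} {
		size, local, err := svc.Stat(key)
		fmt.Println(key, size, local, err)
	}
}
```

Treating "not found" as a normal result rather than an error keeps the caller's locality check to a single branch.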

Stebalien commented 4 years ago

> We also have the same question about complete DAGs, happy to file a separate issue if you'd prefer.

At the moment, the only way to do this would be to traverse the dag.
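A rough sketch of what that traversal could look like, under heavy assumptions: localStore is a hypothetical stand-in for an offline node getter, node holds only child keys, and strings stand in for CIDs. None of these names come from the IPFS libraries.

```go
package main

import "fmt"

// node is a hypothetical decoded DAG node: only the keys of its children.
type node struct{ links []string }

// localStore is a hypothetical stand-in for an offline node getter.
// A key that is absent from the map is a block we do not have on disk.
type localStore map[string]node

// missingBlocks walks the DAG from root, depth-first, and returns the keys
// of blocks that are referenced but not local. The children of a missing
// block cannot be enumerated, so the result is a lower bound.
func missingBlocks(store localStore, root string) []string {
	var missing []string
	seen := map[string]bool{}
	stack := []string{root}
	for len(stack) > 0 {
		key := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[key] {
			continue // shared sub-DAGs are only visited once
		}
		seen[key] = true
		n, ok := store[key]
		if !ok {
			missing = append(missing, key)
			continue
		}
		stack = append(stack, n.links...)
	}
	return missing
}

func main() {
	store := localStore{
		"root": {links: []string{"a", "b"}},
		"a":    {links: []string{"c"}},
		"c":    {},
	}
	fmt.Println(missingBlocks(store, "root")) // prints [b]
}
```

The DAG is fully local exactly when the returned slice is empty.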

b5 commented 4 years ago

Ah yes. We've been doing something close to this already via configuration: https://github.com/qri-io/dag/blob/master/dsync/dagservice.go. Good to know we're on the right track, but we might be using the wrong configuration settings.

This approach breaks down when we swap an in-process IpfsNode for the HTTP interface implementation.

Adding Stat to the block service makes plenty of sense to me. Since you've tagged this "help wanted", I should be able to throw some time at it. Our cloud infra relies on the HTTP interface swap, and we're doing a bunch of extra block swapping over the network because we can't quickly establish a diff of the blocks we have on the cloud side.

How many implementations of blockservice are kicking around? Any interest in me PR'ing a Stat method onto blockservice.BlockGetter: https://github.com/ipfs/go-blockservice/blob/d77e8ef129b738d07d598690fe168c8430db4e9e/blockservice.go#L28 ? Happy to take some marching orders on this if it helps.

Stebalien commented 4 years ago

> Ah yes. We've been doing something close already with configuration: https://github.com/qri-io/dag/blob/master/dsync/dagservice.go good to know we're on the right track, but we might be using the wrong configuration settings.

FetchBlocks(false) should also work. That will resolve IPNS/DNS but not use bitswap.

> Our cloud infra relies on the HTTP interface swap, and we're doing a bunch of extra block swapping over the network because we can't quickly establish a diff of blocks we have on the cloud side.

Note: The CoreAPI's Block().Stat() should not send the block over the HTTP API. It just reads it from disk. That is, Block().Stat() sends a request to /api/v0/block/stat and gets back the size of the block.

My point is that we can make this even faster and avoid reading the block from disk.
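For reference, a minimal sketch of that request using only the Go standard library. The endpoint path comes from the comment above. The rest is assumption: the daemon address, the offline=true query parameter (mirroring the CLI's global offline flag), the placeholder CID, and the Key/Size field names in the JSON reply, which are my understanding of the response rather than something confirmed in this thread.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// blockStat mirrors the JSON reply as I understand it (assumed field names).
type blockStat struct {
	Key  string
	Size int
}

// statURL builds the block/stat request URL for one CID.
// Assumption: offline=true keeps the daemon from fetching over the network.
func statURL(apiBase, cid string) string {
	q := url.Values{}
	q.Set("arg", cid)
	q.Set("offline", "true")
	return apiBase + "/api/v0/block/stat?" + q.Encode()
}

// blockSize asks the daemon for the size of a block. Any non-200 reply is
// returned as an error, which includes the "block is not local" case.
func blockSize(ctx context.Context, client *http.Client, apiBase, cid string) (int, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, statURL(apiBase, cid), nil)
	if err != nil {
		return 0, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("block/stat %s: %s", cid, resp.Status)
	}
	var st blockStat
	if err := json.NewDecoder(resp.Body).Decode(&st); err != nil {
		return 0, err
	}
	return st.Size, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	// "QmExampleCid" is a placeholder, not a real CID.
	size, err := blockSize(ctx, http.DefaultClient, "http://127.0.0.1:5001", "QmExampleCid")
	if err != nil {
		fmt.Println("not local, or daemon unreachable:", err)
		return
	}
	fmt.Println("local block of", size, "bytes")
}
```

Only the small JSON reply crosses the HTTP boundary here, never the block bytes.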

> How many implementations of blockservice are kicking around?

Only the one, as far as I know.


As for figuring out which blocks you don't have... I don't know of a good way to do this right now. We'd need to extend the API.

Stebalien commented 4 years ago

Ah, but, if we already have a list of blocks we should have, we can feed this list to IPFS to figure out which ones we don't have. Unfortunately, that's (currently) going to be an HTTP request per block.
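Since that means one request per block, bounding the number in flight seems prudent. A minimal sketch, assuming a hypothetical hasFunc that wraps whatever per-block check is available (for example, one stat request per key). The names hasFunc and missingFrom are illustrative only, and strings again stand in for CIDs.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// hasFunc reports whether one block is local. Hypothetical: in practice it
// would wrap a single per-block request to the node.
type hasFunc func(key string) (bool, error)

// missingFrom checks every key in want with at most workers checks in
// flight, and returns the sorted keys that are not local together with the
// first error encountered, if any.
func missingFrom(want []string, has hasFunc, workers int) ([]string, error) {
	if workers < 1 {
		workers = 1 // an unbuffered semaphore would block forever
	}
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		missing  []string
		firstErr error
	)
	sem := make(chan struct{}, workers)
	for _, key := range want {
		wg.Add(1)
		sem <- struct{}{} // wait for a free slot
		go func(key string) {
			defer wg.Done()
			defer func() { <-sem }()
			ok, err := has(key)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			if !ok {
				missing = append(missing, key)
			}
		}(key)
	}
	wg.Wait()
	sort.Strings(missing)
	return missing, firstErr
}

func main() {
	local := map[string]bool{"a": true, "c": true}
	has := func(key string) (bool, error) { return local[key], nil }
	missing, err := missingFrom([]string{"a", "b", "c", "d"}, has, 2)
	fmt.Println(missing, err) // prints [b d] <nil>
}
```

Sorting the result makes the output deterministic regardless of which goroutine finishes first.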