Don't add blocks to the datastore

MichaelMure commented 2 years ago

This leave the responsibility and choice to do so to the caller, typically go-blockservice.

This has several benefit:

untangle the code
allow to use an exchange as pure block retrieval
avoid double add

Close https://github.com/ipfs/kubo/issues/7956

MichaelMure commented 2 years ago

cc @Jorropo

Jorropo commented 2 years ago

@MichaelMure i dont think its quite the right thingy to do here.

Afait bitswap was dedupping concurrent getblocks while what you propose would have a put per getblock.

I think bitswap should do puts but only when the block is receved from the network. Or am I missing something here ?

MichaelMure commented 2 years ago

Afait bitswap was dedupping concurrent getblocks while what you propose would have a put per getblock.

I believe this is still happening, the same way it was done before:

The diff in the PR is quite unreadable, I'd suggest looking at the resulting code where it's much clearer.

I think bitswap should do puts but only when the block is receved from the network. Or am I missing something here ?

This PR removes the put done in HasBlock (that were unnecessary) and move the puts for blocks from the network to https://github.com/ipfs/go-blockservice/pull/92. The former for performance reason, the latter for better separation of concerns and composability.

One thing that this PR does change is that Puts are no longer batched together, or rather no longer batched the same way. Before, there would be a PutMany for each bitswap message. Now, BlockService only get a channel of blocks and doesn't know how to batch Put together. At first it looks like it would reduce performance but I believe it makes sense in the long run, as batching with a time window in the BlockService side could result in a better batching, instead of following strictly the network messages. How often is there more than one block in a message?

MichaelMure commented 2 years ago

Not sure what's happening with the CI, test are OK for me locally.

Stebalien commented 2 years ago

Unfortunately, it looks like this will act like we have a block before that block actually gets written locally. I.e., on receive, we'll:

Announce the block to the network.
Tell the engine that we have it (enqueuing tasks for it).
Then send the block to the blockservice without actually writing it to the blockstore.

This creates a race where we might, in some cases (performance, canceled requests, etc.) fail to send blocks to peer when we have them.

As discussed in person (written here so we're on the same page), a simple solution is to just not tell the network anything until the blockservice calls HasBlock. Basically:

Blockservice asks the exchange for the block.
Exchange does its thing, updates the engine to record receipt of the block, but doesn't actually enqueue any work.
The blockservice then calls HasBlock on the exchange.
The exchange then serves the block.

Basically, this is the same as this PR but taken to the extreme. The receive side is completely disconnected from the send side (except for the bandwidth ledger).

Changes required:

Split Engine.ReceiveFrom into Engine.ReceivedBlocks (from network) and Engine.HaveBlocks (or something like that). The former will update ledgers, etc. The latter will enqueue work for sending blocks to peers.
Make the blockservice put blocks read from the exchange into the blockstore, and notify the exchange by calling Has on read.

Tricky parts:

Ideally, bitswap wouldn't announce anything. The exchange should handle sending and deduplicating provider records, but that's a larger change. The new problem is that this change may cause us to repeatedly announce the same block.
This is a silent breaking change and will need to be advertised accordingly. I'd consider making breaking changes to the exchange interface at the same time (maybe add a HasBlocks method, and/or rename to NotifyNewBlocks?).

MichaelMure commented 2 years ago

This is a silent breaking change and will need to be advertised accordingly. I'd consider making breaking changes to the exchange interface at the same time (maybe add a HasBlocks method, and/or rename to NotifyNewBlocks?).

I like that idea. Does that sounds good to you? Anything else?

 // Interface defines the functionality of the IPFS block exchange protocol.
 type Interface interface { // type Exchanger interface
        Fetcher

-       // TODO Should callers be concerned with whether the block was made
-       // available on the network?
-       HasBlock(context.Context, blocks.Block) error
+       // NotifyNewBlocks tells the exchange that a new block is available and can be served.
+       NotifyNewBlocks(ctx context.Context, block blocks.Block) error

        IsOnline() bool

        io.Closer
 }

Stebalien commented 2 years ago

I'd make it take multiple blocks (or have a second function that takes multiple blocks), but yeah, something like that.

MichaelMure commented 2 years ago

@Stebalien let's make it variadic then: https://github.com/ipfs/go-ipfs-exchange-interface/pull/23

MichaelMure commented 2 years ago

@Stebalien I think I'm done now, tests are green at least on some platform, but I don't fully know what I'm doing here.

ipfs / go-bitswap

Don't add blocks to the datastore #571