ipfs / boxo

A set of reference libraries for building IPFS applications and implementations in Go.
https://github.com/ipfs/boxo#readme
Other
197 stars 86 forks source link

pinning/pinner: Pinner should not be tasked to fetch blocks #378

Open MichaelMure opened 1 year ago

MichaelMure commented 1 year ago

Having worked on the Pinner in various way (including alternate implementation), it occurred to me that it really shouldn't be tasked to:

  1. write the root block into storage in Pin(): https://github.com/ipfs/go-ipfs-pinner/blob/72f5e02e73db5e05ef0a140a7938cbc89dfc38b0/dspinner/pin.go#L177
  2. fetch the full DAG in Pin(): https://github.com/ipfs/go-ipfs-pinner/blob/72f5e02e73db5e05ef0a140a7938cbc89dfc38b0/dspinner/pin.go#L202

This is for different (not obvious, I admit) reasons:

Admittedly, things have worked that way for a long time, but fixing that might help in the long run. Looking at where Pin is called in kubo, it also looks like a non-invasive change.

MichaelMure commented 1 year ago

On an intuition level, this has a similar vibe as having the caller side of bitswap adding blocks in storage (https://github.com/ipfs/go-bitswap/pull/571), which helped splitting client and server side, among other things.

lidel commented 1 year ago

cc @Jorropo for visibility, I remember you were suggesting refactoring the way we do fetch during pinning

Jorropo commented 1 year ago

The pinner keeps track of long term state in it's internal pin database, it also keep track of state in the blockstore, it would be VERY weird if it kept track of some state but not all of it (if we still had the internal DB but not the blockstore part of it).

Currently it does not even store the blocks directly, it relies on side effect of the blockservice (blockservice stores all block downloaded).

Also unlike https://github.com/ipfs/go-bitswap/pull/571 where there was a bug where blocks were local puts were duplicated (once in blockservice and once in bitswap), I am not aware of any bug here, it's just very side effect heavy code that hard to use without the monolith architecture Kubo was built on.

Maybe there is something better elsewhere but this seems like a complete refactor of the pinner which I don't want to take on right now. I am not convainced that the two usecases here (pinning per requests and pinning on a monolith) are similar enough to warant sharing a package. Except than a traversal logic (which can be abstracted in a new or already existing third package) I don't see what they would have in commun.

MichaelMure commented 1 year ago

The pinner keeps track of long term state in it's internal pin database, it also keep track of state in the blockstore, it would be VERY weird if it kept track of some state but not all of it (if we still had the internal DB but not the blockstore part of it).

Yet that's what is actually happening. Consider those two examples (among other):

Also unlike https://github.com/ipfs/go-bitswap/pull/571 where there was a bug where blocks were local puts were duplicated (once in blockservice and once in bitswap), I am not aware of any bug here, it's just very side effect heavy code that hard to use without the monolith architecture Kubo was built on.

If you look at my pin/add example, I'm pretty sure there is a duplicated Put there. DAGService get the block from the network (side effect: put), then pass it to Pinner which will write it again.

Maybe there is something better elsewhere but this seems like a complete refactor of the pinner which I don't want to take on right now.

I'm only suggesting to eliminate Pin in favor of PinWithMode (although Pin is a better name). This function is only used 4 times in kubo (not counting tests), so that should be quite manageable. We get back to a uniform API, eliminates double write, remove the odd side effect that makes it hard to compose and reuse ...

As for bitswap, I'd like to ask you to do the exercise in reverse: why would Pinner need to write blocks? Why does it needs to fetch a DAG? I don't see better reason than "it was written that way in ancient ages because it was convenient".

MichaelMure commented 1 year ago

Additionally, Add() is currently racy with the GC, as it doesn't create a PinLock. This is because the pinner doesn't have access to the blockstore, while the calling side (kubo) does.