Open michaelavila opened 6 years ago
This seems like a good approach because if we can get to step 3, we get a speed boost to all commands that provide blocks, independent of finding a better providing strategy. But also, once we come up with such a strategy, it will be easier to introduce.
I just want to understand 1 and 2; I don't think we can do a release of this work short of 3. Don't we have to be able to at least maintain the current behavior> So are 1 and 2 internal milestones on a longer lived branch on the way to a unified mechanism for re/providing? Or would they be turned with an experiment flag? That seems challenging given how complex the code might be at that intermediate point, but maybe that would actually be a good way to get to 3?
@eingenito that's correct. That's why I said this "The first few are so small they wouldn't result in PRs on their own, but I want to give some insight into how I'm approaching the problem."
Totally make sense.
1. Remove bitswap provide
The purpose of this step is to remove providing from bitswap and to rely solely on reprovide. This is not optimal.
I assume you plan on al least improving reprovider? Last time I looked at it it was barely providing 1-few nodes per minute, which isn't really useful, and relying on it for all bitswap (incoming data) providing is probably a bad idea. We also want to provide relevant data quickly after it has been fetched to be able to spread load over more peers in case some piece of content becomes popular.
@magik6k that's correct. I'll update this issue as I have discovered a few things and slightly changed direction since I last edited it. It might be more accurate now to say that the goal is to introduce an efficient provider system and then, in the future, refactor the reprovider to use the provider in some way.
I see there are some ticks in 4. Provide all blocks in the appropriate place in go-ipfs
, do you have some code to look at that you could PR (even if deeply WIP)?
@magik6k of course! Here: https://github.com/ipfs/go-ipfs/tree/experiment/provide%2Breprovide
Any and all feedback is welcome. What's there right now is intended to be the simplest change I can make to get the behavior I need up to this point. I've had some discussions about introducing some abstraction so that we don't have to explicitly call node.Provider.Provide
as much, but I haven't spent much time determining what (and if) that abstraction should be.
This looks mostly good, but as noted on the golang sync, I'd prefer the providing logic to be implemented in coreapi. Ideally commands shouldn't have to access ipfs node (cmdenv.GetNode()
) at all.
@magik6k thank you for taking a look. I will go through and move this stuff to coreapi. If I have any questions I'll reach back out here and I'll add you to the PR when it goes up.
@magik6k ~question: what do you propose doing about commands which do not delegate out directly to CoreAPI
, e.g. ipfs add
which uses Unixfs().Add
?~ Disregard, I see now that Unixfs
is just part of CoreAPI
.
I just wanted not to lose track of a comment that @Stebalien made in a PR because it's interesting context to some of this discussion: https://github.com/ipfs/go-ipfs/pull/5870#discussion_r261048681
Note that there are some reasonable requests for the ipfs dht provide
command, which is tangentially related to this work. I'd like to look into it as well
ipfs dht provide command/query is taking still way too long.
Any updates? Any new strategies? I only see a lot of cross issues that are linked.
EDIT: Is it possible to improve the speed using additional parameters maybe?
Current work is being tracked here: https://github.com/ipfs/go-ipfs/pull/5870
I'm creating this issue to expose how I'm approaching the provider subsystem and new provider strategies tasks. I will add info as I make progress. If you feel strongly about anything written here, please let me know so we can discuss.
There's a substantial implementation discussion in #5840
There are a handful of related issues: #3813 #4147 #4311 And PRs: #4333 #4113
Provider Subsystem
Problem
The current method for deciding which blocks to provide is naive and results in performance problems when adding many blocks.
Solution
For a given dag of blocks provide only some subset of those blocks.
Steps
These are some very small steps I'm taking to progress towards the end goal. The first few are so small they wouldn't result in PRs on their own, but I want to give some insight into how I'm approaching the problem. The last few are so vaguely defined they're not actionable, my hope is that they become clearer as we make progress.
~1. Remove bitswap provide~
~The purpose of this step is to remove providing from bitswap and to rely solely on reprovide. This is not optimal.~
2. Introduce provider and provide all strategy, use post add
The purpose of this step is to provide each block that has been added using
ipfs add
using roughly the mechanism the reprovider uses. This is not optimal.~3. Make the provider robust~
~The purpose of this step is to modify provider so that it provides blocks as efficiently as
go-bitswap
. This is a little closer to optimal.~go-bitswap
currently uses to send provide messages~4. Provide all blocks in the appropriate place in
go-ipfs
additions and modifications
ipfs add
ipfs block put
ipfs dag put
ipfs object new
ipfs object patch
ipfs object put
ipfs tar add
gatewayHandler
postHandler
putHandler
deleteHandler
ipfs pubsub
ipfs pubsub sub <topic>
retrievals
ipfs get
ipfs ls
ipfs cat
ipfs refs
ipfs block get
ipfs block stat
ipfs object get
ipfs object data
ipfs object diff
ipfs object links
ipfs object stat
ipfs dag get
ipfs dag resolve
ipfs ls
a cid that you do not have in your local repo)5. Change find providers to account for some cids not being provided
The purpose of this step is to modify the existing find providers system to work in the situation where the block being looked for is not provided. In the process of starting this work, we will need a strategy for providing only a subset in order to exercise the code that looks for providers. We can use a simplistic strategy like pin roots.
6-8. Introduce new, more efficient strategy
The purpose of this step is to introduce a providing strategy that more robustly solves our providing needs. This is optimal. This is likely many steps and will become clearer as we complete work and @Kubuxu completes his research/experiments. We will elaborate on this step in time.
9. Make the provider and reprovider consistent
The purpose of this step is to ensure we don't have inconsistencies between the provider and reprovider, e.g. a programmer not providing blocks for some operation, but those blocks being provided eventually by the reprovider anyway.
Branches
cc @keks @Kubuxu