ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/
Other
16.07k stars 3.01k forks source link

Provider Subsystem + Strategies #5774

Open michaelavila opened 5 years ago

michaelavila commented 5 years ago

Current work is being tracked here: https://github.com/ipfs/go-ipfs/pull/5870

I'm creating this issue to expose how I'm approaching the provider subsystem and new provider strategies tasks. I will add info as I make progress. If you feel strongly about anything written here, please let me know so we can discuss.

There's a substantial implementation discussion in #5840

There are a handful of related issues: #3813 #4147 #4311 And PRs: #4333 #4113

if it's crossed out, it's completed

Provider Subsystem

Problem

The current method for deciding which blocks to provide is naive and results in performance problems when adding many blocks.

Solution

For a given dag of blocks provide only some subset of those blocks.

Steps

These are some very small steps I'm taking to progress towards the end goal. The first few are so small they wouldn't result in PRs on their own, but I want to give some insight into how I'm approaching the problem. The last few are so vaguely defined they're not actionable, my hope is that they become clearer as we make progress.

~1. Remove bitswap provide~

~The purpose of this step is to remove providing from bitswap and to rely solely on reprovide. This is not optimal.~

2. Introduce provider and provide all strategy, use post add

The purpose of this step is to provide each block that has been added using ipfs add using roughly the mechanism the reprovider uses. This is not optimal.

~3. Make the provider robust~

~The purpose of this step is to modify provider so that it provides blocks as efficiently as go-bitswap. This is a little closer to optimal.~

~I've deferred bringing the provider and reprovider in line with one another until the very end of all this work. There's a bit to think about that I just don't want to tackle yet. Like, how strategies should be articulated in code and to the user given that provide/reprovide logic is different (e.g. reprovide strategies operate over the entire blockstore), and bigger things like, when reproviding, how to account for blocks that were not provided initially so that they aren't reprovided.~

4. Provide all blocks in the appropriate place in go-ipfs

additions and modifications

retrievals

Can possibly merge at this point.

5. Change find providers to account for some cids not being provided

The purpose of this step is to modify the existing find providers system to work in the situation where the block being looked for is not provided. In the process of starting this work, we will need a strategy for providing only a subset in order to exercise the code that looks for providers. We can use a simplistic strategy like pin roots.

6-8. Introduce new, more efficient strategy

The purpose of this step is to introduce a providing strategy that more robustly solves our providing needs. This is optimal. This is likely many steps and will become clearer as we complete work and @Kubuxu completes his research/experiments. We will elaborate on this step in time.

Can possibly merge at this point.

9. Make the provider and reprovider consistent

The purpose of this step is to ensure we don't have inconsistencies between the provider and reprovider, e.g. a programmer not providing blocks for some operation, but those blocks being provided eventually by the reprovider anyway.

Branches

cc @keks @Kubuxu

hannahhoward commented 5 years ago

This seems like a good approach because if we can get to step 3, we get a speed boost to all commands that provide blocks, independent of finding a better providing strategy. But also, once we come up with such a strategy, it will be easier to introduce.

eingenito commented 5 years ago

I just want to understand 1 and 2; I don't think we can do a release of this work short of 3. Don't we have to be able to at least maintain the current behavior> So are 1 and 2 internal milestones on a longer lived branch on the way to a unified mechanism for re/providing? Or would they be turned with an experiment flag? That seems challenging given how complex the code might be at that intermediate point, but maybe that would actually be a good way to get to 3?

michaelavila commented 5 years ago

@eingenito that's correct. That's why I said this "The first few are so small they wouldn't result in PRs on their own, but I want to give some insight into how I'm approaching the problem."

eingenito commented 5 years ago

Totally make sense.

magik6k commented 5 years ago

1. Remove bitswap provide

The purpose of this step is to remove providing from bitswap and to rely solely on reprovide. This is not optimal.

I assume you plan on al least improving reprovider? Last time I looked at it it was barely providing 1-few nodes per minute, which isn't really useful, and relying on it for all bitswap (incoming data) providing is probably a bad idea. We also want to provide relevant data quickly after it has been fetched to be able to spread load over more peers in case some piece of content becomes popular.

michaelavila commented 5 years ago

@magik6k that's correct. I'll update this issue as I have discovered a few things and slightly changed direction since I last edited it. It might be more accurate now to say that the goal is to introduce an efficient provider system and then, in the future, refactor the reprovider to use the provider in some way.

magik6k commented 5 years ago

I see there are some ticks in 4. Provide all blocks in the appropriate place in go-ipfs, do you have some code to look at that you could PR (even if deeply WIP)?

michaelavila commented 5 years ago

@magik6k of course! Here: https://github.com/ipfs/go-ipfs/tree/experiment/provide%2Breprovide

Any and all feedback is welcome. What's there right now is intended to be the simplest change I can make to get the behavior I need up to this point. I've had some discussions about introducing some abstraction so that we don't have to explicitly call node.Provider.Provide as much, but I haven't spent much time determining what (and if) that abstraction should be.

magik6k commented 5 years ago

This looks mostly good, but as noted on the golang sync, I'd prefer the providing logic to be implemented in coreapi. Ideally commands shouldn't have to access ipfs node (cmdenv.GetNode()) at all.

michaelavila commented 5 years ago

@magik6k thank you for taking a look. I will go through and move this stuff to coreapi. If I have any questions I'll reach back out here and I'll add you to the PR when it goes up.

michaelavila commented 5 years ago

@magik6k ~question: what do you propose doing about commands which do not delegate out directly to CoreAPI, e.g. ipfs add which uses Unixfs().Add?~ Disregard, I see now that Unixfs is just part of CoreAPI.

eingenito commented 5 years ago

I just wanted not to lose track of a comment that @Stebalien made in a PR because it's interesting context to some of this discussion: https://github.com/ipfs/go-ipfs/pull/5870#discussion_r261048681

michaelavila commented 5 years ago

Note that there are some reasonable requests for the ipfs dht provide command, which is tangentially related to this work. I'd like to look into it as well

melroy89 commented 2 years ago

ipfs dht provide command/query is taking still way too long.

Any updates? Any new strategies? I only see a lot of cross issues that are linked.

EDIT: Is it possible to improve the speed using additional parameters maybe?