ipfs-inactive / package-managers

[ARCHIVED] 📦 IPFS Package Managers Task Force
MIT License

Providing System #84

Open michaelavila opened 5 years ago

Why Improving Provider Strategies is Important

Our current scheme of breaking content up into 256 KiB blocks and then providing each of those blocks to the rest of the network places a large burden on announcement and discovery. Regardless of how well the DHT (or whatever underlying mechanism is used) performs, our goals for the amount of content living in IPFS are ambitious, and that volume makes it necessary to stop announcing (aka providing) every block to the network.
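To make the scale concrete, here is a minimal Go sketch that counts how many blocks, and therefore how many provide announcements, a single file produces. It assumes the 256 KiB chunk size mentioned above, a balanced DAG layout, and a fan-out of 174 links per intermediate node (our understanding of the go-unixfs default, treated here as an assumption). `blocksForFile` is a hypothetical helper, not a go-ipfs API.

```go
package main

import "fmt"

// blocksForFile estimates how many blocks (and so how many provider records)
// a file of the given size produces when it is split into fixed-size leaves
// and linked into a balanced DAG with at most `fanout` links per node.
// The fanout value passed in is an assumption, not read from go-ipfs.
func blocksForFile(size, chunkSize, fanout int64) int64 {
	leaves := (size + chunkSize - 1) / chunkSize
	if leaves < 1 {
		leaves = 1 // even an empty file produces one block
	}
	total := leaves
	level := leaves
	for level > 1 {
		level = (level + fanout - 1) / fanout // parent nodes one level up
		total += level
	}
	return total
}

func main() {
	const chunk = 256 * 1024 // 256 KiB leaves
	const fanout = 174       // assumed links per intermediate node
	for _, size := range []int64{1 << 20, 1 << 30, 100 << 30} {
		fmt.Printf("%d bytes -> %d blocks to provide\n", size, blocksForFile(size, chunk, fanout))
	}
}
```

Under these assumptions a 1 GiB file yields 4,096 leaves plus 25 intermediate and root nodes, i.e. 4,121 blocks to announce, and the count grows linearly with the amount of content.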

I'm not aware of any official, regularly repeated benchmarks of provide times. Lately, individual announcements (the network part of a provide) have been taking several minutes.


Current State of Provider Strategies in go-ipfs (+ some history)

There is an experimental providing system that I’ve been working on, with the hope that it will eventually replace the current providing system in go-ipfs. The current system is hardly a system at all; it’s just a few lines of code in the go-bitswap repository that aggressively provide every block bitswap comes into contact with, which makes it inflexible. The goal of the new system is to give go-ipfs more control over which blocks are provided, based on the context the IPFS node is operating in. Strategies can then be implemented on top of it.
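The difference between "provide everything bitswap touches" and "let a strategy decide" can be sketched as a pluggable predicate. In the Go sketch below, `BlockContext`, `Strategy`, `Provider`, and the three strategies are all hypothetical names chosen for illustration; they are not the go-ipfs or go-ipfs-provider API.

```go
package main

import "fmt"

// BlockContext is a hypothetical description of how a node came across a block.
type BlockContext struct {
	CID    string // content identifier, as a plain string for brevity
	IsRoot bool   // block is the root of a DAG the user added or pinned
	Source string // "add" for locally added content, "bitswap" for fetched content
}

// Strategy decides whether a block should be announced to the network.
type Strategy func(BlockContext) bool

// ProvideAll approximates today's bitswap behavior: announce every block.
func ProvideAll(BlockContext) bool { return true }

// ProvideRootsOnly announces only DAG roots.
func ProvideRootsOnly(b BlockContext) bool { return b.IsRoot }

// ProvideAddedOnly announces blocks this node added itself, not ones it
// merely fetched from other peers.
func ProvideAddedOnly(b BlockContext) bool { return b.Source == "add" }

// Provider queues a block for announcement only if its strategy agrees.
type Provider struct {
	strategy Strategy
	queued   []string
}

func (p *Provider) Offer(b BlockContext) {
	if p.strategy(b) {
		p.queued = append(p.queued, b.CID)
	}
}

func main() {
	blocks := []BlockContext{
		{CID: "root", IsRoot: true, Source: "add"},
		{CID: "leaf-1", Source: "add"},
		{CID: "leaf-2", Source: "bitswap"},
	}
	strategies := []struct {
		name string
		fn   Strategy
	}{
		{"all", ProvideAll},
		{"roots", ProvideRootsOnly},
		{"added", ProvideAddedOnly},
	}
	for _, s := range strategies {
		p := &Provider{strategy: s.fn}
		for _, b := range blocks {
			p.Offer(b)
		}
		fmt.Println(s.name, p.queued)
	}
}
```

Keeping the strategy a plain function makes it cheap to add new behaviors later without touching the code that feeds blocks in.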

This work has gone through several stages. The original epic tracking issue was https://github.com/ipfs/go-ipfs/issues/5774 (that is, my tracking issue; attempts at this work go back as far as 2015). My early attempts amounted to a couple of failures and restarts. Some of that related work is here:

The new provider system was first introduced here: https://github.com/ipfs/go-ipfs/pull/6068. Because of some changes made to bitswap providing, the gateways were taking a long time to reach the root blocks and provide them. The fix was simply to provide root blocks immediately in a separate goroutine, alongside the providing of the other blocks. We used the new provider system for this fix in order to get it merged and to gather feedback on a minimal provider setup.

Eventually, though, it became clear that the provider system’s biggest challenge was that it was an “all or nothing” change, which was proving difficult, and so we decided to release the work under an experimental flag (admittedly something that should’ve happened a lot sooner, as it helped tremendously). From there, the following PR emerged (and merged!):

Although simple-looking, this PR lays the foundations for the new provider system while also fulfilling a need that infra was asking for at the time: disabling providing without disabling content routing. In this PR, the go-bitswap workers are disabled when the StrategicProviding experimental flag is set to true, the first behavior gated by that flag. The plan is to layer on providing complexity from here. The PRs from earlier in the year are at various stages of completion, and all of them contain more than this PR (#6292). They are worth reviewing for the variety of things they addressed. Given this new flexibility, I was curious what others needed so that we could do small releases of specific providing behavior, so I reached out here: https://github.com/ipfs/go-ipfs/issues/6221.
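The gating described above can be sketched as a small pure function: the flag turns bitswap's providing off while content routing stays on. In go-ipfs the setting lives at `Experimental.StrategicProviding` (set with `ipfs config --json Experimental.StrategicProviding true` in releases that shipped it); the `Config` and `Services` types below are simplified, hypothetical stand-ins, not the real go-ipfs config or node-construction types.

```go
package main

import "fmt"

// Config is a simplified stand-in for the go-ipfs configuration, modeling
// only the Experimental.StrategicProviding flag.
type Config struct {
	Experimental struct {
		StrategicProviding bool
	}
}

// Services records which hypothetical subsystems a node would start.
type Services struct {
	ContentRouting        bool // look up providers for content we want
	BitswapProvideWorkers bool // announce every block bitswap touches
}

// servicesFor switches off bitswap's providing when the flag is set, but
// leaves content routing untouched.
func servicesFor(cfg Config) Services {
	return Services{
		ContentRouting:        true,
		BitswapProvideWorkers: !cfg.Experimental.StrategicProviding,
	}
}

func main() {
	var cfg Config
	fmt.Printf("%+v\n", servicesFor(cfg))
	cfg.Experimental.StrategicProviding = true
	fmt.Printf("%+v\n", servicesFor(cfg))
}
```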

Towards the beginning of June ’19, ipfs-cluster asked for a simple version of the provider system to be extracted from go-ipfs (https://github.com/ipfs/go-ipfs/issues/6417) so that it could be used in ipfs-lite. This request resulted in the extraction of the provider system to https://github.com/ipfs/go-ipfs-provider. This was just before Team Week in Barcelona. A couple of weeks before Team Week, I switched to looking exclusively into content routing issues in libp2p, because performance had degraded so much that the go-ipfs provider system wasn’t useful. After Team Week, I learned that what I was working on overlapped with what the Gateway Tiger Team was looking into, and so I tried (somewhat successfully) to help.

During the Gateway Tiger Team work, just after Team Week, someone proposed introducing a simple roots-only provide strategy to address some of the performance issues the gateways were experiencing. This change (https://github.com/ipfs/go-ipfs/pull/6472), while ultimately not merged, gives an idea of how a roots strategy will be implemented. The biggest difference is that this PR uses the “old” reprovider and forces a roots strategy, whereas we want to use the new reprovider with whichever strategy is specified.
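The distinction between forcing a roots strategy and honoring a configured one can be sketched as a reprovider that takes the strategy name as input. The names "all", "pinned", and "roots" mirror the values go-ipfs accepts for its `Reprovider.Strategy` setting; everything else here (`node`, `keysToReprovide`, the in-memory DAG) is a hypothetical simplification, not go-ipfs code.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a toy view of a local node: the blocks it stores, the DAG links
// between them, and its pinned roots. All names here are hypothetical.
type node struct {
	blocks map[string]bool
	links  map[string][]string
	pins   []string
}

// keysToReprovide picks which CIDs a reprovider announces for a given
// strategy name, instead of hard-coding one strategy.
func keysToReprovide(n node, strategy string) ([]string, error) {
	var out []string
	switch strategy {
	case "all": // every block in the local store
		for c := range n.blocks {
			out = append(out, c)
		}
	case "roots": // only the pinned roots
		out = append(out, n.pins...)
	case "pinned": // pinned roots plus everything reachable from them
		seen := map[string]bool{}
		var walk func(string)
		walk = func(c string) {
			if seen[c] {
				return
			}
			seen[c] = true
			out = append(out, c)
			for _, child := range n.links[c] {
				walk(child)
			}
		}
		for _, r := range n.pins {
			walk(r)
		}
	default:
		return nil, fmt.Errorf("unknown reprovider strategy %q", strategy)
	}
	sort.Strings(out) // stable output regardless of map iteration order
	return out, nil
}

func main() {
	n := node{
		blocks: map[string]bool{"root": true, "a": true, "b": true, "unpinned": true},
		links:  map[string][]string{"root": {"a", "b"}},
		pins:   []string{"root"},
	}
	for _, s := range []string{"all", "pinned", "roots"} {
		keys, err := keysToReprovide(n, s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(s, keys)
	}
}
```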

Notably, a release (0.4.21, I believe), Team Week, and the provider extraction all happened at the same time, so some of that work had to be resolved in both the extracted and non-extracted versions. Then the extraction needed to be merged, which it was. Before Team Week, and before the content routing issues that came even earlier, I was trying to get the following done:

I’m still going to try to at least get PRs up for these things.


Recent Work Done and Relevant Outcomes


Technical Notes

Provider Queue

The provider queue keys are structured so that the queue's ordering comes from the lexicographical ordering of the keys in the datastore. To get the head of the queue, simply get the first entry. Further, adding to the queue doesn't require crawling the queue. The keys are never parsed; they are only used for sorting.

The values are deserialized and provided.

Tracker

The tracker is structured in this way so that checking for the presence of a CID is fast.

The values are deserialized and reprovided. In theory, we could instead store the last-provided timestamp as the value and parse the CID from the key. This would allow us to skip reproviding CIDs that were provided recently enough.


Future Ideas for Improving Provider Strategies (and Projected Impact)


TODO