When implementing the PeerStore, we did not consider cleanup strategies.
For instance, a peer's multiaddrs may expire, after which we search peer routing for the latest record for that peer because it previously supported a certain protocol.
We need to think further about these use cases and design a solution.
JS Colo 2024 Rough discussion
* We still don't do any peerStore cleanup.
* We only ever add to the peerStore.
* Different protocols (kad, etc.) may need peers from the peerStore for various cases.
only cleanup that makes sense:
* old multiaddrs
* old protocols
what we could do:
* when we read, we check if it's over some limit, and potentially delete multiaddrs
* if we had a unique error for this case, devs could handle this at the App layer
* make upgrade timeout error a specific one?
* Be clearer about which stage of the dial fails: at which point of a dial failure do we actually need to do a findPeer?
* an upgrade failure should specify whether it's a network failure or an app-level failure
* if the network layer fails, we should allow implementations to run findPeer or peer-discovery on the peerId
* if we try to dial a peer with old multiaddrs and they all fail, we will try a findPeer; if that also fails, the dial fails (SHOULD we remove the peerInfo from the peerStore at this point?)
* if we are able to connect, we would run identify and update the peerInfo
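The read-time check mentioned in the list above could look roughly like the following. This is only a sketch, not the js-libp2p PeerStore API: `PruningPeerStore`, `PeerRecord`, `AddressEntry` and `MAX_ADDRESS_AGE_MS` are hypothetical names, and it assumes the "limit" is an age limit on multiaddrs (a size limit would work similarly).

```typescript
// Hypothetical record shapes for illustration; not the js-libp2p PeerStore types.
interface AddressEntry {
  multiaddr: string
  lastSeen: number // ms since epoch
}

interface PeerRecord {
  addresses: AddressEntry[]
  protocols: string[]
}

// Assumed limit: addresses not seen for longer than this are treated as stale.
const MAX_ADDRESS_AGE_MS = 60 * 60 * 1000

class PruningPeerStore {
  private readonly peers = new Map<string, PeerRecord>()

  // Add new multiaddrs or refresh the lastSeen time of known ones.
  merge (peerId: string, multiaddrs: string[], now: number = Date.now()): void {
    const record = this.peers.get(peerId) ?? { addresses: [], protocols: [] }
    for (const multiaddr of multiaddrs) {
      const existing = record.addresses.find(a => a.multiaddr === multiaddr)
      if (existing != null) {
        existing.lastSeen = now
      } else {
        record.addresses.push({ multiaddr, lastSeen: now })
      }
    }
    this.peers.set(peerId, record)
  }

  // Prune on read: stale addresses are dropped before the record is returned.
  get (peerId: string, now: number = Date.now()): PeerRecord | undefined {
    const record = this.peers.get(peerId)
    if (record == null) return undefined
    record.addresses = record.addresses.filter(a => now - a.lastSeen <= MAX_ADDRESS_AGE_MS)
    return record
  }
}
```

Pruning on read keeps writes cheap and avoids a background timer, at the cost of stale entries lingering for peers that are never read again.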
what implementers could do:
* if concerned about the peerStore growing too big, they could create a custom datastore with an LRU caching strategy that throws away entries not used within some time period, or use whatever other strategy suits them
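As an illustration of the LRU idea, here is a minimal sketch of a size-bounded cache keyed by peer ID. `LruPeerCache` is a hypothetical name and it does not implement the datastore interface js-libp2p expects; it evicts by entry count rather than by time, and only shows the eviction logic an implementer might wrap around their own storage.

```typescript
// Hypothetical helper, not part of js-libp2p: keeps at most maxPeers entries.
class LruPeerCache<V> {
  private readonly entries = new Map<string, V>()
  private readonly maxPeers: number

  constructor (maxPeers: number) {
    this.maxPeers = maxPeers
  }

  get (peerId: string): V | undefined {
    const value = this.entries.get(peerId)
    if (value === undefined) return undefined
    // Re-insert so this peer becomes the most recently used entry.
    this.entries.delete(peerId)
    this.entries.set(peerId, value)
    return value
  }

  set (peerId: string, value: V): void {
    this.entries.delete(peerId)
    this.entries.set(peerId, value)
    if (this.entries.size > this.maxPeers) {
      // A Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
  }

  has (peerId: string): boolean {
    return this.entries.has(peerId)
  }

  get size (): number {
    return this.entries.size
  }
}
```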
peerstore cleanup: what does it mean?
Peer store too big? Implement a peer store that deletes old records. Implementers can do that themselves.
Partially expired records? We're not going to clean those up, but we will improve the error signaling of dialing so implementers can handle this at the application layer. If the networking layer fails, we will run findPeer using peer-routing (configurable with different protocols), which will result in updated peerInfo in the peerStore, and retry the dial if peer-routing succeeds and the multiaddrs are updated.
Will dials be slower now? Only in the case of a networking failure (failure to reach the peer's host machine at the networking level), so the aforementioned functionality should be configurable as a dial option.
Action items:
* Ensure dial errors are differentiable between network-level failures and everything else.
* When a network-level failure occurs, optionally perform a findPeer operation and retry dialing if new multiaddrs are retrieved (if we get the same ones, we can't do anything new).
* Add a configuration option to DialOptions to disable this functionality (doing this work is the default in order to give users the greatest chance of success).
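The action items above could fit together roughly as follows. This is a sketch under assumptions, not the js-libp2p dialer: `NetworkDialError`, `RetryDialOptions`, `findPeerOnNetworkFailure`, `DialFn`, `FindPeerFn` and `dialWithRefresh` are all hypothetical names, multiaddrs are plain strings, and the dial and findPeer functions are passed in so the flow can be read in isolation.

```typescript
// Hypothetical error type marking a network-level dial failure.
class NetworkDialError extends Error {
  constructor (message: string) {
    super(message)
    this.name = 'NetworkDialError'
  }
}

// Hypothetical dial option; the refresh-and-retry behaviour is on by default.
interface RetryDialOptions {
  findPeerOnNetworkFailure?: boolean
}

type DialFn = (peerId: string, multiaddrs: string[]) => Promise<string>
type FindPeerFn = (peerId: string) => Promise<string[]>

function isNetworkDialError (err: unknown): boolean {
  return err instanceof Error && err.name === 'NetworkDialError'
}

async function dialWithRefresh (
  peerId: string,
  known: string[],
  dial: DialFn,
  findPeer: FindPeerFn,
  options: RetryDialOptions = {}
): Promise<string> {
  try {
    return await dial(peerId, known)
  } catch (err) {
    const retry = options.findPeerOnNetworkFailure ?? true
    // Only network-level failures trigger a lookup; everything else is rethrown.
    if (!retry || !isNetworkDialError(err)) throw err

    // If findPeer itself rejects, the dial fails with that error.
    const fresh = await findPeer(peerId)
    const knownSet = new Set(known)
    const added = fresh.filter(ma => !knownSet.has(ma))
    // Same addresses as before: nothing new to try, so surface the original error.
    if (added.length === 0) throw err

    return await dial(peerId, added)
  }
}
```

Only the newly discovered multiaddrs are retried here, since the known ones have just failed; retrying the full refreshed set would be an equally reasonable choice.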
Context: https://github.com/libp2p/js-libp2p/pull/638#discussion_r425831536