n0-computer / iroh

A toolkit for building distributed applications
https://iroh.computer
Apache License 2.0

iroh-sync: Store peers per docs #1524

Closed: divagant-martian closed this 1 year ago

divagant-martian commented 1 year ago

Questions to solve:

divagant-martian commented 1 year ago

From the questions to solve before storing this:

  1. What peers to store: My thinking is that we should store the last n useful peers for the document, where useful is defined as a peer that has produced a successful sync event for the doc. I believe this is an appropriate approximation, since those are the peers with whom a future sync is most likely. This nicely encapsulates several ideas:
    • There is reason to believe the peer is contactable since we have received data from them.
    • There is reason to believe the peer is subscribed to the topic.
    • There is reason to believe they are staying up to date.
    • There is reason to believe the peer is a "low latency one" since they ended up in our optimized set of peers at some point.
  2. What data to store per peer: We could either store their peer_id (peer id bytes, since this seems to be abstracted in sync), or both their peer id and peer data. I haven't finished looking into the code, but I'm not sure the peer data reaches sync, since it would be mostly irrelevant there.

For the mem store this is a simple LRU cache; for the fs store I still need to think through how to store this so that it captures the idea of "last n recently useful peers". Pointers welcome.
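For the in-memory side, a minimal sketch of what "last n useful peers per document" could look like, using only std types. All names here are hypothetical and not part of the iroh_sync API; peer ids are kept as raw 32-byte arrays, as suggested above.

```rust
use std::collections::VecDeque;

/// Hypothetical per-document cache of the last `cap` peers that completed a
/// successful sync. The most recently useful peer sits at the front.
struct UsefulPeers {
    cap: usize,
    peers: VecDeque<[u8; 32]>,
}

impl UsefulPeers {
    fn new(cap: usize) -> Self {
        Self { cap, peers: VecDeque::with_capacity(cap) }
    }

    /// Record a successful sync with `peer`: move it to the front if already
    /// known, otherwise insert it and evict the least recently useful peer.
    fn record_useful(&mut self, peer: [u8; 32]) {
        if let Some(pos) = self.peers.iter().position(|p| *p == peer) {
            self.peers.remove(pos);
        } else if self.peers.len() == self.cap {
            self.peers.pop_back();
        }
        self.peers.push_front(peer);
    }

    /// Peers to try first on the next sync, most recently useful first.
    fn peers(&self) -> impl Iterator<Item = &[u8; 32]> {
        self.peers.iter()
    }
}
```

The fs store would need the same ordering persisted somehow (e.g. via the last-sync timestamp), which is the part that still needs thought.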

Could someone confirm whether this would go in the instance store? Maybe @Frando?

Frando commented 1 year ago

I think I agree with what you outlined above! To 1: Yup, I think this makes sense. In addition, I would always store the peers that were provided directly (in what currently is doc.start_sync or via doc.join(ticket)), and mark, per peer, whether it's PeerKind::Discovered or PeerKind::UserProvided.

To 2: I'd say we should only store the PeerIds per document, because we already store the address info in the MagicEndpoint, right?

For storage: I think we would need to add methods to the iroh_sync::store::Store trait. Maybe store_peer(namespace_id, peer_id, peer_kind, last_sync_timestamp) and get_peers(namespace_id) -> Self::PeerIterator, or something like that. The LRU would then be handled in the store, and we would likely have to add the capacity as a constructor argument.
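A rough sketch of that trait extension, going off the method names above; the signatures and types are assumptions for illustration, not the actual iroh_sync::store::Store definition.

```rust
/// Whether we learned about a peer through sync or the user supplied it
/// directly (doc.start_sync / doc.join(ticket)).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PeerKind {
    Discovered,
    UserProvided,
}

/// Hypothetical peer-related additions to the store trait.
pub trait Store {
    type Error: std::error::Error;
    /// Iterator over the stored peer ids for one namespace.
    type PeerIterator: Iterator<Item = [u8; 32]>;

    /// Record `peer_id` as a sync peer for `namespace_id`; the store keeps
    /// only the most recently useful peers, up to a capacity fixed at
    /// construction time.
    fn store_peer(
        &self,
        namespace_id: [u8; 32],
        peer_id: [u8; 32],
        peer_kind: PeerKind,
        last_sync_timestamp: u64,
    ) -> Result<(), Self::Error>;

    /// Peers previously stored for `namespace_id`, most recently useful first.
    fn get_peers(&self, namespace_id: [u8; 32]) -> Result<Self::PeerIterator, Self::Error>;
}
```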

divagant-martian commented 1 year ago

I finally understood how to store this. That aside, I would suggest we wait to add explicit peers until we answer some questions about this:

Going to continue for now with the found peers and see how it goes.