protocol / beyond-bitswap

Other
34 stars 9 forks source link

Thoughts on RFCBBL1201 #26

Open hannahhoward opened 3 years ago

hannahhoward commented 3 years ago

Codifying some thoughts I conveyed to @adlrocha in our conversation about RFCBBL1201.

  1. At a base level, I feel this is the right approach. Having a discovery layer, independent of Bitswap and Graphsync is key to mix/matching the protocols effectively. To effectively split a request between the two protocols, you have to know something about who has what.

  2. Some things to know about selectors, especially as it relates to WANTS: a. Absent cached information, determining if I can serve a CID + a selector carries a non-trivial CPU + disk cost. For example, to determine if I have CID QMxxx + a selector for the whole DAG beneath it, we must perform the traversal of the whole dag locally. Depending on the size of the DAG, this might carry a huge cost. As a baseline, we would probably never respond to an unbounded recursive selector in a WANT list purely for DOS protection. b. This then raise the question: how do we efficiently determine if we can serve a selector in a WANT? One obvious solution is to cache what selectors we've already served successfully. That probably isn't enough, and there are some addition questions we might look at: i. trying to develop logic, absent of data, about what selectors are contained in other selectors. For example, if we have already served a selector for a whole dag, we can safely say we can serve a selector for a subset of the dag. That example sounds obvious, but trying to determine the exact hueristic for this is something that may actually be quite complicated and involve some set theory :) ii. Perhaps we allow some kind of budgetting system for asking for selectors in WANTS + possibly a tit-for-tat system

  3. Once we get back information about a WANT for a selector, a big topic for exploration is how to split graphsync queries among multiple peers if multiiple peers have the selector. We may be able to do this by "splitting the selector" -- i.e. traversing different branchs of the DAG tree. However, it may also be more efficient to split at the block level. At the moment, selector traversal is deterministic. So if we have 3 peers, we might say to the one peer, that, given a selector traversal that results in blocks indexed 0...n, send blocks where index modulo 3 = 0, and then to another send blocks where index modulo = 1, and to the third send blocks where index modulo 3 = 2. Etc. This may be more efficient for a broader set of dag structures (as opposed to tree splitting which works best on a balanced, wide tree). There are lots of areas to explore here -- just flagging this as a non-trivial implementation detail that does not yet exist.