holepunchto / hyperswarm

A distributed networking stack for connecting peers.
https://docs.holepunch.to
MIT License

Add noduplicate event in deduplication #61

Closed timgoeller closed 4 years ago

timgoeller commented 4 years ago

I want to connect nodes with hyperswarm, and only continue with further usage of the connection when I can be sure that there are no duplicate connections. To accomplish this, some kind of noduplicate event (in addition to duplicate) would be handy. That way one could wait for the event before continuing, while also knowing which connection to use.

mafintosh commented 4 years ago

Note that duplicates might not happen at all, or they might take 20-30 min to show up, depending on the networks involved.

timgoeller commented 4 years ago

Wouldn't the second connection after 20-30 minutes definitely be the duplicate, so that the first connection would already have triggered noduplicate? Or could the deduplication choose the first connection as the duplicate and close it?

mafintosh commented 4 years ago

Ya, in case of dups it sorts by connection type + dedup id and picks the lowest one to be the nondup one, so it might drop the first one. It's actually a really hard problem to do this right, as @andrewosh can attest to :D.

We might loosen up the heuristic and do some sanity checks in the future like not dropping the original if it's proven to be alive and well after 20 min, but we've had some really annoying bugs around this in the last month, so we're keeping it simple for now.
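
The selection rule described in this comment (sort by connection type and dedup id, keep the lowest) can be illustrated with a small sketch. This is not hyperswarm's implementation: the field names `type` and `dedupId`, the string comparison, and the helper names are all assumptions made for illustration. The point it tries to show is that a rule based only on sorted properties ignores arrival order, so the older connection can be the one that gets dropped.

```javascript
// Hypothetical connection records. `type`, `dedupId` and `openedAt` are
// illustrative field names, not hyperswarm's internal representation.
function compareConnections (a, b) {
  // Order by connection type first, then by dedup id as a tie-breaker.
  if (a.type !== b.type) return a.type < b.type ? -1 : 1
  if (a.dedupId !== b.dedupId) return a.dedupId < b.dedupId ? -1 : 1
  return 0
}

function pickSurvivor (connections) {
  // The lowest entry in sort order is kept; the rest are treated as duplicates.
  const sorted = [...connections].sort(compareConnections)
  return { keep: sorted[0], drop: sorted.slice(1) }
}

// The first connection opened immediately, the second one 20 minutes later.
const first = { type: 'utp', dedupId: 'b', openedAt: 0 }
const second = { type: 'tcp', dedupId: 'a', openedAt: 20 * 60 * 1000 }

const result = pickSurvivor([first, second])
console.log(result.keep === second) // true: the older connection is dropped
```

Because the comparison does not look at `openedAt`, a hypothetical noduplicate event fired for the first connection could later turn out to be wrong.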

timgoeller commented 4 years ago

Yeah, it feels like one of those things that sound simple but get worse and worse the more you think about them. :D

So the best way to work around duplication right now is to solve it on the layer above (by acknowledging that data is sent at least once and ignoring data that was received more than once)?
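
A minimal sketch of that layer-above approach, assuming every message carries a unique `id` assigned by the sender (an assumption, nothing hyperswarm provides). `createDeduplicator` is a hypothetical helper name.

```javascript
// Hypothetical receiver-side filter for at-least-once delivery.
// Assumes each message has a unique `id` chosen by the sender.
function createDeduplicator (handler) {
  const seen = new Set()
  return function onMessage (message) {
    // Already delivered, possibly over another connection to the same peer.
    if (seen.has(message.id)) return false
    seen.add(message.id)
    handler(message)
    return true
  }
}

const delivered = []
const onMessage = createDeduplicator(m => delivered.push(m.body))

onMessage({ id: 1, body: 'hello' })
onMessage({ id: 1, body: 'hello' }) // same message arriving a second time
onMessage({ id: 2, body: 'world' })

console.log(delivered) // [ 'hello', 'world' ]
```

The `seen` set grows without bound in this sketch; a real application would need some way to expire old ids.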

mafintosh commented 4 years ago

So with the current api you can somewhat easily guarantee that data is only sent once between two peers. @andrewosh and I went through this when making peer-sockets in beaker for this exact reason.

onFramedMessage(socket, function (message) {
  // accept messages from dups here also
  ...
})

function sendMessage (m) {
  if (info.duplicate) return // never send to a duplicate
  sendFramedMessage(socket, m)
}

If you use the deduplicate api before sending user data, then as soon as a dup appears hyperswarm will gracefully shut it down, meaning in-flight messages still arrive. So as long as you don't send new messages to dups, it's pretty easy.
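
The pattern above (receive on every connection, send only on non-duplicates) can be tried out without a network using stand-in objects. Everything in this sketch is hypothetical: `createFakeConnection` and `createPeerChannel` are made-up helpers, and the `info.duplicate` flag is set by hand rather than by hyperswarm.

```javascript
// Stand-ins for a real socket and its peer info; nothing here calls hyperswarm.
function createFakeConnection (name) {
  return { name, info: { duplicate: false }, sent: [] }
}

function createPeerChannel (onMessage) {
  const connections = []
  return {
    add (conn) { connections.push(conn) },
    // Accept messages from every connection, duplicates included, so that
    // anything still in flight on a closing duplicate is not lost.
    receive (conn, message) { onMessage(message, conn) },
    // Send only on a connection that is not flagged as a duplicate.
    send (message) {
      const target = connections.find(c => !c.info.duplicate)
      if (!target) return false
      target.sent.push(message)
      return true
    }
  }
}

const received = []
const channel = createPeerChannel(m => received.push(m))
const a = createFakeConnection('a')
const b = createFakeConnection('b')
channel.add(a)
channel.add(b)

a.info.duplicate = true // pretend deduplication flagged `a` as the duplicate
channel.receive(a, 'in flight on a') // still accepted
channel.send('new message') // goes out on `b` only

console.log(received, a.sent, b.sent)
```

Since each new message goes out on exactly one non-duplicate connection and nothing received on a duplicate is discarded, each message should be delivered once, provided the duplicate is shut down gracefully as described.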

timgoeller commented 4 years ago

That's actually really nice, thanks! I think it would be helpful to include some best-practice examples in the documentation. I'll close the issue, since this should do the trick for me.

mafintosh commented 4 years ago

Ya, something we're working on :)