webtorrent / bittorrent-dht

🕸 Simple, robust, BitTorrent DHT implementation
https://webtorrent.io
MIT License
1.23k stars 203 forks source link

Implement dht.put() backoff algorithm #135

Open lmatteis opened 8 years ago

lmatteis commented 8 years ago

To keep things alive in the DHT store (get/put), it's suggested an algorithm such as (http://libtorrent.org/dht_rss.html#re-announcing) to avoid overload of the network:

  1. pick one random item (i) from the local repository (except items already announced this round)
  2. If all items in the local repository have been announced 2.1 terminate
  3. look up item i in the DHT
  4. If fewer than 8 nodes returned the item 4.1 announce i to the DHT 4.2 goto 1

However, I can't seem to find a way to see how many nodes returned the item after a get request. Is such functionality exposed?

lmatteis commented 8 years ago

Actually, this might be implemented as a dht.keepAlive(opts) method. From BEP44 (quite recent addition): https://github.com/bittorrent/bittorrent.org/blob/master/beps/bep_0044.rst#expiration

I'm willing to implement such functionality however I'm unsure how it would look like as an API. Some ideas:

// every 2 hours runs .put() based on above algorithm and executes callback?
dht.keepAlive(opts, (err, res) => console.log(err, res)) 

// perhaps the interval should exist at the app level (not inside the module)?
setInterval(() => dht.backoff(opts, () => dht.put(opts)), 7200000)

// or perhaps the functionality should be inside .put() itself?
// put() might not actually write to the DHT if the backoff algorithm doesn't allow it
setInterval(() => dht.put(opts), 7200000)

@substack @mafintosh @feross suggestions?

Also please let me know what you think of this recent BEP suggestion for updating torrents based on DHT mutable items: https://github.com/bittorrent/bittorrent.org/pull/35

the8472 commented 8 years ago

http://libtorrent.org/dht_rss.html#re-announcing is obsolete and should be ignored.

You can find the current algorithm at https://github.com/bittorrent/bittorrent.org/blob/master/beps/bep_0044.rst#expiration

feross commented 8 years ago

The third option seems the most usable to me, and it doesn't require new APIs to be added, or additional state to be tracked inside the DHT client. I like that it does the right thing by default.

We can add a { force: true } option to force a put to happen even when the backoff algorithm doesn't allow it.

lmatteis commented 8 years ago

@feross what about an opposite { backoff: true } so we don't break existing tests and use cases where users don't need multiple roundtrips?

feross commented 8 years ago

@lmatteis I still slightly prefer { force: true } since that way we do the right thing by default. We can bump the major version. But, I'm fine with { backoff: true } as well. @mafintosh - do you have thoughts?

lmatteis commented 8 years ago

I started implementing this here: https://github.com/lmatteis/bittorrent-dht/tree/backoff-algorithm

I'm a really bad coder 😄 so I'd appreciate feedback. Specifically, I decided to only alter _preput() and put the backoff logic in there. It seems to run a get based on the target to populate the local routing table. So I'm taking advantage of those round trips to also store the values: https://github.com/lmatteis/bittorrent-dht/blob/backoff-algorithm/client.js#L209

feross commented 8 years ago

Cool -- thanks for giving it a shot. I'm probably not the best person to review this, tbh as I didn't write this part of the codebase. I think @mafintosh or @substack ought to take a look before we merge this.

Few things I noticed in your work so far:

lmatteis commented 8 years ago

@feross yeah it's actually quite hard to test this behavior. I'd have to instantiate N nodes and find a way that I can control how many copies are being stored, in order to test for < 8 and > 8 copies.

And yeah MAX_COPIES is a typo should be 8.

mafintosh commented 8 years ago

Does this mean that .put will re-announce in the background? Neither .announce/.lookup does this now so I'm unsure if that would be confusing. The feature seems useful though!

lmatteis commented 8 years ago

@mafintosh if we just alter .put (option 3), the re-announce happens at the app level (not inside the module). So we opted to go with option 3 by either doing force : true or backoff: true.

This is the actual algorithm: https://github.com/bittorrent/bittorrent.org/pull/38/files

I'm simply taking advantage of the _preput() to implement this. But am unsure about how it's implemented. I think I'm missing point 2 of the algorithm: 2. The 8 nodes closest to the target key which are eligible for a store all have indicated they have the data, either by returning it or through the``seq``number.

the8472 commented 8 years ago

From skimming this implementation it looks like put and get are totally separate lookups.

In my own implementation a put is a non-lookup action that takes a finished get as an argument, that way the responses of the get can be reused.

feross commented 8 years ago

@lmatteis Re-announce should happen at the app level (option 3) to remain consistent with .announce/.lookup.

lmatteis commented 8 years ago

@substack any ideas how we can add tests for this?

feross commented 8 years ago

@lmatteis Can't you just create 8 + 1 clients, then have one of them do two put() calls in a row? Only the first put() call should actually be sent to the clients.

To make this easier to test, you could make the constant 8 configurable, so you can set it to something smaller, like 2, for the tests.

lmatteis commented 8 years ago

@feross right but how do I test whether that second put() isn't sent to the clients?

feross commented 8 years ago

@lmatteis You can spy on calls to the dht._onput() function. Not super elegant, but we don't have an event for when a put happens. So this works:

var onput = dht._onput // save the onput function
dht._onput = function () {
  t.pass('got `put` from remote peer')
  onput.apply(dht, arguments) // call the original onput function
}
lmatteis commented 8 years ago

Ok makes sense. I just didn't want to screw things up since I didn't see tests for private methods in the other tests.

lmatteis commented 8 years ago

Ok tests seem to pass. I'm just a little weary about this big change: https://github.com/feross/bittorrent-dht/compare/master...lmatteis:backoff-algorithm#diff-215b25d5bf5c0b4623ad1a2b62456f60L544

I'm essentially saving the value v and seq as well in the table for a target, since I need them to see whether we got multiple copies.

feross commented 8 years ago

@lmatteis Can you send a PR?

@mafintosh I'm gonna need your help to review this one 👍 I really haven't dug into the BEP44 stuff very deeply.

lmatteis commented 8 years ago

@feross ok, i also just added the second point of the algorithm. Test still pass but we need more for the backoff. For instance, I'm not storing the key/signature in the table for a key, hence can't check whether a copy is valid or not. I'll send a PR so we can better review where to store this stuff , as it could increase memory footprint.