Open lmatteis opened 8 years ago
Actually, this might be implemented as a dht.keepAlive(opts)
method. From BEP44 (quite recent addition): https://github.com/bittorrent/bittorrent.org/blob/master/beps/bep_0044.rst#expiration
I'm willing to implement such functionality however I'm unsure how it would look like as an API. Some ideas:
// every 2 hours runs .put() based on above algorithm and executes callback?
dht.keepAlive(opts, (err, res) => console.log(err, res))
// perhaps the interval should exist at the app level (not inside the module)?
setInterval(() => dht.backoff(opts, () => dht.put(opts)), 7200000)
// or perhaps the functionality should be inside .put() itself?
// put() might not actually write to the DHT if the backoff algorithm doesn't allow it
setInterval(() => dht.put(opts), 7200000)
@substack @mafintosh @feross suggestions?
Also please let me know what you think of this recent BEP suggestion for updating torrents based on DHT mutable items: https://github.com/bittorrent/bittorrent.org/pull/35
http://libtorrent.org/dht_rss.html#re-announcing is obsolete and should be ignored.
You can find the current algorithm at https://github.com/bittorrent/bittorrent.org/blob/master/beps/bep_0044.rst#expiration
The third option seems the most usable to me, and it doesn't require new APIs to be added, or additional state to be tracked inside the DHT client. I like that it does the right thing by default.
We can add a { force: true }
option to force a put to happen even when the backoff algorithm doesn't allow it.
@feross what about an opposite { backoff: true }
so we don't break existing tests and use cases where users don't need multiple roundtrips?
@lmatteis I still slightly prefer { force: true }
since that way we do the right thing by default. We can bump the major version. But, I'm fine with { backoff: true }
as well. @mafintosh - do you have thoughts?
I started implementing this here: https://github.com/lmatteis/bittorrent-dht/tree/backoff-algorithm
I'm a really bad coder 😄 so I'd appreciate feedback. Specifically, I decided to only alter _preput()
and put the backoff logic in there. It seems to run a get
based on the target to populate the local routing table. So I'm taking advantage of those round trips to also store the values: https://github.com/lmatteis/bittorrent-dht/blob/backoff-algorithm/client.js#L209
Cool -- thanks for giving it a shot. I'm probably not the best person to review this, tbh as I didn't write this part of the codebase. I think @mafintosh or @substack ought to take a look before we merge this.
Few things I noticed in your work so far:
MAX_COPIES
should not be changed to 0.@feross yeah it's actually quite hard to test this behavior. I'd have to instantiate N nodes and find a way that I can control how many copies are being stored, in order to test for < 8 and > 8 copies.
And yeah MAX_COPIES is a typo should be 8.
Does this mean that .put will re-announce in the background? Neither .announce/.lookup does this now so I'm unsure if that would be confusing. The feature seems useful though!
@mafintosh if we just alter .put (option 3), the re-announce happens at the app level (not inside the module). So we opted to go with option 3 by either doing force : true
or backoff: true
.
This is the actual algorithm: https://github.com/bittorrent/bittorrent.org/pull/38/files
I'm simply taking advantage of the _preput()
to implement this. But am unsure about how it's implemented. I think I'm missing point 2 of the algorithm: 2. The 8 nodes closest to the target key which are eligible for a store all have indicated they have the data, either by returning it or through the``seq``number.
From skimming this implementation it looks like put and get are totally separate lookups.
In my own implementation a put is a non-lookup action that takes a finished get as an argument, that way the responses of the get can be reused.
@lmatteis Re-announce should happen at the app level (option 3) to remain consistent with .announce/.lookup.
@substack any ideas how we can add tests for this?
@lmatteis Can't you just create 8 + 1 clients, then have one of them do two put() calls in a row? Only the first put() call should actually be sent to the clients.
To make this easier to test, you could make the constant 8 configurable, so you can set it to something smaller, like 2, for the tests.
@feross right but how do I test whether that second put() isn't sent to the clients?
@lmatteis You can spy on calls to the dht._onput()
function. Not super elegant, but we don't have an event for when a put
happens. So this works:
var onput = dht._onput // save the onput function
dht._onput = function () {
t.pass('got `put` from remote peer')
onput.apply(dht, arguments) // call the original onput function
}
Ok makes sense. I just didn't want to screw things up since I didn't see tests for private methods in the other tests.
Ok tests seem to pass. I'm just a little weary about this big change: https://github.com/feross/bittorrent-dht/compare/master...lmatteis:backoff-algorithm#diff-215b25d5bf5c0b4623ad1a2b62456f60L544
I'm essentially saving the value v
and seq
as well in the table for a target, since I need them to see whether we got multiple copies.
@lmatteis Can you send a PR?
@mafintosh I'm gonna need your help to review this one 👍 I really haven't dug into the BEP44 stuff very deeply.
@feross ok, i also just added the second point of the algorithm. Test still pass but we need more for the backoff. For instance, I'm not storing the key/signature in the table for a key, hence can't check whether a copy is valid or not. I'll send a PR so we can better review where to store this stuff , as it could increase memory footprint.
To keep things alive in the DHT store (get/put), it's suggested an algorithm such as (http://libtorrent.org/dht_rss.html#re-announcing) to avoid overload of the network:
However, I can't seem to find a way to see how many nodes returned the item after a
get
request. Is such functionality exposed?