nictuku / dht

Kademlia/Mainline DHT node in Go.
Other
826 stars 144 forks source link

improvements for crawling dht peers #59

Open btkador opened 6 years ago

btkador commented 6 years ago
nictuku commented 6 years ago

Let's talk about MaxSearchQueries. How does it help you compared to using a low value for NumTargetPeers? It feels like during your experiments it would have been enough to set NumTargetPeers to 5 or so? Then set a RateLimit too?

My concern is that the new features are overlapping with existing ones.

PassiveMode seems redundant with RateLimit for example.

btkador commented 6 years ago

With NumTargetPeers 5 you get 5 or maybe 20-30 peers after the first batch of get_peers and then the search stops, because needMorePeers is not triggered. There are no more peers learned until the program, which uses the library sends a new PeersRequest(), even if there are 500 or 5000 possible peers out there. That should be sufficient for a common torrent client, I guess.

In theroy, if you add a torrent with 0 peers, the search continues infinityfly and send more and more get_peers request packets to each new learned node, till RateLimit is reached and keeps filling up the whole RateLimit. The only reason that is not happening in real-world is, that some bogus nodes out there always send peers, no matter which hash you're asking for, and so 5 is always reached. I've just tested a bunch of random generated hashes and all of them immediately got peers. (I will investigate that further and evaluate if it's possible to detect rogue nodes, for my use case. I know there is bep42 in dht to do exactly that, but 98% of the dht nodes out there don't support it. I guess I need to disclose the nodeip or nodeid to the application for blacklisting them, based on the fact that they deliver peers for more than x hashes.)

If you set NumTargetPeers 5000 and RateLimit 100 (default), the library sends out an insane ammount of new get_peers packets but drops reply packets above RateLimit, but keeps sending new requests. Each reply thats sneaks thru the RateLimit with peers or nodes triggers new get_peers packets sent out, because needMorePeers is not reached. After a while you send out 100Mbits of get_peers packets and drop the replies. It's kind of a DDoS for your own server, because each small get_peers triggers a bigger response packet, filling up the network stack.

PassiveMode seems redundant with RateLimit for example.

As I understand the code RateLimit works on all incoming packets. It's also dropping reply packets to send out requests from us. So if you set RateLimit to 1, to process only 1 packet per second, dht would be real slow, because dropping all the responses.. while setting PassivMode only affects incoming queries but not replies.

nictuku commented 6 years ago

Torrent clients only need DHT to establish an initial connection to a swarm. After that they exchange peer information directly and the DHT is barely needed.

In my mind, you control outbound traffic using NumTargetPeers and inbound traffic using RateLimit.

The DHT is doing what you're asking it to do, get to 5000 peers as fast as possible. If you don't need that, set a lower number perhaps?

I still don't see how your proposed approach is better. In both cases you need to choose values carefully, but for the existing approach the default settings should work for most people.

On Thu, Aug 9, 2018, 7:25 PM Michael notifications@github.com wrote:

With NumTargetPeers 5 you get 5 or maybe 20-30 peers after the first batch of get_peers and then the search stops, because needMorePeers is not triggered. There are no more peers learned until the program, which uses the library sends a new PeersRequest(), even if there are 500 or 5000 possible peers out there. That should be sufficient for a common torrent client, I guess.

In theroy, if you add a torrent with 0 peers, the search continues infinityfly and send more and more get_peers request packets to each new learned node, till RateLimit is reached and keeps filling up the whole RateLimit. The only reason that is not happening in real-world is, that some bogus nodes out there always send peers, no matter which hash you're asking for, and so 5 is always reached. I've just tested a bunch of random generated hashes and all of them immediately got peers. (I will investigate that further and evaluate if it's possible to detect rogue nodes, for my use case. I know there is bep42 in dht to do exactly that, but 98% of the dht nodes out there don't support it. I guess I need to disclose the nodeip or nodeid to the application for blacklisting them, based on the fact that they deliver peers for more than x hashes.)

If you set NumTargetPeers 5000 and RateLimit 100 (default), the library sends out an insane ammount of new get_peers packets but drops reply packets above RateLimit, but keeps sending new requests. Each reply thats sneaks thru the RateLimit with peers or nodes triggers new get_peers packets sent out, because needMorePeers is not reached. After a while you send out 100Mbits of get_peers packets and drop the replies. It's kind of a DDoS for your own server, because each small get_peers triggers a bigger response packet, filling up the network stack.

PassiveMode seems redundant with RateLimit for example.

As I understand the code RateLimit works on all incoming packets. It's also dropping reply packets to send out requests from us. So if you set RateLimit to 1, to process only 1 packet per second, dht would be real slow, because dropping all the responses.. while setting PassivMode only affects incoming queries but not replies.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/nictuku/dht/pull/59#issuecomment-411956327, or mute the thread https://github.com/notifications/unsubscribe-auth/AAMY9uybe5GLmc5bvoK3hXJTiZHy0IwQks5uPO81gaJpZM4V1c2F .

btkador commented 6 years ago

In my mind, you control outbound traffic using NumTargetPeers and inbound traffic using RateLimit.

let's say you want a minimum of 250 peers and use NumTargetPeers for what it seems to be. The library will running a udp flood till NumTargetPeers is satisfied - if there are not enought peers out there, which is in my opinion not the best use of network bandwidth. RateLimit only limits inbound traffic.

I still don't see how your proposed approach is better. In both cases you need to choose values carefully, but for the existing approach the default settings should work for most people.

my parameters/features are all optional if not set.

nictuku commented 6 years ago

If you were to change the default behavior, what do you think it should be?

In other words, I think we should make things work out of the box without having to change knobs. Any ideas for how we can do that?

On Thu, Aug 9, 2018, 10:18 PM Michael notifications@github.com wrote:

In my mind, you control outbound traffic using NumTargetPeers and inbound traffic using RateLimit.

let's say you want a minimum of 250 peers and use NumTargetPeers for what it seems to be. The library will running a udp flood till NumTargetPeers is satisfied - if there are not enought peers out there, which is in my opinion not the best use of network bandwidth. RateLimit only limits inbound traffic.

I still don't see how your proposed approach is better. In both cases you need to choose values carefully, but for the existing approach the default settings should work for most people.

my parameters/features are all optional if not set.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/nictuku/dht/pull/59#issuecomment-411977979, or mute the thread https://github.com/notifications/unsubscribe-auth/AAMY9qsIbHQrPo1h_fOoBxS3Qm-o2u7sks5uPRfAgaJpZM4V1c2F .

btkador commented 6 years ago

just leave your default parameters like they are - what my branch already does. all my new parameters are default disabled.

nictuku commented 6 years ago

Can we reduce the number of changes and number of knobs from your proposed change? The DHT Config is already quite large. Can we maybe go with one or two new options?

I am not 100% against adding all your proposed changes - just trying to find a middle ground to reduce the API surface.

The more knobs we have the harder it is for people to find what values work well together.

Perhaps some of these things can be hidden constants, if they are so rarely changed?

On Thu, Aug 9, 2018, 10:27 PM Michael notifications@github.com wrote:

just leave your default parameters like they are - what my branch already does. all my new parameters are default disabled.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/nictuku/dht/pull/59#issuecomment-411979150, or mute the thread https://github.com/notifications/unsubscribe-auth/AAMY9rJBLddtWVz9Ts6JYKXRUr33pje6ks5uPRnNgaJpZM4V1c2F .

btkador commented 6 years ago

I need all the options and parameters adjustable for my program code.

As mentioned before, you don't need to merge any of my changes, since it seems to work well for everybody else as it is since >3 years. I just wanted to discuss my extentions and publish them back to the community 😉