Question about constant MAX_ACTIVE_TASKS

the8472 / mldht

Bittorrent Mainline DHT implementation in java

Mozilla Public License 2.0

Question about constant MAX_ACTIVE_TASKS #2

Closed mfogelman closed 9 years ago

mfogelman commented 9 years ago

Hi, how are you doing?

May I ask why the TaskManager runs up to 7 (MAX_ACTIVE_TASKS) tasks in parallel and enqueues the rest? What constraint led you to choose that value rather than 8, 20 or 100?

Thanks a lot in advance! Regards, Martin

the8472 commented 9 years ago

There is no reason for the exact number.

The reason for choosing a low number is the necessity for (secondary) throttling. Having many tasks active at once just causes tasks to compete for RPC slots (the primary throttling mechanism), which makes them take longer and could lead to token timeouts if congestion became extreme.

You also should take into account that this limit is multiplied by the number of active RPCServer instances, i.e. on a server-class machine with multi-homing it gets scaled up. On the other hand cheap NAT devices - which you generally find in a home environment - would just get overloaded with too much UDP traffic anyway.
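To make the two points above concrete, here is a minimal sketch of a queue that caps the number of active tasks and scales that cap with the number of active servers. It is a toy model, not mldht's actual TaskManager: the class `BoundedTaskQueue` and all of its methods are hypothetical, and only the value 7 comes from this thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of a task manager that caps concurrently active tasks; not mldht's TaskManager. */
class BoundedTaskQueue {
    // 7 is the value discussed in this thread; everything else here is hypothetical.
    static final int MAX_ACTIVE_TASKS = 7;

    private final int limit;
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private int active = 0;

    BoundedTaskQueue(int activeServers) {
        // The cap is multiplied by the number of active server instances.
        this.limit = MAX_ACTIVE_TASKS * activeServers;
    }

    /** Starts the task if a slot is free, otherwise leaves it in the queue. */
    synchronized void submit(Runnable start) {
        pending.add(start);
        dequeue();
    }

    /** Call when a task finishes: frees its slot and starts the next queued task, if any. */
    synchronized void taskFinished() {
        if (active > 0) {
            active--;
        }
        dequeue();
    }

    // Starts queued tasks in FIFO order until the cap is reached.
    private void dequeue() {
        while (active < limit && !pending.isEmpty()) {
            active++;
            pending.poll().run();
        }
    }

    synchronized int activeCount() { return active; }

    synchronized int queuedCount() { return pending.size(); }

    int limit() { return limit; }
}
```

With one server, submitting ten tasks starts seven and leaves three queued; with three servers the cap becomes 21.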

Have you observed any problems?

mfogelman commented 9 years ago

Hi, thanks a lot for the response!!

I didn't see any issues with it. It's just that when the library is used primarily to find peers for several hashes, it looks like it could handle more tasks at the same time... I tried multiplying those numbers and got better and better results running on a server. I'll continue testing it.

Thanks a lot again! Regards, Martin

the8472 commented 9 years ago

If you're only interested in getting peers - as opposed to announcing - you can call PeerLookupTask#setFastTerminate(true). This allows those tasks to terminate on stall timeouts (which are based on connection latency) instead of hard timeouts (10s).

Additionally, setLowPriority(true) might actually improve performance if you issue many tasks at once. Since it decreases parallelism per task, it allows increased parallelism between multiple tasks.
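The trade-off can be illustrated with a back-of-the-envelope model, assuming a fixed pool of RPC slots shared by all tasks. The class `SlotSharing`, the method name, and the slot counts below are made up for illustration and are not mldht's real values or API.

```java
/** Back-of-the-envelope model of slot sharing; the numbers used with it are hypothetical. */
class SlotSharing {
    /**
     * Number of tasks that can run at their full per-task parallelism at once
     * when each task keeps up to perTaskCap requests in flight.
     */
    static int concurrentTasks(int totalRpcSlots, int perTaskCap) {
        if (perTaskCap <= 0) {
            throw new IllegalArgumentException("perTaskCap must be positive");
        }
        return totalRpcSlots / perTaskCap;
    }
}
```

For example, with 16 shared slots, a cap of 4 requests per task lets 4 tasks proceed side by side, while a cap of 1 lets 16 do so.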

mfogelman commented 9 years ago

Thank you so much! What you suggested is really great! It's flying now.

One last question: do you have any idea what percentage of seeders is found during a PeerLookupTask run, and whether there's a way to maximize it, regardless of how long the task takes?

the8472 commented 9 years ago

Seeds specifically or any peer regardless of completion status?

mfogelman commented 9 years ago

Any peer, regardless of completion status...

the8472 commented 9 years ago

Hrrm... for large swarms with many peers it might be possible to get a few additional ones by simply running the lookup again since each response might contain a randomly sampled subset.
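The idea of rerunning the lookup and merging the results can be sketched as follows. This is a simulation under the assumption that each run returns a random subset of the swarm: `sampleLookup` is a hypothetical stand-in for a real lookup, and none of these names come from mldht.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Simulation only: shows why repeating a sampled lookup can surface additional peers. */
class RepeatedLookup {
    /** Hypothetical stand-in for one lookup: returns a random subset of the swarm. */
    static Set<String> sampleLookup(List<String> swarm, int sampleSize, Random rng) {
        List<String> shuffled = new ArrayList<>(swarm);
        Collections.shuffle(shuffled, rng);
        return new HashSet<>(shuffled.subList(0, Math.min(sampleSize, shuffled.size())));
    }

    /** Runs the lookup several times and merges the results, dropping duplicates. */
    static Set<String> mergeLookups(List<String> swarm, int sampleSize, int runs, Random rng) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < runs; i++) {
            seen.addAll(sampleLookup(swarm, sampleSize, rng));
        }
        return seen;
    }
}
```

Each extra run can only add peers to the merged set, but the gain shrinks as more of the swarm has already been seen.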

But I think the best way to get a good view of the swarm is to connect to them with the bittorrent protocol and use PEX.