rkjdid closed 9 years ago
Nice. I wonder if we can write a test to exercise this problem. But that's probably not a blocker for the fix.
thx for the merge :)
not sure how to correctly expose this either. I can't really get a grasp on the repercussions in a more standard use case, but my feeling is that it wouldn't have much impact, if any at all. in a small self-contained network where everybody knows everybody almost instantly though, it makes the solution possible!
on a swarm of 100 clients and 1 seed on a 1MB torrent, it goes from a haphazard 30-50s, when it's not stuck forever (it's funny to see, the seed is taunting the swarm with a neat 0 peers while almost all leeches reach their maximum peer number :D), to a steady 10-12 sec with the patch
after a lengthy debugging session, we came up with 2 fixes regarding what peers go down to the `PeerRequestResult` pipe

Context: we are running tests using a wip modded-taipei, no tracker, and had trouble making a local swarm behave correctly. `cfg.DhtRouter` is set to the seed with N leeches bootstrapping on it. There was some kind of race between `announce_peer` and `get_peers`. If a dht node N1 receives an announce from N2 on a torrent, it adds N2 to the `peerStore`. After that if a `get_peers` query returns to N1 with a peers list containing N2, it fails to add it to the local peer store (it's already there), then drops the N2 address, resulting in N1 and N2 never connecting to each other. The use case is somewhat specific since there is no routing table initially and only 1 DhtRouter bootstrapping the network, making the issue worse with fewer peers (no connections whatsoever to the seed/bootstrap with N<8 leeches, even though they correctly connect to each other)
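
To make the race easier to picture, here is a minimal self-contained sketch in Go. It is not the actual dht code: `peerStore`, `onAnnouncePeer`, `onGetPeersReply` and the `dropKnown` flag are all hypothetical names made up for illustration, and the "fixed" branch is only one possible way to stop dropping already-known peers.

```go
package main

import "fmt"

// peerStore is a hypothetical stand-in for the DHT's per-torrent peer store.
type peerStore struct {
	peers map[string]map[string]bool // infohash -> set of peer addresses
}

func newPeerStore() *peerStore {
	return &peerStore{peers: make(map[string]map[string]bool)}
}

// add records addr for infoHash and reports whether it was newly added.
func (s *peerStore) add(infoHash, addr string) bool {
	set, ok := s.peers[infoHash]
	if !ok {
		set = make(map[string]bool)
		s.peers[infoHash] = set
	}
	if set[addr] {
		return false // already known
	}
	set[addr] = true
	return true
}

// onAnnouncePeer models receiving announce_peer: the sender is stored,
// but nothing is forwarded to the torrent client.
func (s *peerStore) onAnnouncePeer(infoHash, addr string) {
	s.add(infoHash, addr)
}

// onGetPeersReply models handling the peers list of a get_peers reply and
// returns the addresses that would be forwarded to the client.
// dropKnown=true reproduces the behaviour described above (assumption):
// a peer that is already in the store is silently dropped.
func (s *peerStore) onGetPeersReply(infoHash string, addrs []string, dropKnown bool) []string {
	var forwarded []string
	for _, a := range addrs {
		isNew := s.add(infoHash, a)
		if !isNew && dropKnown {
			continue
		}
		forwarded = append(forwarded, a)
	}
	return forwarded
}

func main() {
	const ih = "infohash-1"

	buggy := newPeerStore()
	buggy.onAnnouncePeer(ih, "N2") // N2 announces to N1 first
	fmt.Println(buggy.onGetPeersReply(ih, []string{"N2", "N3"}, true)) // prints [N3]

	fixed := newPeerStore()
	fixed.onAnnouncePeer(ih, "N2")
	fmt.Println(fixed.onGetPeersReply(ih, []string{"N2", "N3"}, false)) // prints [N2 N3]
}
```

With `dropKnown` set, N2 never reaches the client on N1 because the earlier announce already put it in the store. Forwarding known peers as well means the client may see the same address more than once, so in this sketch the client side would have to tolerate duplicates.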