the8472 / mldht

Bittorrent Mainline DHT implementation in java
Mozilla Public License 2.0

Can't create a Peer lookup #4

Closed atomashpolskiy closed 7 years ago

atomashpolskiy commented 7 years ago

Hi @the8472 !

I'm in the process of integrating mldht into https://github.com/atomashpolskiy/bt/tree/dht-experimental. So far it's been working great: I've been able to download a complete torrent using DHT exclusively (no trackers, no PEX). So first of all, I'd like to thank you for developing this!

Currently I'm facing a weird problem. Coincidentally it appears that one of the bootstrap nodes is down at the moment, so this might be a part of the problem.

The problem is that when I create a PeerLookupTask via lbms.plugins.mldht.kad.DHT#createPeerLookup, I'm receiving null as a result. I've done some debugging, and it seems that it's because there are no active RPC servers (lbms.plugins.mldht.kad.RPCServerManager#activeServers is empty). There is one server, created via AddressUtils#getDefaultRoute, but it's considered unreachable due to nothing being received and timeOfLastReceiveCountChange being 0.

I wonder if there might be some kind of a race condition in DHT/RPC startup, because DHT hangs for a while in lbms.plugins.mldht.kad.DHT#resolveBootstrapAddresses, resulting in the exception:

java.net.UnknownHostException: router.silotis.us
        at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
        at java.net.InetAddress.getAllByName(InetAddress.java:1192)
        at java.net.InetAddress.getAllByName(InetAddress.java:1126)
        at lbms.plugins.mldht.kad.DHT.resolveBootstrapAddresses(DHT.java:957)
        at lbms.plugins.mldht.kad.DHT.routerBootstrap(DHT.java:1003)
        at lbms.plugins.mldht.kad.DHT.bootstrap(DHT.java:993)
        at lbms.plugins.mldht.kad.DHT.update(DHT.java:941)
        at lbms.plugins.mldht.kad.DHT.lambda$started$11(DHT.java:766)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Also it appears to me that the call to lbms.plugins.mldht.kad.DHT#resolveBootstrapAddresses should be omitted when router bootstrap is disabled in config (of course unless it has some important side effects that I'm not aware of).

Thanks again and pardon my poor language! :)

the8472 commented 7 years ago

> The problem is that when I create a PeerLookupTask via lbms.plugins.mldht.kad.DHT#createPeerLookup, I'm receiving null as a result. I've done some debugging, and it seems that it's because there are no active RPC servers (lbms.plugins.mldht.kad.RPCServerManager#activeServers is empty).

createPeerLookup does a fallback to non-active servers if no active one can be found. So it sounds like there is no server at all, i.e. DHT initialization is not done yet.

You can use lbms.plugins.mldht.kad.DHT.addStatusListener(DHTStatusListener) to get notified when initialization is done. But maybe I should add a CompletionStage that gets resolved once an active server becomes available.
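The CompletionStage idea mentioned above can be sketched roughly as follows. This is only an illustration of the pattern, not mldht's code: `ActiveServerSignal`, `serverBecameActive`, `onFirstActive` and `awaitActive` are hypothetical names, and the server is represented by a plain string ID.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a server manager; not mldht's actual class.
class ActiveServerSignal {
    private final CompletableFuture<String> firstActive = new CompletableFuture<>();

    // Assumed to be called by the RPC layer once a server has seen traffic.
    // Only the first call has an effect; later calls are no-ops.
    void serverBecameActive(String serverId) {
        firstActive.complete(serverId);
    }

    // Non-blocking view for callers that want to chain work.
    CompletionStage<String> onFirstActive() {
        return firstActive;
    }

    // Blocks until a server is active; returns null on timeout or interrupt.
    String awaitActive(long timeout, TimeUnit unit) {
        try {
            return firstActive.get(timeout, unit);
        } catch (TimeoutException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

A caller would then create peer lookups only after `awaitActive` returns a non-null value, or chain the lookup onto `onFirstActive()` with `thenAccept`, instead of polling `createPeerLookup` and checking for null.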

> Also it appears to me that the call to lbms.plugins.mldht.kad.DHT#resolveBootstrapAddresses should be omitted when router bootstrap is disabled in config

Good point. But if you disable router bootstrap you will have to seed the DHT in some other way, e.g. from peers sending the PORT message by calling lbms.plugins.mldht.kad.DHT.addDHTNode(String, int)
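As a rough illustration of seeding from PORT messages: in the BitTorrent peer wire protocol, the PORT message carries the peer's DHT port as a 2-byte big-endian value. The sketch below parses that payload and forwards the result to a sink. `PortMessageSeeder` and `NodeSink` are hypothetical names; `NodeSink` only mirrors the `addDHTNode(String, int)` signature named above so the example runs without the library.

```java
import java.nio.ByteBuffer;

class PortMessageSeeder {
    // Hypothetical sink mirroring the addDHTNode(String, int) signature.
    interface NodeSink {
        void addDHTNode(String host, int port);
    }

    // Decodes the 2-byte big-endian port from a PORT message payload.
    static int parsePort(byte[] payload) {
        if (payload.length != 2) {
            throw new IllegalArgumentException("PORT payload must be 2 bytes");
        }
        // ByteBuffer is big-endian by default; mask to get an unsigned value.
        return ByteBuffer.wrap(payload).getShort() & 0xFFFF;
    }

    // Feeds the sending peer's address and advertised DHT port to the sink.
    static void onPortMessage(String peerHost, byte[] payload, NodeSink sink) {
        sink.addDHTNode(peerHost, parsePort(payload));
    }
}
```

With a real DHT instance, the sink could presumably be a method reference such as `dht::addDHTNode`, assuming the method is accessible with that signature.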

atomashpolskiy commented 7 years ago

I'm pretty sure that the fallback is not used: https://github.com/the8472/mldht/blob/master/src/lbms/plugins/mldht/kad/DHT.java#L551

> Good point. But if you disable router bootstrap you will have to seed the DHT in some other way, e.g. from peers sending the PORT message by calling lbms.plugins.mldht.kad.DHT.addDHTNode(String, int)

Yeah, feeding peers received from other sources into DHT is what I'm up to right now. Thanks for pointing out the API to use, this was going to be my next question :)

the8472 commented 7 years ago

Ah right, the fallback is only used for maintenance tasks such as adding new nodes. I'll add a callback then to notify when an active server becomes available.

atomashpolskiy commented 7 years ago

Just to be clear, I have a thread that periodically calls createPeerLookup, and each time it returns null. So I'm not sure this is a problem of calling it prematurely; it rather seems like something broke or went completely out of sync during DHT startup... As I've said, this problem didn't appear until some of the bootstrap nodes went down and lbms.plugins.mldht.kad.DHT#resolveBootstrapAddresses began to take longer than usual to execute.

the8472 commented 7 years ago

lbms.plugins.mldht.kad.DHT.printDiagnostics(PrintWriter) can provide a lot of diagnostic output which might help.

atomashpolskiy commented 7 years ago

Running with diagnostics, here's the result. Do you think there's any clue here?

==========================
DHT Diagnostics. Type IPV4_DHT
# of active servers / all servers: 0/1
-----------------------
Stats
Reachable node estimate: 2 (0.5)
DB Keys: 0
DB Items: 0
TX sum: 0 RX sum: 0
avg task time/avg 1st result time (ms): 10000/10000
Uptime: PT3M55.397Ss
RPC stats
### local RPCs
            Method                 REQ |                 RSP               Error             Timeout 

              PING                   0 |                   0                   0                   0 
         FIND_NODE                   0 |                   0                   0                   0 
         GET_PEERS                   0 |                   0                   0                   0 
     ANNOUNCE_PEER                   0 |                   0                   0                   0 
               GET                   0 |                   0                   0                   0 
               PUT                   0 |                   0                   0                   0 
 SAMPLE_INFOHASHES                   0 |                   0                   0                   0 
           UNKNOWN                   0 |                   0                   0                   0 

### remote RPCs
            Method                 REQ |                 RSP              Errors 

              PING                   0 |                   0                   0 
         FIND_NODE                   0 |                   0                   0 
         GET_PEERS                   0 |                   0                   0 
     ANNOUNCE_PEER                   0 |                   0                   0 
               GET                   0 |                   0                   0 
               PUT                   0 |                   0                   0 
 SAMPLE_INFOHASHES                   0 |                   0                   0 
           UNKNOWN                   0 |                   0                   0 
-----------------------
Routing table
buckets: 1 / entries: 0
all   num:0 rep:0 [Home]

-----------------------
RPC Servers
D1DBC867 01EA22A7 2202C67D FCC5AF84 13AD3EEF    bind: /192.168.1.2 consensus: null
rx: 0 tx: 0 active: 0 baseRTT: 10000 loss: 0,500000  loss (verified): 0,500000 uptime: PT3M23.101S
RTT stats (0samples)  mean:9975.0 median:9975.0 mode:9975.0 10tile:9975.0 90tile:9975.0
  9950 | 
  100% | 
-----------------------
Blacklist
{}
-----------------------
Lookup Cache
anchors (0):
buckets (1) / entries (0):

all entries: 0
-----------------------
Tasks
next id: 1
#### active: 
#### queued: 

the8472 commented 7 years ago

DHT has not been seeded with contacts yet -> 0 routing table entries -> no traffic -> server can't be active

atomashpolskiy commented 7 years ago

Yeah, right. This return statement is preventing DHT from bootstrapping normally when one of the router addresses can't be resolved: https://github.com/the8472/mldht/blob/master/src/lbms/plugins/mldht/kad/DHT.java#L966
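The linked line is not reproduced here, but the general fix for this kind of problem is to skip a router that fails to resolve instead of returning from the whole method. A minimal sketch of that approach follows; `BootstrapResolver` and its `Resolver` hook are hypothetical names introduced so the logic can be exercised without real DNS, and this is not the actual mldht code.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class BootstrapResolver {
    // Hypothetical hook; InetAddress::getAllByName fits this shape.
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    // Resolves every router it can; an unresolvable host is skipped
    // rather than aborting resolution of the remaining routers.
    static List<InetSocketAddress> resolveAll(List<InetSocketAddress> unresolved,
                                              Resolver resolver) {
        List<InetSocketAddress> resolved = new ArrayList<>();
        for (InetSocketAddress node : unresolved) {
            try {
                for (InetAddress addr : resolver.resolve(node.getHostString())) {
                    resolved.add(new InetSocketAddress(addr, node.getPort()));
                }
            } catch (UnknownHostException e) {
                // One dead router should not block the others.
                continue;
            }
        }
        return resolved;
    }
}
```

In production use the call would look like `BootstrapResolver.resolveAll(nodes, InetAddress::getAllByName)`; an empty result then means that no router at all could be resolved.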

the8472 commented 7 years ago

Try with the current revision. You can use the CompletableFuture from the ServerManager to wait until one becomes available.

atomashpolskiy commented 7 years ago

Working perfectly now, thanks a lot!

atomashpolskiy commented 7 years ago

You might also want to change the call to bootstrap in 'started' to be async btw
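One way to read this suggestion: run the blocking DNS resolution on a separate executor and seed the DHT when it finishes, so the thread that runs `started` is not stalled by a slow lookup. The sketch below shows only that pattern; `AsyncBootstrap`, `start` and its parameters are hypothetical and do not correspond to mldht's internals.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

class AsyncBootstrap {
    // Runs the blocking resolve step on ioExecutor, then hands the result
    // to the seed step. The caller gets a future back immediately.
    static <T> CompletableFuture<Void> start(Supplier<T> blockingResolve,
                                             Consumer<T> seed,
                                             Executor ioExecutor) {
        return CompletableFuture
                .supplyAsync(blockingResolve, ioExecutor)
                .thenAccept(seed);
    }
}
```

The returned future can be ignored for fire-and-forget startup, or combined with `exceptionally` to log a failed resolution.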