the8472 / mldht

Bittorrent Mainline DHT implementation in java
Mozilla Public License 2.0

Ability to set the Node ID at startup #16

kg6zvp closed 6 years ago

kg6zvp commented 6 years ago

I'm working on a project using your library and I'm attempting to set the Node ID during initialization, but I can't find anywhere in the code where that can be done.

Your help is greatly appreciated.

the8472 commented 6 years ago

What would be the use-case of setting the node ID? The spec requires it to be random or - in the case of BEP42 - to be derived from the public IP.

kg6zvp commented 6 years ago

Keeping the ID across sessions in the case of the randomly generated one.

the8472 commented 6 years ago

Just supply a DHTConfiguration that returns a storage path and isPersistingID() == true.

kg6zvp commented 6 years ago

I'm developing an application which already has a configuration file and may need to spin up new instances of DHT at runtime.

Is there a way to retrieve and store it programmatically? It would be extremely helpful

the8472 commented 6 years ago

Well, if you really want to, the Path supplied by the configuration can point to a virtual nio2 FileSystem which you can then serialize however you want. Personally I would consider that overkill though.

Just pass the same directory which you use to store your configuration file, or a subdirectory thereof. For some use cases the required persistence size can exceed 20 MB and contain upwards of 100k objects. Many serialization libraries would choke on that, so direct filesystem writes are the most scalable solution.

Or you could just forgo ID permanence. It's not that important; it's more important to make sure your nodes are long-lived. Short-lived nodes are inefficient.
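For reference, the virtual NIO.2 FileSystem option mentioned above could look roughly like the sketch below. It uses the JDK's built-in zip file system provider, so everything written under the virtual directory ends up in a single archive file that an application can store wherever it likes. `VirtualStorageDemo`, the `/dht-state` directory and `example.bin` are hypothetical names for illustration, and whether mldht's persistence code works against a non-default file system provider has not been verified here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical demo class, not part of mldht.
final class VirtualStorageDemo {

    // Writes content into a directory inside a zip-backed virtual file system,
    // closes it, reopens the archive and reads the content back.
    static String roundTrip(String content) {
        try {
            Path archive = Files.createTempDirectory("dht-demo").resolve("state.zip");
            URI uri = URI.create("jar:" + archive.toUri());

            // "create" tells the zip provider to create the archive if it is missing.
            try (FileSystem fs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
                // A Path like this one is what a configuration could hand out
                // as its storage directory (assumption, untested with mldht).
                Path storageDir = Files.createDirectories(fs.getPath("/dht-state"));
                Files.write(storageDir.resolve("example.bin"),
                            content.getBytes(StandardCharsets.UTF_8));
            }

            // Reopen the same archive to show that the data was persisted on close.
            try (FileSystem fs = FileSystems.newFileSystem(uri, Map.of())) {
                byte[] raw = Files.readAllBytes(fs.getPath("/dht-state/example.bin"));
                return new String(raw, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting `state.zip` is an ordinary file, so it can be copied next to an existing configuration file or embedded elsewhere. A plain directory remains the simpler choice.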

kg6zvp commented 6 years ago

Could you point me to the code that does ID generation or retrieval?

Unfortunately ID permanence is completely necessary for this use case.

the8472 commented 6 years ago

Well, you did not really describe your use case, so I don't know if I actually want to support it.

kg6zvp commented 6 years ago

I'm trying to allow nodes to keep track of each other long term.

the8472 commented 6 years ago

And in turn, what would that be useful for? I want to get to the bottom of a potential XY problem.

DHT node IDs on their own are not particularly important, as the whole DHT is designed around the storage-targets being fixed but the individual nodes being volatile.

Hrm, maybe use BEP44 if you need some long-term constant stuff? Nodes can publish transient information (such as their current identity) under a fixed Key. I wouldn't even mind adding some "my pubkey is ..." to ping replies or some new identity query or whatever. I just really don't think the current node ID is something that should be relied on, especially in the face of BEP42. If other clients decide to start enforcing that I will follow suit, which means that I would have to break any API that allowed you to set it. So I don't want to provide such an API in the first place. Not to mention that people would abuse it for other things.

kg6zvp commented 6 years ago

Thank you for that reply. My only purpose is to publish an identity as you described and allow other nodes to find it based on that.

Could you describe in more detail what you mentioned regarding "publishing transient information under a fixed key"? If such information was published, how would you know which node published it? If you'd rather just link something so I don't waste more of your time, feel free. There just seems to be a strong disconnect between the abstract and the application.

the8472 commented 6 years ago

http://bittorrent.org/beps/bep_0044.html (this is also linked in the readme). It is implemented as PutTask and GetLookupTask

> how would you know which node published it?

If that is important the node can simply include the information in the payload when publishing it, e.g. as serialized IP, port and optionally node ID. Update the published value under the same key when they change. If the only thing you need is the current IP and some port then you can simply select a random infohash and announce via AnnounceTask.
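As a rough illustration of "serialized IP, port", the sketch below packs an endpoint into bytes in the style of BEP5 compact peer info: the raw address bytes (4 for IPv4, 16 for IPv6) followed by a 2-byte big-endian port. `EndpointCodec` and its methods are hypothetical names, not part of mldht, and the actual layout of a BEP44 payload is up to the application.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of mldht.
final class EndpointCodec {

    // Address bytes followed by the port as an unsigned 16-bit big-endian value.
    static byte[] encode(InetSocketAddress endpoint) {
        if (endpoint.isUnresolved()) {
            throw new IllegalArgumentException("endpoint must be resolved");
        }
        byte[] ip = endpoint.getAddress().getAddress();
        return ByteBuffer.allocate(ip.length + 2)
                .put(ip)
                .putShort((short) endpoint.getPort())
                .array();
    }

    // Inverse of encode: accepts 6 bytes (IPv4) or 18 bytes (IPv6).
    static InetSocketAddress decode(byte[] raw) {
        if (raw.length != 6 && raw.length != 18) {
            throw new IllegalArgumentException("expected 6 or 18 bytes, got " + raw.length);
        }
        byte[] ip = Arrays.copyOfRange(raw, 0, raw.length - 2);
        // Mask to undo sign extension, since Java shorts are signed.
        int port = ByteBuffer.wrap(raw, raw.length - 2, 2).getShort() & 0xFFFF;
        try {
            return new InetSocketAddress(InetAddress.getByAddress(ip), port);
        } catch (UnknownHostException e) {
            // Cannot happen for 4- or 16-byte arrays, which are checked above.
            throw new IllegalArgumentException(e);
        }
    }
}
```

A publisher would put the output of `encode` (optionally alongside other fields) into the value it stores under its fixed key, and readers would call `decode` on whatever they fetch.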

> There just seems to be a strong disconnect between the abstract and the application.

Not sure if I understand. 😕 Again, the better you explain your actual, underlying motivation (several layers deep if needed) instead of the technical solution you think you need, the better I can explain how this is best mapped onto DHT primitives.

kg6zvp commented 6 years ago

I'm experimenting with peer to peer connectivity and leveraging BitTorrent's network and technology to make those connections. Right now I'm just trying to get from static, unique ID to a connection to that peer.

the8472 commented 6 years ago

Well geez, that is simple.

Create a random Key, that's your ID. Then PeerLookupTask and AnnounceTask that ID. Now other nodes can PeerLookupTask that ID and find your IP/Port. They can also AnnounceTask themselves to that ID so you will know if someone wants to talk to your node (you call me vs. I call you).

DHT = Distributed Hash Table. It's a distributed version of java's HashMap. You store something under a hashed (effectively random) remote ID. Not your own node ID. Your own node is a hash bucket, not a key in the map.
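To make the HashMap analogy concrete, here is a single-process toy model of that kind of storage. It is not mldht code and involves no networking; `ToyPeerStore`, `announce` and `lookup` are hypothetical names standing in for what AnnounceTask and PeerLookupTask achieve against remote nodes. The point is only that the random key is the map key, and the values are the endpoints announced under it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy, in-memory stand-in for peer storage in a DHT. Hypothetical class.
final class ToyPeerStore {

    // key (the random ID you picked) -> endpoints announced under that key
    private final Map<String, Set<String>> peersByKey = new HashMap<>();

    // "I am reachable at this endpoint under this key."
    void announce(String key, String endpoint) {
        peersByKey.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(endpoint);
    }

    // "Who has announced themselves under this key?"
    Set<String> lookup(String key) {
        return Collections.unmodifiableSet(
                peersByKey.getOrDefault(key, Collections.emptySet()));
    }
}
```

In this model the owner of an ID announces its own endpoint under the key, and a peer that wants to be called back announces itself under the same key. Both sides then see each other through `lookup`, which is the "you call me vs. I call you" pattern. In the real network the entries live on whichever nodes are closest to the key and expire unless re-announced.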

the8472 commented 6 years ago

That is the basic BEP5 storage. If you want to publish more complex information than your current IP+Port then BEP44 storage is more appropriate.

kg6zvp commented 6 years ago

Thanks so much for your patience. I'm very glad I was way overthinking this.

I have tried this:

33  dhtManager.getMaster().getServerManager().awaitActiveServer();
34  log.i("Init successful");
35
36  Key k = Key.createRandomKey();
37  dhtManager.getMaster().announce(
38              dhtManager.getMaster().createPeerLookup(k.getHash()),
39              true,
40              openPort);

but I keep receiving this error:

Exception in thread "main" java.lang.NullPointerException
    at lbms.plugins.mldht.kad.DHT.announce(DHT.java:556)
    at bhc.Main.run(Main.java:37)

the8472 commented 6 years ago

You're on the right track, but you'll have to await completion of the lookup before starting the announce. I guess I should make it handle that automatically. And I would recommend doing this for IPv4 and IPv6 on dualstack hosts.

kg6zvp commented 6 years ago

So I would need to do this on each node (IPv4 and IPv6) even after addSiblings has been called to "pair" them?

Should awaiting completion look like this:

final PeerLookupTask lookupTask = dhtManager.getMaster().createPeerLookup(k.getHash());
lookupTask.addListener(new TaskListener() {
    @Override
    public void finished(Task t) {
        dhtManager.getMaster().announce(lookupTask, true, openPort);
    }
});

the8472 commented 6 years ago

Yes.

The pairing currently only exists to ease bootstrapping. I will eventually add some abstraction that bundles such high level tasks across nodes.

Oh, and you actually need to use the future from the servermanager.

kg6zvp commented 6 years ago

When you say I need to use the future from the ServerManager, do you mean that I need to pass the ServerManager instance to the future somehow? Is there a code sample I should look at for this?

I like your username, btw. Great reference.

the8472 commented 6 years ago

It's a future: the function doesn't wait, the return value does.
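A minimal, library-free sketch of that distinction, using `java.util.concurrent.CompletableFuture`. The `awaitServer()` method below is a hypothetical stand-in that merely simulates a delay; it is assumed to behave like a method that returns a future, which is not confirmed against the mldht API here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of mldht.
final class FutureDemo {

    // Stand-in for a method that returns a future: it schedules the work
    // and returns right away, before the result exists.
    static CompletableFuture<String> awaitServer() {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(200); // simulated startup delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "server-ready";
        });
    }

    static String waitThenDescribe() {
        // Calling the method and discarding the result waits for nothing.
        awaitServer();

        // The waiting happens through the returned future: either chain a
        // callback that runs on completion, or block on it explicitly.
        CompletableFuture<String> described =
                awaitServer().thenApply(server -> "got " + server);
        return described.join(); // blocks until the chained stage completes
    }
}
```

The first call in `waitThenDescribe` shows the pitfall: the statement completes immediately and the code after it runs whether or not the server is ready. Work that depends on the result belongs in a callback such as `thenApply` or `thenAccept`, or after a blocking `join()`.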

kg6zvp commented 6 years ago

I'm somewhat new to this, but this is what I am attempting:

Key k = Key.createRandomKey();
dhtManager.getMaster().getServerManager().awaitActiveServer().thenAccept(server -> {
    final PeerLookupTask lookupTask = server.getDHT().createPeerLookup(k.getHash());
    lookupTask.addListener(new TaskListener() {
        @Override
        public void finished(Task t) {
            server.getDHT().announce(lookupTask, true, 8000);
        }
    });
});

It just runs forever, but I'm not actually sure what's working and what's not.

Is this the correct approach?

the8472 commented 6 years ago

Yes