zerotier / libzt

Encrypted P2P sockets over ZeroTier
https://zerotier.com

Adding `libzt` support to poco. #174

Open aarlt opened 2 years ago

aarlt commented 2 years ago

I think it would be awesome to add libzt support in poco. I'm currently working on that (in my free time), see https://github.com/aarlt/poco-libzt (but I just started working on this - nothing works yet).

I also created an issue in the poco project to potentially get some feedback & hints from its developers & users (see https://github.com/pocoproject/poco/issues/3540). I created this issue here to get some feedback & hints from the libzt community. I would appreciate any hints & feedback.

joseph-henry commented 2 years ago

Interesting, just make sure that the licenses for the third party libraries we depend on (in libzt/ext and ZeroTierOne/ext) are compatible with the poco project. There's some C sprinkled around as well so make sure they're ok with that. If you have any questions let me know and I'll try to help.

aarlt commented 2 years ago

I think I have a first simple working version. So far I have only tested it with their EchoServer example, but it really does seem to work now! I still need to check the behaviour of the other examples, but at least this example seems to work quite well!

@joseph-henry libzt is really cool! :)

aarlt commented 2 years ago

So far I have only adapted the code paths that were related to MacOS. I saw some Linux- and Windows-specific #ifdefs in poco that I have not touched yet. That means it may only support MacOS for now.

aarlt commented 2 years ago

NOTE: You may need to adapt Net/CMakeLists.txt to change the paths pointing to libzt according to your environment. I tested this only on MacOS.

Building

git clone git@github.com:aarlt/poco-libzt
cd poco-libzt
mkdir b0
cd b0
cmake -DENABLE_LIBZT=on -DENABLE_TESTS=on ..
make
# start example
bin/EchoServer

Testing

# - install https://github.com/zerotier/ZeroTierOne

sudo zerotier-cli join 8850338390671cd6
telnet <IP address given by example> 9977
# <type anything now and press enter - an echo will be printed>

EchoServer example output:

# bin/EchoServer
Waiting for node to come online
Public identity (node ID) is 9e85ce0bd5
Joining network 8850338390671cd6
Don't forget to authorize this device in my.zerotier.com or the web API!
Waiting for join to complete
Waiting for address assignment from network
IP address on network 8850338390671cd6 is 10.242.157.67
...
sudo zerotier-cli join 8850338390671cd6
200 join OK
telnet 10.242.157.67 9977
# <type anything now and press enter - an echo will be printed>

aarlt commented 2 years ago

@joseph-henry I think there is an even cooler thing than poco. I would say doing exactly the same for nodejs would be a game changer. What do you think?

aarlt commented 2 years ago

@joseph-henry I just noticed that only the "server" part seems to work nicely. If I try to connect (with poco), the connection fails. I get an I/O error with error code 60, which seems to be ETIMEDOUT (Operation timed out). I have just started to investigate the reason for this. For now I only see that zts_bsd_connect returns -1 here. I guess I missed something during the initialisation. Maybe you can see faster than I can what the reason could be. I would appreciate any hint.

aarlt commented 2 years ago

@joseph-henry Hmm.. I have a question regarding the timeout parameter in zts_bsd_poll. In the header it's documented as "How long this call should block". Is it correct that this is in milliseconds?

aarlt commented 2 years ago

@joseph-henry if I change zts_bsd_socket to zts_socket and zts_bsd_connect to zts_connect(_sockfd, address.host().toString().c_str(), address.port(), 0); I'm able to connect, although the worst-case time needed to connect is often around 2 minutes. After the first successful connect, subsequent connections are established quickly and without timeouts. It looks as if the network is (sometimes) just not ready to allow opening connections. Any ideas how I could improve the worst-case connection time of 2 minutes?

Basically, I get multiple I/O error exceptions during the first attempts to connect, normally up to 4. If I understood correctly, the 0 in zts_connect(_sockfd, address.host().toString().c_str(), address.port(), 0); means that the timeout is 30 seconds, so 4 * 30 seconds = 2 minutes. I noticed that if I set a timeout of less than 30 seconds, I'm able to connect faster, but I normally also need more attempts.

I also noticed that if I run the program for the first time (after not having started it for some minutes), I'm very often (maybe always?) able to connect without any timeouts. If I restart the program directly after it has just connected without any timeouts, I again need multiple attempts to connect successfully. Is there some magic happening in the network controller? When is the best time to open the first connection so that the probability of timeouts is minimised?

joseph-henry commented 2 years ago

I think there is an even cooler thing than poco. I would say doing exactly the same for nodejs would be a game changer. What do you think?

Not opposed to this, there was some effort some time ago. If you search the issues you might find it.

Hmm.. I have a question regarding the timeout parameter in zts_bsd_poll. In the header it's documented as "How long this call should block". Is it correct that this is in milliseconds?

Correct.
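Since the timeout is an integer number of milliseconds, code that keeps its timeouts in std::chrono needs a conversion before calling zts_bsd_poll. A minimal sketch of such a conversion follows. The helper name toPollTimeoutMs is hypothetical (it is not part of libzt or Poco), and the rounding and clamping choices are assumptions: negative durations become 0, sub-millisecond waits are rounded up so they do not turn into a zero timeout, and oversized values are capped at INT_MAX.

```cpp
#include <chrono>
#include <climits>

// Hypothetical helper (not part of libzt or Poco): converts a chrono duration
// into an int millisecond count suitable for a poll-style timeout argument.
inline int toPollTimeoutMs(std::chrono::microseconds d)
{
    using std::chrono::milliseconds;

    // Assumption: treat zero or negative durations as "do not block".
    if (d <= std::chrono::microseconds::zero()) return 0;

    // duration_cast truncates, so round up to avoid turning 0.5 ms into 0.
    milliseconds ms = std::chrono::duration_cast<milliseconds>(d);
    if (ms < d) ms += milliseconds(1);

    // Clamp to the range of int.
    if (ms.count() > INT_MAX) return INT_MAX;
    return static_cast<int>(ms.count());
}
```

The result would be passed as the timeout argument of the poll call, for example toPollTimeoutMs(std::chrono::seconds(2)) for a two-second wait.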

As for your last question about the connection process: ZT uses transport-triggered links, so a single zts_bsd_connect call is likely to fail on the first few tries. It needs to be in a loop that is willing to retry, which is why zts_connect exists: to sort of paint over this subtlety. A 2-minute startup time is terrible and not typical. If you can, check whether you are relaying. If this libzt instance is talking to a traditional ZT node, you can just use zerotier-cli peers.
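The retry loop described above can be sketched as follows. This is only an illustration under stated assumptions: connectWithRetry is a hypothetical helper, not a libzt or Poco function, and the connect attempt is passed in as a callable so the sketch stays independent of the libzt headers. The callable is assumed to return 0 on success and a negative value on failure, which matches the -1 reported for zts_bsd_connect earlier in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of libzt or Poco): calls `tryConnect` until it
// returns 0 or `maxAttempts` is exhausted, sleeping `delay` between failures.
// Returns the number of attempts used on success, or -1 if all attempts failed.
inline int connectWithRetry(const std::function<int()>& tryConnect,
                            int maxAttempts,
                            std::chrono::milliseconds delay)
{
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (tryConnect() == 0) return attempt;  // connected
        if (attempt < maxAttempts) {
            std::this_thread::sleep_for(delay); // give the link time to come up
        }
    }
    return -1; // every attempt failed
}
```

In the Poco integration, the callable would be a lambda wrapping the zts_bsd_connect call on the already created socket. The worst-case wait is roughly maxAttempts multiplied by the per-attempt timeout plus the delay, which is the same arithmetic as the 4 * 30 seconds observed with zts_connect.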

aarlt commented 2 years ago

Thx @joseph-henry. With zerotier-cli peers I only see DIRECT peers with LEAF and PLANET roles. At least I don't see anything that seems to mention relaying.

However, I just noticed that I needed to make another change that I forgot to mention here. Poco defines a method void SocketImpl::setBlocking(bool flag) that was originally implemented as

    int arg = fcntl(F_GETFL);
    long flags = arg & ~O_NONBLOCK;
    if (!flag) flags |= O_NONBLOCK;
    (void) fcntl(F_SETFL, flags);

That I changed for libzt to

    int arg = fcntl(ZTS_F_GETFL);
    long flags = arg & ~ZTS_O_NONBLOCK;
    if (!flag) flags |= ZTS_O_NONBLOCK;
    (void) fcntl(ZTS_F_SETFL, flags);

However, I need to comment this code out to make everything work. With this code I always get I/O errors. (Note: the fcntl used here is a method of SocketImpl.)
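To make the flag handling easier to inspect in isolation, the bit logic of setBlocking can be pulled out into a pure function. The sketch below is not a diagnosis of the I/O errors. It only shows the same computation with one extra guard: if the F_GETFL call fails and returns a negative value, the snippet above would derive its flags from that error value, so the helper reports the failure instead. The name blockingFlags is hypothetical, and the non-blocking bit is passed in as a parameter so that either O_NONBLOCK or ZTS_O_NONBLOCK can be used.

```cpp
#include <cassert>

// Hypothetical pure helper (not part of Poco or libzt): computes the flag word
// to pass to F_SETFL. `current` is the value returned by the F_GETFL call and
// `nonblockBit` is O_NONBLOCK or ZTS_O_NONBLOCK.
// Returns -1 if `current` indicates that F_GETFL failed.
inline int blockingFlags(int current, bool blocking, int nonblockBit)
{
    assert(nonblockBit > 0);
    if (current < 0) return -1;          // F_GETFL failed; do not build flags from it
    int flags = current & ~nonblockBit;  // start from blocking mode
    if (!blocking) flags |= nonblockBit; // request non-blocking mode
    return flags;
}
```

A caller would check the result for -1 before issuing the F_SETFL call, and could log the value returned by F_GETFL to see whether the flags reported by libzt look plausible.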

aarlt commented 2 years ago

I noticed one interesting thing: if I try to connect to the node's own IP address, I'm not able to connect. But if I try to connect to another node (a different IP address), everything just works.

@joseph-henry are there any limitations regarding this? What could be the reason for that?