zerotier / ZeroTierOne

A Smart Ethernet Switch for Earth
https://zerotier.com

Waiting for network configuration #1970

Open antoinefaure opened 1 year ago

antoinefaure commented 1 year ago

Hi,

I'm using ZeroTier on an embedded Linux platform that is updated with full system images (i.e. the rootfs is erased and reflashed at every update). I back up the contents of the /var/lib/zerotier-one/ folder to keep my device's configuration, which seems to work: after an update, zerotier-cli status tells me I'm still online, and I can see my device on the web interface as connected with an IP. The problem is that, on the device, ZeroTier seems to be stuck waiting for the network configuration, and no IP address is set on the ZeroTier interface.

# ip a
[...]
4: zt5u4rycbq: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel qlen 1000
    link/ether c2:57:6f:fb:bc:ca brd ff:ff:ff:ff:ff:ff
    inet6 fe80::c057:6fff:fefb:bcca/64 scope link 
       valid_lft forever preferred_lft forever
# zerotier-cli status
200 info deviceid 1.10.6 ONLINE
# journalctl -u zerotier-one
...
Apr 20 09:39:01 my_device zerotier-one[374]: requesting configuration for network XXX
Apr 20 09:40:06 my_device zerotier-one[374]: requesting configuration for network XXX
Apr 20 09:41:11 my_device zerotier-one[374]: requesting configuration for network XXX
Apr 20 09:41:34 my_device zerotier-one[374]: trying unknown path 121.98.56.212/9993 to 4a9c9fcf9d (packet d1eaeaa1ec2573c5 verb 8 local socket 367847649584 network 0000000000000000)
Apr 20 09:41:34 my_device zerotier-one[374]: learned new path 121.98.56.212/9993 to 4a9c9fcf9d (packet 7ccbc7964fde1cb8 local socket 367847649584 network 0000000000000000)
Apr 20 09:41:56 my_device zerotier-one[374]: learned new path 50.7.252.138/9993 to 62f865ae71 (packet 636135bdae68acd1 local socket 367847646640 network 0000000000000000)
Apr 20 09:42:06 my_device zerotier-one[374]: learned new path 163.47.166.49/29994 to 70e7ac4508 (packet 6084ee75bb4b7be8 local socket 367847646640 network 0000000000000000)
Apr 20 09:42:16 my_device zerotier-one[374]: requesting configuration for network XXX
Apr 20 09:42:36 my_device zerotier-one[374]: trying unknown path 163.47.166.49/37538 to 70e7ac4508 (packet a9e66b171241b6a7 verb 8 local socket 367847650288 network 0000000000000000)
Apr 20 09:42:36 my_device zerotier-one[374]: learned new path 163.47.166.49/37538 to 70e7ac4508 (packet 2aa24d7db1572c14 local socket 367847650288 network 0000000000000000)
Apr 20 09:43:21 my_device zerotier-one[374]: requesting configuration for network XXX

Before the update zerotier was working properly and the device had an IP address. No network changes before and after the update. The device is enabled on the web interface. I have tried rebooting with no success.

Is there any other configuration file I should be saving to be able to reconnect after an update? Am I missing something?
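
As a quick sanity check on what the backup restores, the sketch below reports whether the files ZeroTier normally keeps in its home directory are present. It assumes the default Linux location /var/lib/zerotier-one; the function name zt_state_report is illustrative. It only checks that the files exist and are non-empty, not that their contents are valid.

```shell
#!/bin/sh
# Sketch: report which ZeroTier state files are present in a directory.
# zt_state_report is an illustrative name, not part of ZeroTier.
zt_state_report() {
    zt_dir="${1:-/var/lib/zerotier-one}"
    zt_missing=0
    # Node identity (key pair) and the token used by the local API.
    for zt_f in identity.secret identity.public authtoken.secret; do
        if [ -s "$zt_dir/$zt_f" ]; then
            echo "ok      $zt_f"
        else
            echo "MISSING $zt_f"
            zt_missing=1
        fi
    done
    # networks.d holds one <network id>.conf file per joined network.
    if [ -d "$zt_dir/networks.d" ]; then
        echo "ok      networks.d"
    else
        echo "MISSING networks.d"
        zt_missing=1
    fi
    # Exit status 0 only if everything was found.
    return "$zt_missing"
}
```

Running zt_state_report right after a restore, before the service starts, shows whether the identity and the joined-network list survived the update.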

Thanks

antoinefaure commented 1 year ago

After more tests, it seems to have nothing to do with OTA updates: a simple reboot gives the same result.

It looks like there is a limit on the number of times a device can request a configuration within a certain amount of time. Is that the case? Or is it a bug?

Thanks, Antoine

laduke commented 1 year ago

Not familiar with this. If you restart zerotier-one, does it start working?

antoinefaure commented 1 year ago

No, I've tried restarting zerotier-one and the device a few times with no success. But it seems to come back on its own after a while. It's just odd that it doesn't reconnect immediately.

antoinefaure commented 1 year ago

And the time it takes to reconnect seems to be quite random too. Earlier today it came back within an hour; now I've been waiting for a few hours, rebooted the device a few times, and tried to disable/re-enable the device in the admin page, still with no luck. The device stays stuck:

Apr 27 15:52:45 raspberrypi4-64 zerotier-one[396]: requesting configuration for network XXX
Apr 27 15:53:50 raspberrypi4-64 zerotier-one[396]: requesting configuration for network XXX
Apr 27 15:53:55 raspberrypi4-64 zerotier-one[396]: trying unknown path 103.254.1.161/23174 to b5fbbc5aaf (packet 128874ef9becd6a9 verb 8 local socket 367015552432 network 0000000000000000)
Apr 27 15:53:55 raspberrypi4-64 zerotier-one[396]: learned new path 103.254.1.161/23174 to b5fbbc5aaf (packet 9d8f5f8da4c1edfd local socket 367015552432 network 0000000000000000)
Apr 27 15:54:55 raspberrypi4-64 zerotier-one[396]: requesting configuration for network XXX
Apr 27 15:55:01 raspberrypi4-64 zerotier-one[396]: learned new path 206.83.103.28/61801 to fdc4b8e55d (packet 84cdc987bea98ed6 local socket 367015552432 network 0000000000000000)
Apr 27 15:55:01 raspberrypi4-64 zerotier-one[396]: learned new path 206.83.103.28/61801 to fdc4b8e55d (packet acfd4b375422655d local socket 367015556080 network 0000000000000000)
Apr 27 15:55:06 raspberrypi4-64 zerotier-one[396]: learned new path 104.194.8.134/9993 to cafe9efeb9 (packet 636ebd557e4e69fb local socket 367015556080 network 0000000000000000)
# ip a
[...]
4: zt5u4rycbq: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel qlen 1000
    link/ether c2:c6:07:8d:96:47 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::c0c6:7ff:fe8d:9647/64 scope link 
       valid_lft forever preferred_lft forever

laduke commented 1 year ago

The symptoms sound like what happens when you're behind a restrictive or double NAT.

Are there multiple instances of the same ZeroTier ID running?

antoinefaure commented 1 year ago

I don't think there is a double NAT on my network, but I'll check. It works well before I reboot / do an update though, the device connects instantly.

No, just one instance running at a time, and I'm using release 1.10.6.

antoinefaure commented 1 year ago

I have eliminated the network configuration from the possible causes: I have 2 devices, one running an old version of our system and one running the new version. The old one reconnects to ZeroTier after a reboot without any issue, while the new one struggles to do so. They are both on the same network.

There are a few things that change between the 2 systems:

Could it be one of these 2 things that causes the reconnection issues?

Thanks.

laduke commented 1 year ago

There have been some similar reports, possibly on discuss.zerotier.com as well, but we haven't been able to reproduce the issue. systemd-networkd shouldn't be an issue.

bostick commented 1 year ago

Is ZT_SSO_SUPPORTED=0 doing the correct thing?

ZT_SSO_SUPPORTED is only tested with: #ifdef ZT_SSO_SUPPORTED in Constants.hpp, so it is important to only define it if turning it on. Doing ZT_SSO_SUPPORTED=0 will act as turning on ZT_SSO_SUPPORTED.

glimberg commented 1 year ago

@bostick That is a separate concern from the post, but yes it does work. Compiling with make ZT_SSO_SUPPORTED=0 does indeed disable SSO

bostick commented 1 year ago

@glimberg Could you point to where this is handled? I'm not seeing it.

glimberg commented 1 year ago

@bostick in the makefile

https://github.com/zerotier/ZeroTierOne/blob/785a12182579277b7b1b0453b1acfbeaa0c325c2/make-linux.mk#L283

bostick commented 1 year ago

Ah, I see. Thanks

antoinefaure commented 1 year ago

Any idea what could be causing this, then? Are there additional logs I could enable to get more details? Again, it really seems to be software related, as I have 2 devices on the same network and one is working well. The only difference is that the one that works properly runs Raspbian, whereas the one with issues runs a custom Linux distribution (Yocto): different kernels, different library versions, and so on.

laduke commented 1 year ago

Can you look at sudo zerotier-cli info -j and see if "listeningOn" or anything else is different between the working and non-working devices? Also check sudo zerotier-cli listnetworks -j.
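
One way to make that comparison easier is to save both outputs to files on each device and diff them afterwards. This is a minimal sketch: ZT_CLI and zt_snapshot are illustrative names, and the only zerotier-cli invocations used are the two mentioned above.

```shell
#!/bin/sh
# Sketch: save the JSON output of `info -j` and `listnetworks -j` into a
# directory so snapshots from two devices can be compared with diff.
# ZT_CLI and zt_snapshot are illustrative names.
ZT_CLI="${ZT_CLI:-zerotier-cli}"

zt_snapshot() {
    zt_out="$1"
    mkdir -p "$zt_out" || return 1
    "$ZT_CLI" info -j > "$zt_out/info.json" || return 1
    "$ZT_CLI" listnetworks -j > "$zt_out/listnetworks.json" || return 1
}
```

Run it as root on each device (for example zt_snapshot /tmp/zt-working and zt_snapshot /tmp/zt-broken), copy one directory over, and compare with diff -u. Fields such as "address" and "clock" will differ between any two devices, so the interesting differences are in the remaining settings.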

laduke commented 1 year ago

Does your distro start calling zerotier-cli to do anything, like join a network or get info as soon as it boots?

If you don't persist /var/lib/zerotier-one/peers.d or /var/lib/zerotier-one/networks.d between reboots, does that make a difference?
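
A reversible way to try this is to rename the directories instead of deleting them. The sketch below is only an illustration: zt_set_aside is a made-up helper name, and it assumes the service is stopped while the directories are moved.

```shell
#!/bin/sh
# Sketch: rename selected entries inside a ZeroTier home directory so the
# service starts without them, while keeping the originals for later.
# zt_set_aside is an illustrative name.
zt_set_aside() {
    zt_dir="$1"
    shift
    zt_stamp=$(date +%Y%m%d%H%M%S)
    for zt_name in "$@"; do
        # Skip entries that do not exist.
        [ -e "$zt_dir/$zt_name" ] || continue
        mv "$zt_dir/$zt_name" "$zt_dir/$zt_name.aside-$zt_stamp" || return 1
    done
    return 0
}
```

For example: stop zerotier-one, run zt_set_aside /var/lib/zerotier-one peers.d, then start the service again. Setting networks.d aside as well should make the node forget its joined networks, so a zerotier-cli join would be needed afterwards.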

antoinefaure commented 1 year ago

Thanks for helping and investigating!

We have found a workaround though: after running zerotier-cli leave XXX and then zerotier-cli join XXX, the device is back online. But this needs to be done at every reboot.
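
The workaround can be wrapped in a small script. This is a minimal sketch, not the exact script used here: ZT_CLI, ZT_REJOIN_DELAY and zt_rejoin are illustrative names, and the pause between the two commands is an assumption. Only the leave and join subcommands come from the workaround itself.

```shell
#!/bin/sh
# Sketch of the leave/rejoin workaround.
# ZT_CLI, ZT_REJOIN_DELAY and zt_rejoin are illustrative names.
ZT_CLI="${ZT_CLI:-zerotier-cli}"

zt_rejoin() {
    zt_nwid="$1"
    # ZeroTier network IDs are 16 hexadecimal digits.
    case "$zt_nwid" in
        ""|*[!0-9a-fA-F]*)
            echo "invalid network id: $zt_nwid" >&2
            return 2
            ;;
    esac
    if [ "${#zt_nwid}" -ne 16 ]; then
        echo "invalid network id length: $zt_nwid" >&2
        return 2
    fi
    "$ZT_CLI" leave "$zt_nwid" || return 1
    # Short pause before rejoining (assumed value, adjust as needed).
    sleep "${ZT_REJOIN_DELAY:-2}"
    "$ZT_CLI" join "$zt_nwid"
}
```

Calling zt_rejoin with the network ID once the zerotier-one service is up reproduces the manual steps.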

laduke commented 1 year ago

Thank you. That is a good tip.

chadrockey commented 1 year ago

I see this issue too: an OTA system with persistent ZeroTier configuration that fails after OTA updates and after reboots.

Everything appears to work, connect, and list peers, but ifconfig shows no inet (IPv4) address for the ZeroTier network device, while zerotier-cli listnetworks shows the IP address that's supposed to be assigned.

ztqu3aiybc: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2800
        inet6 fe80::105a:cbff:fe0b:d04  prefixlen 64  scopeid 0x20
        ether 12:5a:cb:0b:0d:04  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18  bytes 2738 (2.6 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
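
The symptom (interface up, but no IPv4 address) can be detected from a script. This is a minimal sketch using the iproute2 ip command; IP_CMD and zt_has_ipv4 are illustrative names.

```shell
#!/bin/sh
# Sketch: succeed only if the given interface has an IPv4 address.
# IP_CMD and zt_has_ipv4 are illustrative names.
IP_CMD="${IP_CMD:-ip}"

zt_has_ipv4() {
    # `ip -4 addr show dev IFACE` prints an "inet ADDR/PREFIX" line for
    # each IPv4 address and nothing if the interface has none.
    "$IP_CMD" -4 addr show dev "$1" 2>/dev/null | grep -q 'inet '
}
```

For example, zt_has_ipv4 ztqu3aiybc || echo "no IPv4 on ZeroTier interface" could be used to decide whether any recovery step is needed at all.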

antoinefaure commented 1 year ago

@chadrockey For us, having a service that leaves and rejoins the ZeroTier network after booting 'solved' the issue. This is a rather ugly workaround, but it will unfortunately have to do until we move on to a more reliable VPN.

laduke commented 1 year ago

This may be fixed in 1.12.x @antoinefaure. Let us know!

antoinefaure commented 11 months ago

Hey @laduke, thanks for the ping. I've tested 1.12.2 and it seems worse than the 1.10.6 I was previously running: the ZT interface doesn't get an IP, and I get

sudo zerotier-cli status
401 status {}

I disabled my service that does the leave & join after booting, but I got the same results with it enabled. The web interface shows my device as online, and I see lots of "learned new path [...] to [...]" log lines. There might be an issue on my side, but I have no idea what it could be, as I haven't changed anything except the version of ZeroTier.

I still install it from source, using the following options (although I will remove the first one):

ZT_DEBUG=1 \
STRIP=echo \
ZT_SSO_SUPPORTED=0

laduke commented 11 months ago

That means it's not accepting the auth. Are you now seeing this issue? #2151

antoinefaure commented 11 months ago

@laduke I don't think this is the same problem as the issue you mentioned. First, my ZeroTier interface ends up (most of the time) with no IP address assigned to it, while the web manager shows that the client is contacting the server (Last seen less than a minute ago). Second, even though sudo zerotier-cli status returns 401 status {}, when I run curl -v http://localhost:9993/status -4HX-ZT1-Auth:MYTOKEN I get:

* STATE: INIT => CONNECT handle 0x5588d5bed0; line 1834 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* family0 == v4, family1 == v6
*   Trying 127.0.0.1:9993...
* STATE: CONNECT => CONNECTING handle 0x5588d5bed0; line 1895 (connection #0)
* Connected to localhost (127.0.0.1) port 9993 (#0)
* STATE: CONNECTING => PROTOCONNECT handle 0x5588d5bed0; line 2027 (connection #0)
* STATE: PROTOCONNECT => DO handle 0x5588d5bed0; line 2050 (connection #0)
> GET /status HTTP/1.1
> Host: localhost:9993
> User-Agent: curl/7.82.0
> Accept: */*
> X-ZT1-Auth:MYTOKEN
> 
* STATE: DO => DID handle 0x5588d5bed0; line 2146 (connection #0)
* STATE: DID => PERFORMING handle 0x5588d5bed0; line 2265 (connection #0)
* Mark bundle as not supporting multiuse
* HTTP 1.1 or later with persistent connection
< HTTP/1.1 200 OK
< Content-Length: 1015
< Content-Type: application/json
< Keep-Alive: timeout=5, max=5
< 
* STATE: PERFORMING => DONE handle 0x5588d5bed0; line 2464 (connection #0)
* multi_done: status: 0 prem: 0 done: 0
* Connection #0 to host localhost left intact
* Expire cleared (transfer 0x5588d5bed0)
{"address":"xxx","clock":1699837008026,"config":{"settings":{"allowTcpFallbackRelay":true,"forceTcpRelay":false,"listeningOn":["10.72.84.5/9993","10.10.0.57/9993","10.72.84.5/35113","10.10.0.57/35113","10.72.84.5/64845","10.10.0.57/64845"],"portMappingEnabled":true,"primaryPort":9993,"secondaryPort":35113,"softwareUpdate":"disable","softwareUpdateChannel":"release","surfaceAddresses":["103.125.220.62/1031","103.125.220.62/1025","103.125.220.62/1056","103.125.220.62/35113","103.125.220.62/1057","103.125.220.62/64845","10.10.0.57/64845","10.10.0.57/35113","10.10.0.57/9993","10.72.84.5/9993","10.72.84.5/35113","10.72.84.5/64845"],"tertiaryPort":64845}},"online":true,"planetWorldId":149604618,"planetWorldTimestamp":1644592324813,"publicIdentity":"xxx","tcpFallbackActive":false,"version":"1.12.2","versionBuild":0,"versionMajor":1,"versionMinor":12,"versionRev":2}

In the logs I see many trying new path & learned new path. When I run sudo zerotier-cli status I also see

Nov 13 14:02:06 AntoineDev zerotier-one[672]: ================================
Nov 13 14:02:06 AntoineDev zerotier-one[672]: GET HTTP/1.1 /status
Nov 13 14:02:06 AntoineDev zerotier-one[672]: LOCAL_ADDR: 127.0.0.1
Nov 13 14:02:06 AntoineDev zerotier-one[672]: LOCAL_PORT: 9993
Nov 13 14:02:06 AntoineDev zerotier-one[672]: REMOTE_ADDR: 127.0.0.1
Nov 13 14:02:06 AntoineDev zerotier-one[672]: REMOTE_PORT: 52684
Nov 13 14:02:06 AntoineDev zerotier-one[672]: --------------------------------
Nov 13 14:02:06 AntoineDev zerotier-one[672]: 401 HTTP/1.1
Nov 13 14:02:06 AntoineDev zerotier-one[672]: Content-Length: 2
Nov 13 14:02:06 AntoineDev zerotier-one[672]: Content-Type: application/json
Nov 13 14:02:06 AntoineDev zerotier-one[672]: Keep-Alive: timeout=5, max=5
Nov 13 14:02:06 AntoineDev zerotier-one[672]: {}

However when I run the curl command, I get the following:

Nov 13 14:03:27 AntoineDev zerotier-one[672]: ================================
Nov 13 14:03:27 AntoineDev zerotier-one[672]: GET HTTP/1.1 /status
Nov 13 14:03:27 AntoineDev zerotier-one[672]: Accept: */*
Nov 13 14:03:27 AntoineDev zerotier-one[672]: Host: localhost:9993
Nov 13 14:03:27 AntoineDev zerotier-one[672]: LOCAL_ADDR: 127.0.0.1
Nov 13 14:03:27 AntoineDev zerotier-one[672]: LOCAL_PORT: 9993
Nov 13 14:03:27 AntoineDev zerotier-one[672]: REMOTE_ADDR: 127.0.0.1
Nov 13 14:03:27 AntoineDev zerotier-one[672]: REMOTE_PORT: 52688
Nov 13 14:03:27 AntoineDev zerotier-one[672]: User-Agent: curl/7.82.0
Nov 13 14:03:27 AntoineDev zerotier-one[672]: X-ZT1-Auth: MYTOKEN
Nov 13 14:03:27 AntoineDev zerotier-one[672]: --------------------------------
Nov 13 14:03:27 AntoineDev zerotier-one[672]: 200 HTTP/1.1
Nov 13 14:03:27 AntoineDev zerotier-one[672]: Content-Length: 1015
Nov 13 14:03:27 AntoineDev zerotier-one[672]: Content-Type: application/json
Nov 13 14:03:27 AntoineDev zerotier-one[672]: Keep-Alive: timeout=5, max=5
Nov 13 14:03:27 AntoineDev zerotier-one[672]: {"address":"xxx","clock":1699837407797,"config":{"settings":{"allowTcpFallbackRelay":true,"forceTcpRelay":false,"listeningOn":["10.72.84.5/9993","10.10.0.57/9993","10.72.84.5/35113","10.10.0.57/35113","10.72.84.5/64845","10.10.0.57/64845"],"portMappingEnabled":true,"primaryPort":9993,"secondaryPort":35113,"softwareUpdate":"disable","softwareUpdateChannel":"release","surfaceAddresses":["103.125.220.62/1057","103.125.220.62/1031","103.125.220.62/1025","103.125.220.62/1056","103.125.220.62/35113","103.125.220.62/64845","10.10.0.57/64845","10.10.0.57/35113","10.10.0.57/9993","10.72.84.5/9993","10.72.84.5/35113","10.72.84.5/64845"],"tertiaryPort":64845}},"online":true,"planetWorldId":149604618,"planetWorldTimestamp":1644592324813,"publicIdentity":"xxx","tcpFallbackActive":false,"version":"1.12.2","versionBuild":0,"versionMajor":1,"versionMinor":12,"versionRev":2}

So it looks like my initial problem hasn't been fixed, and that there is at least a new one.
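
One visible difference between the two request dumps above is that the request made by zerotier-cli is logged with no request headers at all, so no X-ZT1-Auth, while the curl request carries the token. The thread does not establish why. As a first check, this sketch classifies the state of the token file the CLI would normally read; it assumes the default path /var/lib/zerotier-one/authtoken.secret, and zt_token_state is an illustrative name.

```shell
#!/bin/sh
# Sketch: classify the state of the local API token file.
# zt_token_state is an illustrative name; the default path assumes a
# standard Linux install.
zt_token_state() {
    zt_tok="${1:-/var/lib/zerotier-one/authtoken.secret}"
    if [ ! -e "$zt_tok" ]; then
        echo missing
    elif [ ! -r "$zt_tok" ]; then
        echo unreadable
    elif [ ! -s "$zt_tok" ]; then
        echo empty
    else
        echo ok
    fi
}
```

If it prints ok, the same token can be passed to the local API directly, as in the curl test above: curl -s -H "X-ZT1-Auth: $(cat /var/lib/zerotier-one/authtoken.secret)" http://localhost:9993/status. A result other than ok would point at the token file rather than at the network.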

antoinefaure commented 8 months ago

Hi @laduke, any update?

chadrockey commented 6 months ago

@laduke @antoinefaure I updated to 1.12.2 and everything appears to be fixed and works well without the need for a leave/join script.