Open antoinefaure opened 1 year ago
After more tests, it seems this has nothing to do with OTAs; a simple reboot produces the same result: zerotier-cli says the device is connected, and the web page says the same thing. Yet the device doesn't have any IP address on the ZeroTier interface and can't reach any device inside this network. It looks like there is a limit on the number of times a device can request a configuration within a certain amount of time. Is that the case, or is it a bug?
Thanks, Antoine
not familiar. if you restart zerotier-one does it start working?
No, I've tried restarting zerotier-one or the device a few times with no success. But it seems to come back on its own after a while; it's just odd that it doesn't reconnect immediately.
The time it takes to reconnect seems to be quite random too. Earlier today it came back within an hour; now I've been waiting for a few hours, rebooted the device a few times, and tried to disable/re-enable the device in the admin page, still with no luck. The device stays stuck:
Apr 27 15:52:45 raspberrypi4-64 zerotier-one[396]: requesting configuration for network XXX
Apr 27 15:53:50 raspberrypi4-64 zerotier-one[396]: requesting configuration for network XXX
Apr 27 15:53:55 raspberrypi4-64 zerotier-one[396]: trying unknown path 103.254.1.161/23174 to b5fbbc5aaf (packet 128874ef9becd6a9 verb 8 local socket 367015552432 network 0000000000000000)
Apr 27 15:53:55 raspberrypi4-64 zerotier-one[396]: learned new path 103.254.1.161/23174 to b5fbbc5aaf (packet 9d8f5f8da4c1edfd local socket 367015552432 network 0000000000000000)
Apr 27 15:54:55 raspberrypi4-64 zerotier-one[396]: requesting configuration for network XXX
Apr 27 15:55:01 raspberrypi4-64 zerotier-one[396]: learned new path 206.83.103.28/61801 to fdc4b8e55d (packet 84cdc987bea98ed6 local socket 367015552432 network 0000000000000000)
Apr 27 15:55:01 raspberrypi4-64 zerotier-one[396]: learned new path 206.83.103.28/61801 to fdc4b8e55d (packet acfd4b375422655d local socket 367015556080 network 0000000000000000)
Apr 27 15:55:06 raspberrypi4-64 zerotier-one[396]: learned new path 104.194.8.134/9993 to cafe9efeb9 (packet 636ebd557e4e69fb local socket 367015556080 network 0000000000000000)
# ip a
[...]
4: zt5u4rycbq: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel qlen 1000
link/ether c2:c6:07:8d:96:47 brd ff:ff:ff:ff:ff:ff
inet6 fe80::c0c6:7ff:fe8d:9647/64 scope link
valid_lft forever preferred_lft forever
The symptoms sound like what happens when you're behind a restrictive or double NAT.
Are there multiple instances of the same ZeroTier ID running?
I don't think there is a double NAT on my network, but I'll check. It works well before I reboot / do an update though, the device connects instantly.
No, just one instance is running at a time, and I'm using release 1.10.6.
I have eliminated the network configuration as a possible cause: I have 2 devices, one running an old version of our system and one running the new version. The old one reconnects to ZeroTier after a reboot without any issue, while the new one struggles to do so. They are both on the same network.
There are a few things that change between the 2 systems:
ZT_SSO_SUPPORTED=0 is set to avoid setting up rust, cargo, etc. and to make cross compilation easier.
Could it be one of these 2 things that causes the reconnection issues?
Thanks.
There have been some similar reports. Maybe they were on the discuss.zerotier.com as well, but we haven't been able to reproduce the issue. systemd-networkd shouldn't be an issue.
Is ZT_SSO_SUPPORTED=0 doing the correct thing?
ZT_SSO_SUPPORTED is only tested with:
#ifdef ZT_SSO_SUPPORTED
in Constants.hpp, so it is important to define it only when turning it on.
Doing ZT_SSO_SUPPORTED=0 will act as turning on ZT_SSO_SUPPORTED.
@bostick That is a separate concern from the post, but yes, it does work. Compiling with make ZT_SSO_SUPPORTED=0 does indeed disable SSO.
@glimberg Could you point to where this is handled? I'm not seeing it.
Ah, I see. Thanks
Any idea what could be the cause of this, then? Are there some additional logs I could enable to get more details? Again, it really seems to be software related, as I have 2 devices on the same network and one is working well. The only difference is that the one that works properly is running Raspbian, whereas the one that has issues is running a custom Linux distribution (Yocto). Different kernels, different versions of libraries and so on.
Can you look at sudo zerotier-cli info -j and see if "listeningOn" or anything else is different between working and not?
Also sudo zerotier-cli listnetworks -j
Does your distro start calling zerotier-cli to do anything, like join a network or get info, as soon as it boots?
If you don't persist /var/lib/zerotier-one/peers.d or /var/lib/zerotier-one/networks.d between reboots, does that make a difference?
Thanks for helping and investigating!
sudo zerotier-cli info -j shows a small difference in the listeningOn section: some ports are different between the 2.
The device that is working is listening on ports 9993, 54900 and 34038.
The device that is not working is listening on ports 9993, 27692 and 65172.
The rest looks similar.
sudo zerotier-cli listnetworks -j gives similar results on both nodes. One thing is very strange though: the device that isn't connected still has the correct IP address under "assignedAddresses", even though that IP is not set on the interface (see the result of the ip a command a few posts earlier).
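The mismatch described above (an address listed under "assignedAddresses" that never shows up on the interface) can be checked automatically. The following is a minimal sketch, not an official tool: it assumes the JSON from zerotier-cli listnetworks -j carries "portDeviceName" and "assignedAddresses" for each network, and that the installed ip comes from iproute2 and supports -j. The function names are made up for illustration.

```python
import ipaddress
import json
import subprocess


def missing_addresses(zt_networks, ip_interfaces):
    """Return {interface: [addresses]} for addresses that ZeroTier reports as
    assigned but that are not configured on the matching kernel interface."""
    # Kernel view: interface name -> set of normalized address/prefix objects.
    configured = {}
    for iface in ip_interfaces:
        configured[iface["ifname"]] = {
            ipaddress.ip_interface(f"{info['local']}/{info['prefixlen']}")
            for info in iface.get("addr_info", [])
            if "local" in info and "prefixlen" in info
        }

    missing = {}
    for net in zt_networks:
        dev = net.get("portDeviceName", "")
        present = configured.get(dev, set())
        absent = [
            addr
            for addr in net.get("assignedAddresses", [])
            if ipaddress.ip_interface(addr) not in present
        ]
        if absent:
            missing[dev] = absent
    return missing


def run_json(cmd):
    """Run a command and parse its standard output as JSON."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def report():
    """Print every assigned-but-unconfigured address (run as root on the device)."""
    zt = run_json(["zerotier-cli", "listnetworks", "-j"])
    ip = run_json(["ip", "-j", "addr", "show"])
    for dev, addrs in missing_addresses(zt, ip).items():
        print(f"{dev}: assigned but not configured: {', '.join(addrs)}")
```

Calling report() on an affected device should print the ZeroTier interface together with the address that is missing; on a healthy device it prints nothing.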
No, we are not calling zerotier-cli when booting; we are just using the systemd service provided in debian/zerotier-one.service.
I've tried removing the peers.d and networks.d folders and rebooting; it didn't fix the issue.
We have found a workaround though: after running zerotier-cli leave XXX and then zerotier-cli join XXX, the device is back online. But this needs to be done at every reboot.
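Since the leave/join workaround has to be repeated at every boot, it can be scripted. This is only a sketch of the manual steps above, not a fix: the function name, the pause between the two commands, and its length are assumptions, and XXX stands for the network ID as elsewhere in this thread.

```python
import subprocess
import time


def rejoin(network_id, run=subprocess.run, settle_seconds=5.0, sleep=time.sleep):
    """Leave and then re-join a ZeroTier network, mirroring the manual workaround.

    `run` and `sleep` are injectable so the sequence can be exercised without
    a running zerotier-one service.
    """
    run(["zerotier-cli", "leave", network_id], check=True)
    # The pause is an assumption, not a documented requirement.
    sleep(settle_seconds)
    run(["zerotier-cli", "join", network_id], check=True)
```

A small service started after zerotier-one could call rejoin("XXX") once per boot. It needs the same privileges as running zerotier-cli by hand.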
Thank you. That is a good tip.
I see this issue too: an OTA system with persistent ZeroTier configuration that fails on OTA and on reboots.
Everything appears to work, connect, and list peers, but ifconfig shows no inet (IPv4) address for the ZeroTier network device, while zerotier-cli listnetworks shows the IP address that's supposed to be assigned.
ztqu3aiybc: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2800
inet6 fe80::105a:cbff:fe0b:d04 prefixlen 64 scopeid 0x20
ether 12:5a:cb:0b:0d:04 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 18 bytes 2738 (2.6 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
@chadrockey For us, having a service that leaves and rejoins the ZeroTier network after booting 'solved' the issue. This is a rather ugly workaround, but it will unfortunately have to do until we move on to a more reliable VPN.
This may be fixed in 1.12.x @antoinefaure. Let us know!
Hey @laduke, thanks for the ping. I've tested 1.12.2 and it seems worse than the 1.10.6 I was previously running: the ZT interface doesn't get an IP, and I get
sudo zerotier-cli status
401 status {}
I disabled my service that does the leave & join after booting, but got the same results when I kept it.
The web interface shows my device as online, and I see lots of logs like learned new path [...] to [...].
There might be an issue on my side, but I have no idea what it could be as I haven't changed anything except for the version of zerotier.
I still install it from source, using the following options (although I will remove the first one):
ZT_DEBUG=1 \
STRIP=echo \
ZT_SSO_SUPPORTED=0
That means it's not accepting the auth. Are you now seeing this issue? #2151
@laduke I don't think this is the same problem as the issue you've mentioned. First, my Zerotier interface ends up (most of the time) with no IP address assigned to it. The web manager however shows that the client is contacting the server (Last seen less than a minute ago)
Then, even though sudo zerotier-cli status returns a 401 status {}, when I run curl -v http://localhost:9993/status -4HX-ZT1-Auth:MYTOKEN I get:
* STATE: INIT => CONNECT handle 0x5588d5bed0; line 1834 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* family0 == v4, family1 == v6
* Trying 127.0.0.1:9993...
* STATE: CONNECT => CONNECTING handle 0x5588d5bed0; line 1895 (connection #0)
* Connected to localhost (127.0.0.1) port 9993 (#0)
* STATE: CONNECTING => PROTOCONNECT handle 0x5588d5bed0; line 2027 (connection #0)
* STATE: PROTOCONNECT => DO handle 0x5588d5bed0; line 2050 (connection #0)
> GET /status HTTP/1.1
> Host: localhost:9993
> User-Agent: curl/7.82.0
> Accept: */*
> X-ZT1-Auth:MYTOKEN
>
* STATE: DO => DID handle 0x5588d5bed0; line 2146 (connection #0)
* STATE: DID => PERFORMING handle 0x5588d5bed0; line 2265 (connection #0)
* Mark bundle as not supporting multiuse
* HTTP 1.1 or later with persistent connection
< HTTP/1.1 200 OK
< Content-Length: 1015
< Content-Type: application/json
< Keep-Alive: timeout=5, max=5
<
* STATE: PERFORMING => DONE handle 0x5588d5bed0; line 2464 (connection #0)
* multi_done: status: 0 prem: 0 done: 0
* Connection #0 to host localhost left intact
* Expire cleared (transfer 0x5588d5bed0)
{"address":"xxx","clock":1699837008026,"config":{"settings":{"allowTcpFallbackRelay":true,"forceTcpRelay":false,"listeningOn":["10.72.84.5/9993","10.10.0.57/9993","10.72.84.5/35113","10.10.0.57/35113","10.72.84.5/64845","10.10.0.57/64845"],"portMappingEnabled":true,"primaryPort":9993,"secondaryPort":35113,"softwareUpdate":"disable","softwareUpdateChannel":"release","surfaceAddresses":["103.125.220.62/1031","103.125.220.62/1025","103.125.220.62/1056","103.125.220.62/35113","103.125.220.62/1057","103.125.220.62/64845","10.10.0.57/64845","10.10.0.57/35113","10.10.0.57/9993","10.72.84.5/9993","10.72.84.5/35113","10.72.84.5/64845"],"tertiaryPort":64845}},"online":true,"planetWorldId":149604618,"planetWorldTimestamp":1644592324813,"publicIdentity":"xxx","tcpFallbackActive":false,"version":"1.12.2","versionBuild":0,"versionMajor":1,"versionMinor":12,"versionRev":2}
In the logs I see many trying new path & learned new path entries. When I run sudo zerotier-cli status I also see:
Nov 13 14:02:06 AntoineDev zerotier-one[672]: ================================
Nov 13 14:02:06 AntoineDev zerotier-one[672]: GET HTTP/1.1 /status
Nov 13 14:02:06 AntoineDev zerotier-one[672]: LOCAL_ADDR: 127.0.0.1
Nov 13 14:02:06 AntoineDev zerotier-one[672]: LOCAL_PORT: 9993
Nov 13 14:02:06 AntoineDev zerotier-one[672]: REMOTE_ADDR: 127.0.0.1
Nov 13 14:02:06 AntoineDev zerotier-one[672]: REMOTE_PORT: 52684
Nov 13 14:02:06 AntoineDev zerotier-one[672]: --------------------------------
Nov 13 14:02:06 AntoineDev zerotier-one[672]: 401 HTTP/1.1
Nov 13 14:02:06 AntoineDev zerotier-one[672]: Content-Length: 2
Nov 13 14:02:06 AntoineDev zerotier-one[672]: Content-Type: application/json
Nov 13 14:02:06 AntoineDev zerotier-one[672]: Keep-Alive: timeout=5, max=5
Nov 13 14:02:06 AntoineDev zerotier-one[672]: {}
However, when I run the curl command, I get the following:
Nov 13 14:03:27 AntoineDev zerotier-one[672]: ================================
Nov 13 14:03:27 AntoineDev zerotier-one[672]: GET HTTP/1.1 /status
Nov 13 14:03:27 AntoineDev zerotier-one[672]: Accept: */*
Nov 13 14:03:27 AntoineDev zerotier-one[672]: Host: localhost:9993
Nov 13 14:03:27 AntoineDev zerotier-one[672]: LOCAL_ADDR: 127.0.0.1
Nov 13 14:03:27 AntoineDev zerotier-one[672]: LOCAL_PORT: 9993
Nov 13 14:03:27 AntoineDev zerotier-one[672]: REMOTE_ADDR: 127.0.0.1
Nov 13 14:03:27 AntoineDev zerotier-one[672]: REMOTE_PORT: 52688
Nov 13 14:03:27 AntoineDev zerotier-one[672]: User-Agent: curl/7.82.0
Nov 13 14:03:27 AntoineDev zerotier-one[672]: X-ZT1-Auth: MYTOKEN
Nov 13 14:03:27 AntoineDev zerotier-one[672]: --------------------------------
Nov 13 14:03:27 AntoineDev zerotier-one[672]: 200 HTTP/1.1
Nov 13 14:03:27 AntoineDev zerotier-one[672]: Content-Length: 1015
Nov 13 14:03:27 AntoineDev zerotier-one[672]: Content-Type: application/json
Nov 13 14:03:27 AntoineDev zerotier-one[672]: Keep-Alive: timeout=5, max=5
Nov 13 14:03:27 AntoineDev zerotier-one[672]: {"address":"xxx","clock":1699837407797,"config":{"settings":{"allowTcpFallbackRelay":true,"forceTcpRelay":false,"listeningOn":["10.72.84.5/9993","10.10.0.57/9993","10.72.84.5/35113","10.10.0.57/35113","10.72.84.5/64845","10.10.0.57/64845"],"portMappingEnabled":true,"primaryPort":9993,"secondaryPort":35113,"softwareUpdate":"disable","softwareUpdateChannel":"release","surfaceAddresses":["103.125.220.62/1057","103.125.220.62/1031","103.125.220.62/1025","103.125.220.62/1056","103.125.220.62/35113","103.125.220.62/64845","10.10.0.57/64845","10.10.0.57/35113","10.10.0.57/9993","10.72.84.5/9993","10.72.84.5/35113","10.72.84.5/64845"],"tertiaryPort":64845}},"online":true,"planetWorldId":149604618,"planetWorldTimestamp":1644592324813,"publicIdentity":"xxx","tcpFallbackActive":false,"version":"1.12.2","versionBuild":0,"versionMajor":1,"versionMinor":12,"versionRev":2}
So it looks like my initial problem hasn't been fixed, and that there is at least one new one.
Hi @laduke, any update?
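The two log excerpts above differ in one respect: the request that gets a 200 carries an X-ZT1-Auth header, while the one that gets a 401 shows no such header. To reproduce the successful curl call without zerotier-cli, a minimal sketch could look like the following. It assumes the token lives in /var/lib/zerotier-one/authtoken.secret (the usual location on Linux) and that the service listens on port 9993 as in the logs; the function names are illustrative.

```python
import json
import urllib.request
from pathlib import Path

# Assumed defaults; adjust to match the device.
TOKEN_PATH = Path("/var/lib/zerotier-one/authtoken.secret")
STATUS_URL = "http://127.0.0.1:9993/status"


def build_status_request(token, url=STATUS_URL):
    """Build a GET request for the local status endpoint with the auth header set."""
    return urllib.request.Request(url, headers={"X-ZT1-Auth": token.strip()})


def fetch_status(token_path=TOKEN_PATH, timeout=5.0):
    """Query the local service and return the decoded JSON status.

    urllib raises urllib.error.HTTPError on a 401, so a rejected token is
    reported as an exception rather than as an empty result.
    """
    token = Path(token_path).read_text()
    with urllib.request.urlopen(build_status_request(token), timeout=timeout) as resp:
        return json.load(resp)
```

If fetch_status() succeeds while sudo zerotier-cli status still returns 401, that suggests the service accepts the token and that the difference lies in what zerotier-cli sends.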
@laduke @antoinefaure I updated to 1.12.2 and everything appears to be fixed and works well without the need for a leave/join script.
Hi,
I'm using ZeroTier on an embedded Linux platform which is updated with full system images (i.e. the rootfs is erased and reflashed at every update). I'm backing up the content of the /var/lib/zerotier-one/ folder to keep my device's configuration, which seems to be working. After an update, zerotier-cli status tells me I'm still online, and I can see my device on the web interface as being connected with an IP. The problem is that, on the device, ZeroTier seems to be stuck waiting for the configuration for the network, and my IP address is not set on the device for the ZeroTier interface.
Before the update ZeroTier was working properly and the device had an IP address. No network changes before and after the update. The device is enabled on the web interface. I have tried rebooting with no success.
Is there any other configuration file I should be saving to be able to reconnect after an update? Am I missing something?
Thanks
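For reference, the backup/restore step described in this post could be sketched as below. This is only an illustration of copying the whole folder; the persistent location /data/zerotier-one and the function name are hypothetical. shutil.copytree keeps file modes but not ownership, so ownership may need to be fixed separately, and it is probably safest to stop zerotier-one before copying.

```python
import shutil
from pathlib import Path


def copy_state(src, dst):
    """Copy a ZeroTier state directory tree, preserving file modes and symlinks.

    Used in both directions: rootfs -> persistent storage before an update,
    and persistent storage -> rootfs after it. Requires Python 3.8+ for
    dirs_exist_ok.
    """
    return Path(shutil.copytree(src, dst, symlinks=True, dirs_exist_ok=True))
```

Before an update this would be called as copy_state("/var/lib/zerotier-one", "/data/zerotier-one"), and with the arguments swapped on first boot of the new image.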