zerotier / ZeroTierOne

A Smart Ethernet Switch for Earth
https://zerotier.com

Connection problem #1349

Closed cocoflan closed 3 years ago

cocoflan commented 3 years ago

Connections between nodes (iOS, Android, Linux, Windows) are no longer possible.

Suddenly I can no longer connect to any kind of hardware, even though everything shows as active on the settings page.

ahnchive commented 3 years ago

I'm having the same connection problem and upgrading to 1.6.3 didn't solve the issue.

JocPelletier commented 3 years ago

Same here. I'm trying to connect to one of my nodes; it's ONLINE, but I can't ping it or access it using ssh. I also tried to add my moon and I don't see it in the peers list.

3a46f1bf30 - PLANET -1 RELAY
62f865ae71 - PLANET -1 RELAY
778cde7190 - PLANET -1 RELAY
86102XXXXX 1.4.6 LEAF -1 RELAY
93afaXXXXX 1.6.3 LEAF 348 DIRECT 8638 8269 34.94.1XX.X/XXXXX
992fcf1db7 - PLANET -1 RELAY

laduke commented 3 years ago

If anyone can reproduce this reliably, let us know.

@JocPelletier you might get up and running again by

here's some info about starting/stopping zerotier

msimunovic83 commented 3 years ago

I ran into the same issue yesterday and it is still persisting on my side. Suddenly I was not able to connect to my machines/nodes. Interestingly, 2 machines are still working without any problems. I also tried upgrading to version 1.6.3 and this didn't resolve my problem.

@laduke I'm able to reproduce this pretty consistently on my Windows and Linux machines. Interestingly, this started on build 1.4.6 (all machines had that version) and now they are on 1.6.3. I also tried deleting the peers.d folder as you suggested and restarting zerotier. This didn't solve the problem either. At first I thought this was related to my local network, so I restarted the router and all the VMs that are connected to zerotier, but with no success.

I hope this is going to be resolved soon as this is quite a big issue. If you need some logs or help to troubleshoot this, please contact me; on my side, this is a pretty consistent issue.

I also tried to ping the machines that are no longer working and noticed that at the beginning 2-4 packets passed with quite high latency (>500ms) and after that nothing passed. The host was unreachable, and at one point I got a message that the host is probably not alive as the route to the host is unknown.

laduke commented 3 years ago

@msimunovic83 how have you resolved it in the past?

markovs83 commented 3 years ago

This issue never happened to me until a few days ago. I have been trying to resolve it for 5 days now, without success. It's just not responding and I'm not sure how to proceed. If this persists, it seems I'll uninstall everything and try to find an alternative solution.

laduke commented 3 years ago

that's frustrating. you've probably tried, but see if switching a couple nodes to version 1.4.6 helps.

laduke commented 3 years ago

Check out new version 1.6.4 https://www.zerotier.com/download/

apt update; apt install zerotier-one

camillescott commented 3 years ago

I'm having similar issues with my network. Four systems:

pegasus, valkyrie, and raider can all talk to each other via zerotier interfaces without issue. None of them can talk to galactica: it's always "no route to host" for traceroute, ping, or ssh. From galactica to the others, pings and traceroutes appear to work at first glance, but there's something odd: the time is always on the order of 0.01ms, whereas normal pings within the local network between other devices' zerotier interfaces are usually ~5ms. It seems the pings from galactica aren't actually leaving the machine. Lending credence to that, traceroutes and pings still report success after galactica is disabled in the zerotier web interface, with the same minuscule latency, even while zerotier-cli listnetworks reports ACCESS_DENIED PRIVATE.

galactica otherwise reports good via zerotier-cli status and zerotier-cli peers, with nothing showing up in any logs, and systemctl status and journalctl showing nothing amiss. I've tried zerotier versions 1.4.6 and 1.6.4, and a compiled version, all with the same failures. No ufw running, no iptables rules, nothing strange in /etc/hosts, route reports normal, and I can see zerotier listening on the right ports via netstat. I'm at a loss.

Maybe zerotier doesn't like the lowlatency kernel? The only major changes I've made recently are that kernel and a new graphics card that requires the amdgpu driver. I can't really see how the latter would be related, but maybe something in the way the former does polling or interrupts doesn't play well. Unfortunately, I haven't tried to use ssh over zerotier to this machine for a few months because of the pandemic; I only noticed when I wanted to sit outside on a sunny day and work, so I don't know exactly when this machine stopped behaving.

camillescott commented 3 years ago

Some more info, keeping in mind that I am in no way whatsoever a network engineer. I found some instructions to do packet tracing through the kernel, and traced the calls for pinging on the zerotier interface on galactica. I get:

sudo perf trace --no-syscalls --event "net:*" ping -c2 10.241.147.135  > /dev/null

     0.000 ping/93379 net:net_dev_queue(skbaddr: 0xffff95b46aba8000, len: 98, name: "lo")
     0.016 ping/93379 net:net_dev_start_xmit(name: "lo", skbaddr: 0xffff95b46aba8000, protocol: 2048, len: 98, network_offset: 14, transport_offset_valid: 1, transport_offset: 34)
     0.023 ping/93379 net:netif_rx_entry(name: "lo", napi_id: 2, skbaddr: 0xffff95b46aba8000, protocol: 2048, len: 84, truesize: 768, mac_header_valid: 1, mac_header: 4294967282)
     0.029 ping/93379 net:netif_rx(skbaddr: 0xffff95b46aba8000, len: 84, name: "lo")
     0.034 ping/93379 net:netif_rx_exit(skbaddr: 0xffff95b46aba8000, len: 84, name: "lo")
     0.040 ping/93379 net:net_dev_xmit(skbaddr: 0xffff95b46aba8000, len: 98, name: "lo")
     0.046 ping/93379 net:netif_receive_skb(skbaddr: 0xffff95b46aba8000, len: 84, name: "lo")
     0.081 ping/93379 net:net_dev_queue(skbaddr: 0xffff95b46aba9c00, len: 98, name: "lo")
     0.086 ping/93379 net:net_dev_start_xmit(name: "lo", skbaddr: 0xffff95b46aba9c00, protocol: 2048, len: 98, network_offset: 14, transport_offset_valid: 1, transport_offset: 34)
     0.089 ping/93379 net:netif_rx_entry(name: "lo", napi_id: 2, skbaddr: 0xffff95b46aba9c00, protocol: 2048, len: 84, truesize: 768, mac_header_valid: 1, mac_header: 4294967282)
     0.093 ping/93379 net:netif_rx(skbaddr: 0xffff95b46aba9c00, len: 84, name: "lo")
     0.096 ping/93379 net:netif_rx_exit(skbaddr: 0xffff95b46aba9c00, len: 84, name: "lo")
     0.098 ping/93379 net:net_dev_xmit(skbaddr: 0xffff95b46aba9c00, len: 98, name: "lo")
     0.102 ping/93379 net:netif_receive_skb(skbaddr: 0xffff95b46aba9c00, len: 84, name: "lo")

My understanding is that this should pass through ztly5stjmg (at least on my system) at some point, but it always stays in lo. If I run this on a working system (pegasus):

sudo perf trace --no-syscalls --event 'net:*' ping -c2 10.241.147.135 > /dev/null

     0.000 ping/277037 net:net_dev_queue:dev=ztly5stjmg skbaddr=0xffff9e4cbdfc7700 len=98
     0.025 ping/277037 net:net_dev_start_xmit:dev=ztly5stjmg queue_mapping=0 skbaddr=0xffff9e4cbdfc7700 vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x0800 ip_summed=0 len=98 data_len=0 network_offset=14 transport_offset_valid=1 transport_offset=34 tx_flags=0 gso_size=0 gso_segs=0 gso_type=0
     0.046 ping/277037 net:net_dev_xmit:dev=ztly5stjmg skbaddr=0xffff9e4cbdfc7700 len=98 rc=0
  1001.601 ping/277037 net:net_dev_queue:dev=ztly5stjmg skbaddr=0xffff9e4d1c7ca600 len=98
  1001.635 ping/277037 net:net_dev_start_xmit:dev=ztly5stjmg queue_mapping=0 skbaddr=0xffff9e4d1c7ca600 vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x0800 ip_summed=0 len=98 data_len=0 network_offset=14 transport_offset_valid=1 transport_offset=34 tx_flags=0 gso_size=0 gso_segs=0 gso_type=0
  1001.663 ping/277037 net:net_dev_xmit:dev=ztly5stjmg skbaddr=0xffff9e4d1c7ca600 len=98 rc=0

The packet passes through the zerotier interface.
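
For anyone repeating this comparison, the check can be scripted. The following is a minimal Python sketch, not part of ZeroTier, that reads saved `perf trace` output and reports which network devices the events mention. It assumes the two line layouts shown above (`name: "lo"` and `dev=ztly5stjmg`); other perf versions may print the device differently, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches both layouts seen in this thread: name: "lo" and dev=ztly5stjmg.
# Other perf versions may format the device field differently (assumption).
_DEVICE = re.compile(r'(?:name: "|dev=)([^"\s,)]+)')


def devices_in_trace(trace_text):
    """Count how many trace events mention each network device."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = _DEVICE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def stays_on_loopback(trace_text):
    """True if every event that names a device names the loopback device."""
    counts = devices_in_trace(trace_text)
    return bool(counts) and set(counts) == {"lo"}
```

Feeding it the galactica trace should report only `lo`, while the pegasus trace should report only the zt interface.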

Ifiht commented 3 years ago

I am also seeing similar issues to @camillescott, running Ubuntu 20.04. I have some networking background and would be happy to help troubleshoot this in any way possible. Here's some additional debugging info:

ZEROTIER VERSION

ZeroTier One version 1.6.4 Copyright (c) 2020 ZeroTier, Inc. Licensed under the ZeroTier BSL 1.1 (see LICENSE.txt)

ZEROTIER SERVICE

● zerotier-one.service - ZeroTier One
     Loaded: loaded (/lib/systemd/system/zerotier-one.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2021-03-01 01:45:25 UTC; 2min 8s ago
   Main PID: 1049215 (zerotier-one)
      Tasks: 7 (limit: 38191)
     Memory: 7.3M
     CGroup: /system.slice/zerotier-one.service
             └─1049215 /usr/sbin/zerotier-one

Mar 01 01:45:25 mimir systemd[1]: Started ZeroTier One.

IFCONFIG

docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:2fff:feee:c2a9  prefixlen 64  scopeid 0x20<link>
        ether 02:42:2f:ee:c2:a9  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2503  bytes 113899 (113.8 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s25: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.1.2  netmask 255.255.255.0  broadcast 10.0.1.255
        inet6 fe80::8a88:88ff:fe88:8788  prefixlen 64  scopeid 0x20<link>
        ether 88:88:88:88:87:88  txqueuelen 1000  (Ethernet)
        RX packets 52265  bytes 36126355 (36.1 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 41754  bytes 6255918 (6.2 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 20  memory 0xfb500000-fb520000  

enp4s0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 1c:1b:0d:b0:00:8c  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfb400000-fb47ffff  

enp5s0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 1c:1b:0d:b0:00:8d  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfb300000-fb37ffff  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 4816  bytes 695310 (695.3 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4816  bytes 695310 (695.3 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth11ce324: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::28c7:84ff:fe5a:fb2  prefixlen 64  scopeid 0x20<link>
        ether 2a:c7:84:5a:0f:b2  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2564  bytes 121500 (121.5 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ztbtovvtci: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2800
        inet 10.11.12.2  netmask 255.255.255.0  broadcast 10.11.12.255
        inet6 fe80::a0e2:c3ff:fe42:ed9e  prefixlen 64  scopeid 0x20<link>
        inet6 fd12:ac4a:1e71:10d2:3999:93ae:aef5:2c73  prefixlen 88  scopeid 0x0<global>
        ether 3a:7c:be:84:32:39  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 45  bytes 6352 (6.3 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

PING

PING 10.11.12.13 (10.11.12.13) 56(84) bytes of data.
From 10.11.12.2 icmp_seq=1 Destination Host Unreachable
From 10.11.12.2 icmp_seq=2 Destination Host Unreachable
From 10.11.12.2 icmp_seq=3 Destination Host Unreachable
From 10.11.12.2 icmp_seq=4 Destination Host Unreachable
From 10.11.12.2 icmp_seq=5 Destination Host Unreachable
From 10.11.12.2 icmp_seq=6 Destination Host Unreachable
^C
--- 10.11.12.13 ping statistics ---
8 packets transmitted, 0 received, +6 errors, 100% packet loss, time 7147ms
pipe 4

PING 10.11.12.2 (10.11.12.2) 56(84) bytes of data.
64 bytes from 10.11.12.2: icmp_seq=1 ttl=64 time=0.063 ms
64 bytes from 10.11.12.2: icmp_seq=2 ttl=64 time=0.059 ms
64 bytes from 10.11.12.2: icmp_seq=3 ttl=64 time=0.053 ms
^C
--- 10.11.12.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2036ms
rtt min/avg/max/mdev = 0.053/0.058/0.063/0.004 ms

I was able to get an address assigned from my network at zerotier-central, but it took over 2 hours to join, and as the above output shows the connection is unusable for real traffic:

200 info aeaef52c73 1.6.4 TUNNELED

EDIT: Sorry I don't have a working linux box to compare to, my only working zerotier systems are Windows right now. But here is my result for camillescott's troubleshooting:

     0.000 ping/3859118 net:net_dev_queue:dev=ztbtovvtci skbaddr=0xffff942df5c90e00 len=42
     0.021 ping/3859118 net:net_dev_start_xmit:dev=ztbtovvtci queue_mapping=0 skbaddr=0xffff942df5c90e00 vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x0806 ip_summed=0 len=42 data_len=0 network_offset=14 transport_offset_valid=0 transport_offset=65533 tx_flags=0 gso_size=0 gso_segs=0 gso_type=0
     0.039 ping/3859118 net:net_dev_xmit:dev=ztbtovvtci skbaddr=0xffff942df5c90e00 len=42 rc=0
camillescott commented 3 years ago

I followed up and tested on the following kernels:

And they all worked. Then, I booted back in to my usual 5.8.0-44-lowlatency aaaand... it was working again. So, my lowlatency theory didn't pan out. I'm still having intermittent connection issues trying to ssh from a remote system into galactica over the zerotier interface though.

joseph-henry commented 3 years ago

I've created a new experimental branch with some reverted path logic:

experimental-path-fix

If anyone is still having issues, delete your peers.d directory and try using this new branch. It would be helpful if you could build with ZT_DEBUG=1 (will output possibly useful debug traces) and try each of the latest two commits one by one and give feedback on what works.

We appreciate your help and patience.

joseph-henry commented 3 years ago

Additionally,

Here is another smaller fix to try on our dev branch: https://github.com/zerotier/ZeroTierOne/commit/353905394e79c54a4f479d722b74de8ea06a38b7

This fixes a potential interface binding issue that might be relevant. Please let us know if this helps.

Ifiht commented 3 years ago

So a bit more debugging; I'm hoping I can help narrow this down. I joined another zerotier network, with a fresh install of Ubuntu and on a network where my Windows clients are still working. This was the state it was stuck in:

200 listnetworks 93afae5963e649a3 a2:e7:48:96:75:dd REQUESTING_CONFIGURATION PRIVATE ztzlgeqega

After switching my repo HEAD to your commit, @joseph-henry, I stopped the service, did a make and make install, and deleted the peers.d directory under /var/lib/zerotier-one. I started the service again and noticed the status had instantly changed to:

200 listnetworks 93afae5963e649a3 a2:e7:48:96:75:dd ACCESS_DENIED PRIVATE ztzlgeqega

But after authorizing the machine on ZeroTier Central I still cannot ping or ssh to it, although it did receive the address I manually assigned. (ping and ssh work fine to the local IP address, and to all other zerotier machines except the ones on Ubuntu 20.04; my 18.04 is fine.)

Any other ideas, or things I could check?

adamierymenko commented 3 years ago

Is this still happening in 1.6.5?

Ifiht commented 3 years ago

I will test tonight with 1.6.5


Ifiht commented 3 years ago

I tested with 1.6.5, and this time it is my Windows machine that isn't working so I'd like to make a few observations:

I've got the data below, from the Windows machine showing first that it is now off of the zerotier network, and second trying to manually run the 1.6.5 version and getting the same state error:

PS C:\> zerotier-cli.bat -v
1.6.4
PS C:\> ping 172.27.126.79

Pinging 172.27.126.79 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

and 1.6.5:

PS D:\ZeroTierOne-1.6.5\ZeroTierOne-1.6.5\windows\Build\x64\Debug> .\zerotier-one_x64.exe -C
requesting configuration for network 93afae5963e649a3
requesting configuration for network 93afae5963e649a3
93afae5963e649a3 DROP frame a2:6b:48:28:9c:6a -> 01:80:c2:00:00:0e etherType 88cc size 44 (filter blocked)
93afae5963e649a3 DROP frame a2:6b:48:28:9c:6a -> 01:80:c2:00:00:0e etherType 88cc size 44 (filter blocked)
93afae5963e649a3 DROP frame a2:6b:48:28:9c:6a -> 01:80:c2:00:00:0e etherType 88cc size 44 (filter blocked)
93afae5963e649a3 DROP frame a2:6b:48:28:9c:6a -> 01:80:c2:00:00:0e etherType 88cc size 44 (filter blocked)
requesting configuration for network 93afae5963e649a3
requesting configuration for network 93afae5963e649a3
requesting configuration for network 93afae5963e649a3
requesting configuration for network 93afae5963e649a3
requesting configuration for network 93afae5963e649a3
requesting configuration for network 93afae5963e649a3
requesting configuration for network 93afae5963e649a3

Could this be a NAT issue? I am running 4 ZT machines on a LAN, all behind the same ISP connection. Note: I moved all machines to a newly created ZT network, just to rule that out.

bartmichu commented 3 years ago

I believe I'm struggling with the same issue. Ubuntu 20.04 on amd64 is joined to 10-15 networks (it's a central monitoring system).

Ifiht commented 3 years ago

Sorry, I really want to help nail this down. I've had the ping running for hours now:

Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 172.27.126.79: bytes=32 time=12ms TTL=64
Reply from 172.27.126.79: bytes=32 time=13ms TTL=64
[...]
Reply from 172.27.126.79: bytes=32 time=11ms TTL=64
Reply from 172.27.126.79: bytes=32 time=14ms TTL=64
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 172.27.121.252: Destination host unreachable.
Request timed out.

the issue comes and goes...

running a simultaneous ping to www.google.com is good for the duration:

PS C:\> ping www.google.com -t

Pinging www.google.com [172.217.15.100] with 32 bytes of data:
Reply from 172.217.15.100: bytes=32 time=2ms TTL=119
Reply from 172.217.15.100: bytes=32 time=4ms TTL=119
Reply from 172.217.15.100: bytes=32 time=2ms TTL=119
[...]
Reply from 172.217.15.100: bytes=32 time=3ms TTL=119
Reply from 172.217.15.100: bytes=32 time=3ms TTL=119
Reply from 172.217.15.100: bytes=32 time=4ms TTL=119

meaning at the same time I can ping google continuously, the ZT network drops in and out.
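
To put numbers on "comes and goes", a saved ping log can be collapsed into runs of consecutive outcomes. This is a rough Python sketch that assumes the English Windows `ping -t` wording shown above; the function names are illustrative. Lines it does not recognise (including an elided `[...]`) are skipped, so counts taken from a trimmed log will be lower than reality.

```python
from itertools import groupby


def classify(line):
    """Label one line of Windows `ping -t` output; None for anything else."""
    text = line.strip()
    if text.startswith("Request timed out"):
        return "timeout"
    if text.startswith("Reply from"):
        # "Destination host unreachable" lines also start with "Reply from"
        return "unreachable" if "unreachable" in text.lower() else "reply"
    return None


def summarize_runs(ping_text):
    """Collapse consecutive identical outcomes into (outcome, count) runs."""
    outcomes = [c for c in map(classify, ping_text.splitlines()) if c]
    return [(outcome, len(list(group))) for outcome, group in groupby(outcomes)]
```

Running it on the ZeroTier ping log and on the simultaneous google.com log side by side makes the contrast easier to share than the raw output.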

Any other diagnostics I might be able to run to help narrow this down? Would a specific packet capture help? @bartmichu to your point, I do have one machine on 3 zerotier networks at once, maybe that's the common factor here?

laduke commented 3 years ago

Thanks for all the info.

@bartmichu is that monitoring node behind NAT?

Can people check dmesg for warnings from iptables, conntrack, etc in their problem nodes and/or routers in front of problem nodes if applicable.

Are you all still seeing it say RELAY for everything, or does the peers list look OK and things just don't talk?
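
For those answering the RELAY question from saved output, here is a small Python sketch that checks it mechanically. It assumes peer rows shaped like the ones JocPelletier posted earlier in this thread (`<ztaddr> <ver> <role> <lat> <link> ...`); that column order is inferred from that paste and may differ between versions, and the function names are made up for illustration.

```python
def parse_peers(peers_text):
    """Parse rows shaped like '<ztaddr> <ver> <role> <lat> <link> ...'."""
    peers = []
    for line in peers_text.splitlines():
        fields = line.split()
        # Skip headers, status lines, and anything that is not a peer row
        if len(fields) < 5 or fields[2] not in ("PLANET", "MOON", "LEAF"):
            continue
        peers.append({
            "address": fields[0],
            "version": fields[1],
            "role": fields[2],
            "latency": int(fields[3]),
            "link": fields[4],
        })
    return peers


def all_roots_relayed(peers):
    """True if there is at least one PLANET peer and all of them show RELAY."""
    roots = [p for p in peers if p["role"] == "PLANET"]
    return bool(roots) and all(p["link"] == "RELAY" for p in roots)
```

If `all_roots_relayed` returns True, that matches the "RELAY for everything" case; if it returns False but nodes still cannot talk, that is the other case asked about above.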

bartmichu commented 3 years ago

@laduke It's an LXD container with no ports redirected, so yes, behind a NAT. The public IP address is static. Some connections are direct and others are via relay. The problematic one uses relay, but it's not the only one that uses relay.

When it happens, this monitoring node can't connect to other nodes in one particular network, and other nodes from that network can't connect to it (ICMP ping and TCP). At the same time everything looks OK in ZeroTier Central: status says ONLINE and it shows the correct Physical IP Address.

glimberg commented 3 years ago

I'm pretty sure we've discovered the cause of the coma issue that @JocPelletier has where the PLANET/root servers are showing RELAYED in the zerotier-cli peers output. The culprit? Your routers.

What is happening is that your router likely has some "security" software on it that is classifying ZeroTier traffic as "malicious" and therefore blocking it. This explains why changing your Node ID resolves the issue as well. ZeroTier opens multiple ports on startup to help deal with crappy NAT implementations. The secondary & tertiary ports are deterministically chosen based on the Node ID. Once a new port is being used, the traffic starts flowing again.

I've pushed a new change to the dev branch (https://github.com/zerotier/ZeroTierOne/commit/4fed56443e47ae82b60b347d5614b8a122224320) that makes the secondary & tertiary ports completely randomized on startup and no longer deterministic based on the node ID. We're also investigating ways to detect when all the PLANET/roots go to the RELAYED state and re-randomize the ports automatically so that it won't require a restart. No ETA on that 2nd patch, though.

Immediate workarounds:

  1. Download & build the dev branch.
  2. You can alter the ports ZeroTier uses via local.conf

The conf file can be found (if you've already created one) in one of these places depending on your OS:

If the file doesn't exist, simply create it. In there you can set one or all of the following options:

Caveats:

example local.conf file setting only the "secondaryPort" option:

{
  "settings": {
    "secondaryPort": 21234
  }
}
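
If you would rather script that edit than do it by hand, the following Python sketch merges a `secondaryPort` value into a local.conf without touching any other keys. It is only an illustration: the helper name and the 20000-65535 range for a random pick are assumptions, and the path to local.conf is left as an argument because it depends on your OS. Since the ports are chosen on startup, the service presumably needs a restart afterwards.

```python
import json
import random
from pathlib import Path


def set_secondary_port(conf_path, port=None):
    """Set settings.secondaryPort in a local.conf, keeping other keys intact."""
    path = Path(conf_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    if port is None:
        # Assumed range for illustration only; pick whatever suits your network
        port = random.randint(20000, 65535)
    config.setdefault("settings", {})["secondaryPort"] = port
    path.write_text(json.dumps(config, indent=2) + "\n")
    return port
```

Calling `set_secondary_port(path, 21234)` on a missing or empty file produces the same content as the example above.
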
Ifiht commented 3 years ago

@glimberg cannot confirm the issue was a router one, but I've been monitoring for recurrences consistently on 1.6.5 with no success... whatever was happening on my network is no longer going on, so hopefully not my machines or ZT software.

jmporchet commented 3 years ago

I have a box in the Oracle cloud that serves as the "visible" part of my infrastructure (running ZT and Caddy).

In my home network (behind CG-NAT) there's a mac mini running an ubuntu 20 VM (x.x.x.124) and an unraid server (x.x.x.100). I'd like to migrate my Filerun docker service from the Ubuntu machine to the unraid one.

On Ubuntu the ZT connection has been working flawlessly for weeks, but when I changed the reverse proxy IP in Caddy to point to the unraid server, it worked for a couple of hours then stopped. My unraid server's ZT client has been cloned and built from master, I'm not using the ZT docker image provided by the unraid community apps.

Referring to this thread I compiled the dev branch and created a local.conf as instructed. This makes it work again for a while, but the morning after when I wake up it doesn't anymore. When I change the port to something else, it works for some more time then stops again.

On the Oracle box (44f1970506):

$ sudo zerotier-cli listpeers
200 listpeers <ztaddr> <path> <latency> <version> <role>
200 listpeers 0cccb752f7 35.239.208.158/64393;3424;11871 -1 1.6.4 LEAF
200 listpeers 435fd450d9 - -1 - LEAF
200 listpeers 516d9b706f - -1 1.6.5 LEAF
200 listpeers 61d294b9cb 50.7.73.34/9993;3424;3266 158 - PLANET
200 listpeers 62f865ae71 50.7.252.138/9993;3424;3272 152 - PLANET
200 listpeers 6403311540 - -1 - LEAF
200 listpeers 6bd9bafba8 - -1 - LEAF
200 listpeers 778cde7190 103.195.103.66/9993;23613;3313 111 - PLANET
200 listpeers 992fcf1db7 195.181.173.159/9993;3425;3410 15 - PLANET
$ curl -IL x.x.x.100:82
curl: (7) Failed to connect to x.x.x.100 port 82: No route to host

On the unraid server (6bd9bafba8):

# sudo zerotier-cli listpeers
200 listpeers <ztaddr> <path> <latency> <version> <role>
200 listpeers 090ec0d400 - -1 1.6.5 LEAF
200 listpeers 0cccb752f7 35.239.208.158/64392;90;10608 150 1.6.4 LEAF
200 listpeers 435fd450d9 192.168.1.55/27351;1457;1335 4 1.6.5 LEAF
200 listpeers 44f1970506 - -1 - LEAF
200 listpeers 4f8665aaec - -1 - LEAF
200 listpeers 516d9b706f 192.168.1.31/40909;135;132 3 1.6.5 LEAF
200 listpeers 61d294b9cb 50.7.73.34/9993;10755;424 230 - PLANET
200 listpeers 62f865ae71 50.7.252.138/9993;652;335 310 - PLANET
200 listpeers 6403311540 192.168.1.64/60354;1457;1336 9 1.6.5 LEAF
200 listpeers 778cde7190 103.195.103.66/9993;652;501 151 - PLANET
200 listpeers 992fcf1db7 195.181.173.159/9993;90;604 47 - PLANET
200 listpeers bfde2fc121 192.168.1.77/58586;1457;12886 -1 1.6.5 LEAF

I cannot ssh from either side to the other using the ZT IPs, but I can ssh from the unraid server to the Oracle box via its public IP address.
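
One thing visible in the two listings above is that each box lists the other (6bd9bafba8 and 44f1970506) with `-` in the path column. A short Python sketch for pulling such peers out of saved `listpeers` output follows; it assumes the seven-column row layout shown in the header line above, treats `-` as "no known path" (an interpretation, not something confirmed in this thread), and the function name is made up for illustration.

```python
def peers_without_path(listpeers_text):
    """Return addresses of peers whose <path> column is '-'."""
    missing = []
    for line in listpeers_text.splitlines():
        fields = line.split()
        # Expected row: 200 listpeers <ztaddr> <path> <latency> <version> <role>
        if len(fields) != 7 or fields[:2] != ["200", "listpeers"]:
            continue
        if fields[2].startswith("<"):
            continue  # header row
        if fields[3] == "-":
            missing.append(fields[2])
    return missing
```

Running it on both sides and checking whether each node appears in the other's result gives a quick way to confirm the pattern before and after changing ports.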

Hopefully this gives enough details?

Edit: For those having the same problem, I just switched to Tailscale and with their docker image that took even less time than configuring ZeroTier. Too bad!

nebojsa-simic commented 3 years ago

Same problem.

I have two hosts:

Both hosts are displayed in the node list in ZeroTier Central. Both are shown as ONLINE. Both are running 1.6.5

No VPN connection between the two can be established (ping, curl, traceroute, nothing works). It worked just fine a couple of days ago (I guess on an older version of ZT).

Tried debugging: moved hosts to another subnet, refreshed IPs, deleted peers.d; nothing could restore network connectivity between them.

Temporarily installed Tailscale as suggested above - that worked nicely

ddeitterick commented 3 years ago

I'm seeing the same issue as @bartmichu. I posted some additional information here: https://discuss.zerotier.com/t/zerotier-client-stops-routing-to-one-network/3917

joseph-henry commented 3 years ago

It seems that the original issue for this ticket was resolved. I'm closing this because there isn't enough evidence that the issues added later are related. If you are having connectivity issues, it is best to create your own ticket and link to the ones you suspect are related; that way we can isolate problems more effectively. Thanks.