[zerotier] stucks in deploying, even if it is running successfully

fengyuan213 commented 3 months ago

Hi Team, I am on the latest TrueNAS SCALE. The ZeroTier app always shows as Deploying rather than Running, even though it is already configured and connected to my ZeroTier network.

stavros-k commented 3 months ago

Please share the logs of the application in order to help find the issue.

Thanks

lucas-walter commented 3 months ago

Hello, I've been watching this issue for a while because I have the same problem. The pods restart by themselves, and each restart seems to disconnect the server from the ZeroTier network, causing about 12% packet loss over time.

Outside of those restarts ZeroTier works: the server is in the network and can be reached on every port and service I tested (SMB, SSH, HTTP/S, Apps).

What I've done to set it up:

Installed the app through the Discover Apps process in the UI
Generated an identity (secret/public pair) with zerotier-idtool generate
Entered both in the app configuration along with the network ID
Enabled Host Network (though the app still remains in a Deploying state without this checked)
Authorized the server in the ZeroTier Dashboard

I'll post everything related that I can find in the logs to hopefully help. In Related Kubernetes Events:

2024-08-11 08:01:48
Startup probe failed: unknown network ID, check that you are a member of the network

In the Docker container's logs (I replaced my network ID with NETID and the server's ZeroTier address with ADDR):

2024-08-11 15:53:08.853858+02:00 => Configuring networks to join
2024-08-11 15:53:08.855240+02:00 => Joining networks from command line: [NETID]
2024-08-11 15:53:08.855495+02:00 ===> Configuring join: [NETID]
2024-08-11 15:53:08.856286+02:00 => Starting ZeroTier
2024-08-11 15:53:08.858137+02:00 ===> ZeroTier hasn't started, waiting a second
2024-08-11 15:53:08.882744+02:00 Starting Control Plane...
2024-08-11 15:53:08.882777+02:00 Starting V6 Control Plane...
2024-08-11 15:53:09.861243+02:00 => Writing healthcheck for networks: [NETID]
2024-08-11 15:53:09.875548+02:00 => zerotier-cli info: [200 info ADDR 1.14.0 OFFLINE]
2024-08-11 15:53:09.875698+02:00 => Sleeping infinitely

Output of kubectl get pods (in ix-zerotier):

root@truenas[/home/admin]# k3s kubectl get pods --all-namespaces
NAMESPACE     NAME                                      READY   STATUS      RESTARTS       AGE
ix-zerotier   zerotier-7b676cb79c-dwhmg                 0/1     Completed   12 (35h ago)   36h
ix-zerotier   zerotier-869bd58bd7-t6h4v                 0/1     Completed   110            21h
ix-zerotier   zerotier-869bd58bd7-jhhsz                 0/1     Running     92 (88s ago)   12h

lucas-walter commented 3 months ago

I've debugged this further and have managed to fix it.

ZeroTier network IDs are generally treated as case-insensitive: setup instructions use uppercase and lowercase letters interchangeably, the Dashboard shows both depending on where you look, and for first-time setups it tends to give you an uppercase ID to paste.

It turns out that the network ID argument to zerotier-cli get <netid> <property> (which the healthcheck.sh script uses) is case-sensitive and only works with lowercase network IDs. You can reproduce this in the Docker container's shell:

# zerotier-cli get 8056c2e21c000001 status
OK
# zerotier-cli get 8056C2E21C000001 status
unknown network ID, check that you are a member of the network

Currently this can be fixed by just entering the network ID in lowercase.
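
If you script around this yourself, the ID can be lowercased before it reaches zerotier-cli. Below is a minimal sketch. The function name normalize_netid is hypothetical (it is not part of the chart, healthcheck.sh, or ZeroTier), and the check assumes a network ID is 16 hexadecimal digits, as in the example above:

```shell
# Hypothetical helper, not part of the chart or of ZeroTier:
# lowercase a network ID and verify it looks like 16 hex digits.
normalize_netid() {
    # Lowercase the input so uppercase IDs pasted from the Dashboard work too.
    netid=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')

    # Reject anything containing a character that is not a lowercase hex digit.
    case "$netid" in
        *[!0-9a-f]*)
            echo "invalid network ID: $1" >&2
            return 1
            ;;
    esac

    # Assumption: a network ID is exactly 16 hex digits.
    if [ "${#netid}" -ne 16 ]; then
        echo "invalid network ID: $1" >&2
        return 1
    fi

    printf '%s\n' "$netid"
}
```

With that in place, something like zerotier-cli get "$(normalize_netid 8056C2E21C000001)" status should behave the same as the lowercase call shown above.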

stavros-k commented 3 months ago

Amazing work there. I just opened a PR to make sure the network ID always gets lowercased.

THANKS!

truenas / charts

[zerotier] stucks in deploying, even if it is running successfully #2708