arobinsongit opened 1 month ago
@arobinsongit, thanks for sharing the logs. Unfortunately, they don't show anything around 8:30; the only entries seem to be related to a system reboot:
2024-09-13T10:44:35-04:00 INFO client/cmd/root.go:191: shutdown signal received
2024-09-13T10:44:35-04:00 INFO client/internal/engine.go:252: Network monitor: stopped
...
2024-09-13T10:44:36-04:00 INFO client/internal/routemanager/manager.go:170: Routing cleanup complete
2024-09-13T10:44:37-04:00 INFO client/internal/engine.go:275: stopped Netbird Engine
2024-09-13T10:44:37-04:00 INFO client/internal/connect.go:281: stopped NetBird client
2024-09-13T10:44:42-04:00 INFO client/cmd/service_controller.go:80: stopped Netbird service
2024-09-13T13:25:20-04:00 INFO client/cmd/service_controller.go:24: starting Netbird service
2024-09-13T13:25:20-04:00 INFO client/cmd/service_controller.go:66: started daemon server: 127.0.0.1:41731
2024-09-13T13:25:20-04:00 INFO client/internal/connect.go:117: starting NetBird client version 0.29.2 on windows/amd64
The issue might be related to the daemon stopping. Can you please check the system logs in Event Viewer for events related to the NetBird process?
@arobinsongit we've released version 0.29.3. You can also upgrade your client to this version to check whether the issue was related to the network changes that were fixed in this release.
I've upgraded to 0.29.3.
I'll write a small script that dumps the status every minute and pings a few other peers, so that if it goes down I can isolate the issue better. The 08:30 time we gave earlier was just when we noticed the issue, when a user tried to connect.
Regards Andy
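A minimal sketch of the per-minute monitor described above, in Python. It assumes `netbird` and `ping` are on the PATH, takes the peer addresses to ping as command-line arguments, and appends timestamped results to a log file whose name (`netbird-monitor.log`) is only a placeholder. A ping exit code is a coarse reachability signal, so the full `netbird status -d` output is logged alongside it.

```python
"""Connectivity monitor sketch: every minute, log `netbird status -d`
and a single ping to each peer given on the command line.

Assumptions: `netbird` and `ping` are on PATH; LOG_PATH is a placeholder.
"""
import datetime
import platform
import subprocess
import sys
import time

LOG_PATH = "netbird-monitor.log"  # placeholder output file
INTERVAL_SECONDS = 60


def ping_args(host: str, system: str) -> list[str]:
    """Build a one-echo ping command for the given platform.system() name."""
    if system == "Windows":
        # -n: echo count, -w: reply timeout in milliseconds
        return ["ping", "-n", "1", "-w", "1000", host]
    # Linux/macOS: -c is the echo count; the subprocess timeout in check_once bounds the wait
    return ["ping", "-c", "1", host]


def run_check(cmd: list[str], timeout: float = 30.0) -> tuple[bool, str]:
    """Run a command and return (exit code was 0, combined output) without raising."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return proc.returncode == 0, proc.stdout + proc.stderr


def check_once(peers: list[str], system: str) -> str:
    """Build one timestamped report: netbird status plus a ping result per peer."""
    now = datetime.datetime.now().isoformat(timespec="seconds")
    ok, out = run_check(["netbird", "status", "-d"])
    lines = [f"=== {now} ===", f"netbird status ok={ok}", out.rstrip()]
    for peer in peers:
        reachable, _ = run_check(ping_args(peer, system), timeout=10.0)
        lines.append(f"ping {peer}: {'ok' if reachable else 'FAILED'}")
    return "\n".join(lines) + "\n"


def main(argv: list[str]) -> int:
    peers = argv[1:]
    if not peers:
        print("usage: netbird_monitor.py PEER [PEER ...]", file=sys.stderr)
        return 2
    system = platform.system()
    while True:  # runs until interrupted (Ctrl+C)
        report = check_once(peers, system)
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(report)
        time.sleep(INTERVAL_SECONDS)


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python netbird_monitor.py 100.83.24.87 100.83.253.195` would ping two of the peer IPs from the status output in this issue; the script name is hypothetical.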
OK, I assumed the time wasn't exactly when the issue happened, but looking back through the logs, I don't see any other abnormal event that could explain it either.
OK, thanks. Another question: is there a reliable address (IP or hostname) on the NetBird side that I could ping to confirm NetBird connectivity? I could use one of my other peers, but that might go up and down. Also, if it reboots, I don't know how the address leases work, so it might not come back up at the same address.
-andy
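One peer-independent signal is the client's own connection to the management and signal services, which `netbird status` reports directly (the `Management:` and `Signal:` lines in the status output included in this issue). The sketch below parses those lines plus the `Peers count:` summary. It assumes the text layout of the 0.29.x output quoted here; that layout is human-readable output rather than a documented interface, so the patterns may need adjusting for other versions.

```python
import re
import subprocess
from dataclasses import dataclass

# Patterns based on the `netbird status -dA` output quoted in this issue (v0.29.2).
# "Disconnected" does not match, because "Connected" must directly follow the colon and spaces.
_MGMT = re.compile(r"\bManagement:\s+Connected\b")
_SIGNAL = re.compile(r"\bSignal:\s+Connected\b")
_PEERS = re.compile(r"\bPeers count:\s+(\d+)/(\d+)\s+Connected\b")


@dataclass
class NetbirdHealth:
    management_connected: bool
    signal_connected: bool
    peers_connected: int
    peers_total: int

    @property
    def control_plane_ok(self) -> bool:
        """True when both the management and signal connections are up."""
        return self.management_connected and self.signal_connected


def parse_status(text: str) -> NetbirdHealth:
    """Extract control-plane and peer-count health from `netbird status` text."""
    peers = _PEERS.search(text)
    connected, total = (int(peers.group(1)), int(peers.group(2))) if peers else (0, 0)
    return NetbirdHealth(
        management_connected=_MGMT.search(text) is not None,
        signal_connected=_SIGNAL.search(text) is not None,
        peers_connected=connected,
        peers_total=total,
    )


def fetch_status() -> str:
    """Run `netbird status -d`; return stdout, or "" if the CLI is unavailable."""
    try:
        proc = subprocess.run(
            ["netbird", "status", "-d"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return proc.stdout


if __name__ == "__main__":
    health = parse_status(fetch_status())
    print(health, "control plane ok:", health.control_plane_ok)
```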
It looks like the service isn't starting successfully on reboot. It is set to start automatically, but it times out. I can start the service after I have logged on, and it then runs with no issues.
Debug file: netbird.debug.1969967324.zip
Event logs and service configuration: services-events.zip
There were multiple reboots today; the last one was around 20:27.
I do see this entry after I start up the service:
74235 Sep 17 20:25 Error Microsoft-Windows... 1023 Name resolution policy table has been corrupted. DNS resolution will fail until it is fixed. Contact your network administrator. For more information: read policy table for rule NetBird-Match failed...
That may have nothing to do with the service failing to start on reboot, though.
@arobinsongit I got this from one of the event logs:
74077 Sep 17 16:31 Error Service Control M... 3221232472 The NetBird service failed to start due to the following error: ...
Can you share more details about it?
Dang, I didn't realize PowerShell truncated that, which makes it kinda worthless :-)
Here are the two messages, logged back to back:
Log Name: System
Source: Service Control Manager
Date: 9/17/2024 4:31:06 PM
Event ID: 7009
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: TEST-001
Description: A timeout was reached (30000 milliseconds) while waiting for the NetBird service to connect.

Log Name: System
Source: Service Control Manager
Date: 9/17/2024 4:31:06 PM
Event ID: 7000
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: TEST-001
Description: The NetBird service failed to start due to the following error: The service did not respond to the start or control request in a timely fashion.
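Since PowerShell's table view cuts off long messages, another option is to query the System log with the built-in `wevtutil` tool, which prints each event's full description in text form (in PowerShell itself, piping the events to `Format-List` also avoids the truncation). A minimal Python wrapper is sketched below, filtering on the Service Control Manager provider and the two event IDs quoted above (7000 and 7009). The filter also matches these events for other services, so look for the NetBird entries in the output; the event count and newest-first ordering are arbitrary choices.

```python
import platform
import subprocess
import sys

# Event IDs from the two Service Control Manager entries quoted above.
SCM_EVENT_IDS = (7000, 7009)


def build_query(event_ids=SCM_EVENT_IDS, provider: str = "Service Control Manager") -> str:
    """Build an Event Log XPath filter for the given provider and event IDs."""
    ids = " or ".join(f"EventID={event_id}" for event_id in event_ids)
    return f"*[System[Provider[@Name='{provider}'] and ({ids})]]"


def wevtutil_args(count: int = 10) -> list[str]:
    """Arguments for a wevtutil query of the System log."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query()}",
        "/f:text",      # full, human-readable event text
        f"/c:{count}",  # maximum number of events to return
        "/rd:true",     # reverse direction: newest events first
    ]


def main() -> None:
    if platform.system() != "Windows":
        print("wevtutil is only available on Windows", file=sys.stderr)
        return
    result = subprocess.run(wevtutil_args(), capture_output=True, text=True)
    print(result.stdout or result.stderr)


if __name__ == "__main__":
    main()
```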
A question: do you specifically test on Windows Server 2019, or only on a standard current Windows box such as 10 or 11?
What's curious is that I have another Server 2019 box exhibiting what I would categorize as similar, though maybe not identical, symptoms. This one also has trouble after reboots. On this one, I went through:
Service Stop
Service Uninstall
Service Install
Service Start
and that seemed to get things working again.
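The manual workaround above can be scripted. This sketch assumes the four steps correspond to the `netbird service stop`, `uninstall`, `install`, and `start` subcommands, run from an elevated prompt. It tolerates non-zero exits from the first two steps (the service may already be stopped) but stops if `install` or `start` fails. Because it removes and re-registers the service, it only acts when passed `--confirm`, a flag invented for this sketch.

```python
import subprocess
import sys

# Same order as the manual workaround: stop, uninstall, install, start.
STEPS = ("stop", "uninstall", "install", "start")
MUST_SUCCEED = {"install", "start"}


def service_commands(binary: str = "netbird") -> list[list[str]]:
    """Return the `<binary> service <step>` command for each step, in order."""
    return [[binary, "service", step] for step in STEPS]


def reinstall_service(binary: str = "netbird") -> bool:
    """Run each step; return False if the CLI is missing, a step times out,
    or install/start exits with a non-zero code."""
    for cmd in service_commands(binary):
        label = " ".join(cmd)
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
        except (OSError, subprocess.TimeoutExpired) as exc:
            print(f"{label}: {exc}", file=sys.stderr)
            return False
        print(f"{label} -> exit code {proc.returncode}")
        if proc.returncode != 0 and cmd[-1] in MUST_SUCCEED:
            print(proc.stderr or proc.stdout, file=sys.stderr)
            return False
    return True


if __name__ == "__main__":
    if "--confirm" in sys.argv[1:]:
        sys.exit(0 if reinstall_service() else 1)
    print("usage: reinstall_netbird_service.py --confirm", file=sys.stderr)
```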
I haven't seen similar symptoms on my Windows 10 boxes.
Describe the problem
We have a single machine in our group of peers that seems to randomly disconnect and will not reconnect until we uninstall and reinstall NetBird.
The last time we saw the issue was September 13 at around 8:30 AM, when a user tried to connect.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The machine should always be connected to NetBird.
Are you using NetBird Cloud?
Using NetBird Cloud.
NetBird version
0.29.2
NetBird status -dA output:
-- Replaced hostnames with 001, 002, 003, etc. except for the host in question, test-001
Peers detail:
 andy-x1-01.netbird.cloud:
  NetBird IP: 100.83.216.253/32
  Public key: xwHuTn+vUNJWuxHeUNYIVMwEHCNMie38MNRrwthK9FQ=
  Status: Disconnected
  -- detail --
  Connection type:
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address:
  Last connection update: 5 minutes, 10 seconds ago
  Last WireGuard handshake: -
  Transfer status (received/sent) 0 B/0 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

 001.netbird.cloud:
  NetBird IP: 100.83.134.124/32
  Public key: zVxs5wF1zCHh7OoaW11vDUo6HCKdZAgBvGwHoUbxKWQ=
  Status: Disconnected
  -- detail --
  Connection type:
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address:
  Last connection update: 5 minutes, 10 seconds ago
  Last WireGuard handshake: -
  Transfer status (received/sent) 0 B/0 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

 002.netbird.cloud:
  NetBird IP: 100.83.24.87
  Public key: S/wflOzdL/HmBPMDO7nlSrzT2cIxG02L5Ez/UxxWnhg=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): 10.20.30.102:51820/198.51.100.0:51820
  Relay server address:
  Last connection update: 5 minutes, 9 seconds ago
  Last WireGuard handshake: 59 seconds ago
  Transfer status (received/sent) 276 B/924 B
  Quantum resistance: false
  Routes: 10.0.0.0/24
  Latency: 27.3266ms

 003.netbird.cloud:
  NetBird IP: 100.83.24.199
  Public key: p+ou9OsQHDZvsHAM3P5yMrnYnD4svnhXMHJY9lTLgF8=
  Status: Disconnected
  -- detail --
  Connection type:
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address:
  Last connection update: -
  Last WireGuard handshake: -
  Transfer status (received/sent) 0 B/0 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

 004.netbird.cloud:
  NetBird IP: 100.83.89.86
  Public key: kspYz8Y+g6e9XhjrUO0tiulQ9KR2te2g651FUnsasGI=
  Status: Disconnected
  -- detail --
  Connection type:
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address:
  Last connection update: -
  Last WireGuard handshake: -
  Transfer status (received/sent) 0 B/0 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

 005.netbird.cloud:
  NetBird IP: 100.83.137.194
  Public key: AhXLX4DchwYz/dZ11JFRyHZ9lwD5ywy7D2j28ZS7r10=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): 127.0.0.1:51820/198.51.100.1:51820
  Relay server address:
  Last connection update: 5 minutes, 8 seconds ago
  Last WireGuard handshake: 1 minute, 7 seconds ago
  Transfer status (received/sent) 409.3 KiB/1.8 MiB
  Quantum resistance: false
  Routes: -
  Latency: 32.6807ms

 006.netbird.cloud:
  NetBird IP: 100.83.253.195
  Public key: aiGNqgUZXnlM7DrmCnMKXVJutgtY/3MwDcILbdyzoxA=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/host
  ICE candidate endpoints (Local/Remote): 192.168.10.176:51820/192.168.10.164:51820
  Relay server address:
  Last connection update: 5 minutes, 9 seconds ago
  Last WireGuard handshake: 59 seconds ago
  Transfer status (received/sent) 308 B/924 B
  Quantum resistance: false
  Routes: -
  Latency: 1.0118ms

OS: windows/amd64
Daemon version: 0.29.2
CLI version: 0.29.2
Management: Connected to https://api.netbird.io:443
Signal: Connected to https://signal.netbird.io:443
Relays:
  [stun:stun.netbird.io:5555] is Available
  [turns:turn.netbird.io:443?transport=tcp] is Available
Nameservers:
FQDN: test-001.netbird.cloud
NetBird IP: 100.83.221.102/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 3/7 Connected
Do you face any (non-mobile) client issues? no
Please provide the file created by
netbird debug for 1m -AS
Screenshots
Additional context
We are running both NetBird and ZeroTier on these machines; we previously used ZeroTier and are migrating to NetBird. We don't have this issue on any other machine, so we suspect it is caused by something specific to conditions on this one.