Closed raojinlin closed 2 years ago
Hi @raojinlin, please paste output from the netclient.service logs on those machines, since they should be pulling new certificates automatically, but it sounds like they are not. Then, try running "netclient pull" on those clients, which should manually pull new certificates for those clients.
Hi @afeiszli, I tried using netclient pull to obtain new certificates. The command generates new certificates, but the same error still occurs.
Before netclient pull:
After netclient pull:
The netclient.service logs:
May 18 11:14:03 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:03 interface ready - netclient.. ENGAGE
May 18 11:14:05 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:05 started daemon for server netmaker-api.xxx.com
May 18 11:14:05 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:05 netclient daemon started for server: netmaker-api.xxx.com
May 18 11:14:35 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:35 unable to connect to broker, retrying ...
May 18 11:14:35 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:35 could not connect to broker netmaker-api.xxx.com connect timeout
May 18 11:14:35 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:35 connection issue detected.. attempt connection with new certs
May 18 11:14:35 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:35 register at https://netmaker-api.xxx.com/api/server/register
May 18 11:14:35 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:35 certificates/key saved
May 18 11:14:36 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:36 restarting netclient.service
May 18 11:14:37 localhost.localdomain systemd[1]: Stopping Netclient Daemon...
May 18 11:14:37 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:37 shutting down netclient daemon
May 18 11:14:37 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:37 checkin routine closed
May 18 11:14:37 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:37 shutdown complete
May 18 11:14:37 localhost.localdomain systemd[1]: Stopped Netclient Daemon.
May 18 11:14:37 localhost.localdomain systemd[1]: Started Netclient Daemon.
May 18 11:14:37 localhost.localdomain netclient[13276]: [netclient] 2022-05-18 11:14:37 initializing network default
May 18 11:14:37 localhost.localdomain netclient[13276]: [netclient] 2022-05-18 11:14:37 pulling latest config for default
May 18 11:14:37 localhost.localdomain netclient[13276]: [netclient] 2022-05-18 11:14:37 started daemon for server netmaker-api.xxx.com
May 18 11:14:37 localhost.localdomain netclient[13276]: [netclient] 2022-05-18 11:14:37 netclient daemon started for server: netmaker-api.xxx.com
Ah! The broker domain appears to be incorrect. The value of SERVER_NAME in your docker-compose should be the broker domain. In yours, it appears to be pointing to the api: netmaker-api.xxx.com. Please change this to the relevant domain such as netmaker-broker.xxx.com. You will then have to restart the server, create a new key (to get the correct details) and rejoin.
Is certificate generation tied to the domain name? In my environment, both netmaker-api and the broker are on the same server, so their public IP is the same.
I have tried changing the domain name to netmaker-broker.xxx.com, but I still have the same error after restarting docker-compose.
netclient.service logs:
May 18 23:51:22 10-25-241-158 systemd[1]: netclient.service: Succeeded.
May 18 23:51:22 10-25-241-158 systemd[1]: Stopped Netclient Daemon.
May 18 23:51:22 10-25-241-158 systemd[1]: Started Netclient Daemon.
May 18 23:51:22 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:22 initializing network default
May 18 23:51:22 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:22 started daemon for server netmaker-broker.xxx.com
May 18 23:51:22 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:22 netclient daemon started for server: netmaker-broker.xxx.com
May 18 23:51:52 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:52 unable to connect to broker, retrying ...
May 18 23:51:52 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:52 could not connect to broker netmaker-broker.xxx.com connect timeout
May 18 23:51:52 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:52 connection issue detected.. attempt connection with new certs
May 18 23:51:52 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:52 register at https://netmaker-api.xxx.com/api/server/register
May 18 23:51:52 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:52 certificates/key saved
May 18 23:51:53 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:53 restarting netclient.service
May 18 23:51:54 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:54 shutting down netclient daemon
May 18 23:51:54 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:54 checkin routine closed
May 18 23:51:54 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:54 shutdown complete
May 18 23:51:54 10-25-241-158 systemd[1]: Stopping Netclient Daemon...
May 18 23:51:54 10-25-241-158 systemd[1]: netclient.service: Succeeded.
May 18 23:51:54 10-25-241-158 systemd[1]: Stopped Netclient Daemon.
May 18 23:51:54 10-25-241-158 systemd[1]: Started Netclient Daemon.
does netmaker-broker.xxx.com resolve? and is port 8883 publicly accessible?
Hi @mattkasun, netmaker-broker.xxx.com resolves; I have configured it in /etc/hosts. Port 8883 is publicly accessible.
@raojinlin I have the same problem. It happened at first, but then it became normal for no apparent reason. But the same problem appeared when I installed on another server.
You can try cleaning /etc/netclient and rejoining it.
Have you tried everything in the MQ troubleshooting: https://gist.github.com/mattkasun/face2a7c1f32031a2126ff7243caad12
@raojinlin @simon-mao have you checked outbound firewall to see if it is blocking 8883? We have a suspicion this is causing an issue for some users, would be good to check.
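For readers following along, the two checks asked for above (does the broker domain resolve, and is port 8883 reachable from the client) can be sketched in a few lines of Python run on the client machine. This is only a minimal sketch: the function name can_connect is made up for illustration and is not part of netclient or netmaker.

```python
import socket


def can_connect(host: str, port: int = 8883, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A DNS failure, a refused connection, and a timeout (for example, an
    outbound firewall silently dropping packets) all raise OSError
    subclasses, so each of them is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling can_connect("netmaker-broker.xxx.com") with your own broker domain should return True from a healthy client. A False result only says that the TCP connection failed; it does not distinguish between DNS, firewall, and server-side causes, and a True result says nothing about whether the TLS certificate will be accepted.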
I just deployed a new environment. While debugging the netclient daemon, I found an error during the client's TLS handshake with the server: network error: network Error : x509: cannot validate certificate for 192.168.122.87 because it doesn't contain any IP SANs.
/github.com/eclipse/paho.mqtt.golang@v1.3.5/client.go
This is my docker-compose.yaml configuration:
Then I checked the certificate of netmaker:
I checked the code that generates the certificate, servercfg Getserver(). According to the logic of this function, it should read the SERVER_NAME environment variable. However, the subject CN of the certificate is not the value of SERVER_NAME but an IP address.
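The x509 error above comes from a general rule in TLS hostname verification: when a client connects to an IP address, the certificate must list that address as an IP entry in its subject alternative names, and a CN that happens to contain the IP does not count. The sketch below illustrates that rule in Python, using the dictionary shape returned by ssl.SSLSocket.getpeercert(). The function name cert_covers_host and the sample certificates are made up for illustration, and wildcard DNS names are deliberately not handled.

```python
import ipaddress


def cert_covers_host(cert: dict, host: str) -> bool:
    """Check whether a certificate's SANs cover the host being dialed.

    cert uses the layout of ssl.SSLSocket.getpeercert(), where
    "subjectAltName" is a tuple of (kind, value) pairs.
    Simplification: wildcard DNS names are not matched.
    """
    sans = cert.get("subjectAltName", ())
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a DNS name, not an IP literal

    if ip is not None:
        # An IP host is only matched against IP SAN entries, never the CN.
        return any(
            kind == "IP Address" and ipaddress.ip_address(value) == ip
            for kind, value in sans
        )
    return any(
        kind == "DNS" and value.lower() == host.lower()
        for kind, value in sans
    )


# Hypothetical certificate: CN is an IP address, but there is no IP SAN.
stale_cert = {
    "subject": ((("commonName", "192.168.122.87"),),),
    "subjectAltName": (("DNS", "netmaker-broker.example.com"),),
}
```

With stale_cert, a connection addressed to 192.168.122.87 is rejected while one addressed to netmaker-broker.example.com is accepted, which matches the symptom in this thread: a certificate generated for one server name stops validating once clients dial a different one.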
Sorry, I just checked this certificate again. It was generated earlier. The previous SERVER_NAME was 192.168.122.87.
@raojinlin so does it currently have the correct CA or no?
Yes, it can now be successfully connected.
Okay, I want to confirm the steps it took to fix. Was it fixed by just restarting the server?
I deleted the /root/certs directory and restarted netmaker.
So these troubleshooting instructions are correct then: https://gist.github.com/mattkasun/face2a7c1f32031a2126ff7243caad12
Can we close this issue?
We will need to determine how this issue started and potentially put in something to auto-heal.
OK thanks
Closing. For those who find this issue, PLEASE follow the above Gist, in particular the part about deleting /root/certs
Contact Details
1239015423@qq.com
What happened?
netclient was unable to connect to the MQ server because of a certificate problem.
I have three clients, two Linux and one macOS. None of them can connect to the MQ server.
When I use the openssl command to verify the certificate, one Linux machine passes verification, but the other Linux machine and the Mac fail. The OpenSSL versions and verification output follow.
Versions:
Linux1: OpenSSL 1.1.1f 31 Mar 2020
Linux2: OpenSSL 1.0.2k-fips 26 Jan 2017
MacOS: LibreSSL 2.8.3
Linux1:
Linux2:
MacOS:
Although the openssl and mosquitto_pub commands run successfully on Linux1, netclient is still unable to connect to the MQ server when it starts.
Please let me know how I can solve this problem. Thank you!
Version
v0.14.0
What OS are you using?
Linux, Mac
Relevant log output