gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

[Bug]: netclient unable to connect to MQ server [Unable to connect (A TLS error occurred.)] #1100

Closed raojinlin closed 2 years ago

raojinlin commented 2 years ago

Contact Details

1239015423@qq.com

What happened?

netclient was unable to connect to the MQ server because of a certificate problem.

I have three clients: two running Linux and one running macOS. None of them can connect to the MQ server.

When I verify the certificate with the openssl command, one Linux machine passes verification, but the other Linux machine and the Mac fail. The OpenSSL versions and verification output follow.

Versions:

Linux1: OpenSSL 1.1.1f 31 Mar 2020
Linux2: OpenSSL 1.0.2k-fips 26 Jan 2017
MacOS: LibreSSL 2.8.3

Linux1:

root@ubuntu:/etc/netclient/netmaker-api.xxx.com# openssl verify -CAfile root.pem client.pem
client.pem: OK

root@ubuntu:/etc/netclient/netmaker-api.xxx.com# mosquitto_pub -h netmaker-api.xxx.com -p 8883 -t hello/test -m 'dddxxx' --cert client.pem --cafile root.pem --key /etc/netclient/client.key -d
Client mosq-bI0YtDTTcIrbLGxeXV sending CONNECT
Client mosq-bI0YtDTTcIrbLGxeXV received CONNACK (0)
Client mosq-bI0YtDTTcIrbLGxeXV sending PUBLISH (d0, q0, r0, m1, 'hello/test', ... (6 bytes))
Client mosq-bI0YtDTTcIrbLGxeXV sending DISCONNECT

Linux2:

[root@localhost netmaker-api.xxx.com]# openssl verify -CAfile root.pem client.pem
client.pem: C = US, O = Gravitl, CN = CA Root
error 6 at 0 depth lookup:unable to decode issuer public key
140375689643920:error:0609E09C:digital envelope routines:PKEY_SET_TYPE:unsupported algorithm:p_lib.c:239:
140375689643920:error:0B07706F:x509 certificate routines:X509_PUBKEY_get:unsupported algorithm:x_pubkey.c:148:
140375689643920:error:0609E09C:digital envelope routines:PKEY_SET_TYPE:unsupported algorithm:p_lib.c:239:
140375689643920:error:0B07706F:x509 certificate routines:X509_PUBKEY_get:unsupported algorithm:x_pubkey.c:148:
140375689643920:error:0B06E06C:x509 certificate routines:X509_get_pubkey_parameters:unable to get certs public key:x509_vfy.c:2098:

[root@localhost netmaker-api.xxx.com]# mosquitto_pub -h netmaker-api.xxx.com -p 8883 -t hello/test -m 'dddxxx' --cert /etc/netclient/netmaker-api.xxx.com/client.pem --cafile /etc/netclient/netmaker-api.xxx.com/root.pem --key /etc/netclient/client.key  -d
Error: Unable to load client certificate "/etc/netclient/netmaker-api.xxx.com/client.pem".
OpenSSL Error[0]: error:0609E09C:digital envelope routines:PKEY_SET_TYPE:unsupported algorithm
OpenSSL Error[1]: error:0B07706F:x509 certificate routines:X509_PUBKEY_get:unsupported algorithm
OpenSSL Error[2]: error:140BF10C:SSL routines:SSL_SET_CERT:x509 lib
Unable to connect (A TLS error occurred.).

MacOS:

MacBook-Air:netmaker-api.xxx.com root# openssl verify -CAfile root.pem client.pem
client.pem: C = US, O = Gravitl, CN = CA Root
error 6 at 1 depth lookup:unable to decode issuer public key
8673539756:error:06FFF09C:digital envelope routines:CRYPTO_internal:unsupported algorithm:/AppleInternal/Library/BuildRoots/66382bca-8bca-11ec-aade-6613bcf0e2ee/Library/Caches/com.apple.xbs/Sources/libressl/libressl-2.8/crypto/evp/p_lib.c:245:
8673539756:error:0BFFF06F:x509 certificate routines:CRYPTO_internal:unsupported algorithm:/AppleInternal/Library/BuildRoots/66382bca-8bca-11ec-aade-6613bcf0e2ee/Library/Caches/com.apple.xbs/Sources/libressl/libressl-2.8/crypto/asn1/x_pubkey.c:197:

MacBook-Air:netmaker-api.xxx.com root# mosquitto_pub -h netmaker-api.xxx.com -p 8883 --cafile ./root.pem --cert client.pem --key ../client.key -t hello/wt -m x -d
Client null sending CONNECT
Error: host name verification failed.
OpenSSL Error[0]: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed
Error: A TLS error occurred.

Although both the openssl and mosquitto_pub commands run successfully on Linux1, netclient still cannot connect to the MQ server when it is started.
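
The "unsupported algorithm" errors from OpenSSL 1.0.2k and LibreSSL 2.8.3 above suggest that those libraries cannot parse the public key type used in the certificates, while OpenSSL 1.1.1f can. One plausible reading, not confirmed anywhere in this thread, is that the certificates use a key type such as Ed25519, which OpenSSL only gained in 1.1.1. The sketch below is a rough, standard-library-only heuristic for checking which key algorithm identifiers appear in a PEM file. The function name `key_algorithms_in_pem` is hypothetical, and the byte search does not separate the key algorithm from the signature algorithm, so treat the result as a hint only.

```python
import base64
import re

# DER encodings (tag 0x06, length, value) of a few public-key algorithm OIDs.
KNOWN_KEY_OIDS = {
    bytes.fromhex("06092a864886f70d010101"): "RSA",   # 1.2.840.113549.1.1.1
    bytes.fromhex("06072a8648ce3d0201"): "EC",        # 1.2.840.10045.2.1
    bytes.fromhex("06032b6570"): "Ed25519",           # 1.3.101.112
}

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def key_algorithms_in_pem(pem_text):
    """Return the names of known algorithm OIDs found in the PEM certificates.

    Heuristic: this searches the raw DER bytes instead of parsing ASN.1,
    so it can report an algorithm that is only used for the signature.
    """
    found = []
    for body in PEM_BLOCK.findall(pem_text):
        # Strip whitespace and line breaks before decoding the base64 body.
        der = base64.b64decode("".join(body.split()))
        for oid, name in KNOWN_KEY_OIDS.items():
            if oid in der and name not in found:
                found.append(name)
    return found
```

Calling `key_algorithms_in_pem(open("root.pem").read())` on the affected machine would show whether the CA certificate carries a key type that the local TLS library is too old to handle.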

Please let me know how I can solve this problem. Thank you!

Version

v0.14.0

What OS are you using?

Linux, Mac

Relevant log output

[root@localhost ~]# netclient daemon -vvv
[netclient] 2022-05-17 23:42:05 initializing network default
[netclient] 2022-05-17 23:42:05 pulling latest config for  default
[netclient] 2022-05-17 23:42:08 waiting for interface...
[netclient] 2022-05-17 23:42:08 interface ready - netclient.. ENGAGE
[netclient] 2022-05-17 23:42:10 started daemon for server  netmaker-api.xxx.com
[netclient] 2022-05-17 23:42:10 netclient daemon started for server:  netmaker-api.xxx.com
[netclient] 2022-05-17 23:42:40 unable to connect to broker, retrying ...
[netclient] 2022-05-17 23:42:40 could not connect to broker netmaker-api.xxx.com connect timeout
[netclient] 2022-05-17 23:42:40 connection issue detected.. attempt connection with new certs
[netclient] 2022-05-17 23:42:40 register at https://netmaker-api.xxx.com/api/server/register
[netclient] 2022-05-17 23:42:40 certificates/key saved
[netclient] 2022-05-17 23:42:41 restarting netclient.service
[netclient] 2022-05-17 23:43:11 local port has changed from  0  to  51821

root@ubuntu:~# docker logs mq --tail 20 -f
1652845513: New connection from 127.0.0.0:21825 on port 8883.
1652845513: OpenSSL Error[0]: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate
1652845513: Client <unknown> disconnected: Protocol error.

afeiszli commented 2 years ago

Hi @raojinlin, please paste output from the netclient.service logs on those machines, since they should be pulling new certificates automatically, but it sounds like they are not. Then, try running "netclient pull" on those clients, which should manually pull new certificates for those clients.

raojinlin commented 2 years ago

Hi @afeiszli, I tried using netclient pull to obtain new certificates. The command does generate new certificates, but the same error still occurs.

Before netclient pull: image

After netclient pull: image

The netclient.service logs:

May 18 11:14:03 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:03 interface ready - netclient.. ENGAGE
May 18 11:14:05 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:05 started daemon for server  netmaker-api.xxx.com
May 18 11:14:05 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:05 netclient daemon started for server:  netmaker-api.xxx.com
May 18 11:14:35 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:35 unable to connect to broker, retrying ...
May 18 11:14:35 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:35 could not connect to broker netmaker-api.xxx.com connect timeout
May 18 11:14:35 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:35 connection issue detected.. attempt connection with new certs
May 18 11:14:35 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:35 register at https://netmaker-api.xxx.com/api/server/register
May 18 11:14:35 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:35 certificates/key saved
May 18 11:14:36 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:36 restarting netclient.service
May 18 11:14:37 localhost.localdomain systemd[1]: Stopping Netclient Daemon...
May 18 11:14:37 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:37 shutting down netclient daemon
May 18 11:14:37 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:37 checkin routine closed
May 18 11:14:37 localhost.localdomain netclient[13121]: [netclient] 2022-05-18 11:14:37 shutdown complete
May 18 11:14:37 localhost.localdomain systemd[1]: Stopped Netclient Daemon.
May 18 11:14:37 localhost.localdomain systemd[1]: Started Netclient Daemon.
May 18 11:14:37 localhost.localdomain netclient[13276]: [netclient] 2022-05-18 11:14:37 initializing network default
May 18 11:14:37 localhost.localdomain netclient[13276]: [netclient] 2022-05-18 11:14:37 pulling latest config for  default
May 18 11:14:37 localhost.localdomain netclient[13276]: [netclient] 2022-05-18 11:14:37 started daemon for server  netmaker-api.xxx.com
May 18 11:14:37 localhost.localdomain netclient[13276]: [netclient] 2022-05-18 11:14:37 netclient daemon started for server:  netmaker-api.xxx.com

afeiszli commented 2 years ago

Ah! The broker domain appears to be incorrect. The value of SERVER_NAME in your docker-compose should be the broker domain. In yours, it appears to be pointing to the api: netmaker-api.xxx.com. Please change this to the relevant domain such as netmaker-broker.xxx.com. You will then have to restart the server, create a new key (to get the correct details) and rejoin.

raojinlin commented 2 years ago

Is certificate generation tied to the domain name? In my environment, both netmaker-api and the broker are on the same server and share the same public IP.

I have tried changing the domain name to netmaker-broker.xxx.com, but I still have the same error after restarting docker-compose.

netclient.service logs:

May 18 23:51:22 10-25-241-158 systemd[1]: netclient.service: Succeeded.
May 18 23:51:22 10-25-241-158 systemd[1]: Stopped Netclient Daemon.
May 18 23:51:22 10-25-241-158 systemd[1]: Started Netclient Daemon.
May 18 23:51:22 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:22 initializing network default
May 18 23:51:22 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:22 started daemon for server  netmaker-broker.xxx.com
May 18 23:51:22 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:22 netclient daemon started for server:  netmaker-broker.xxx.com
May 18 23:51:52 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:52 unable to connect to broker, retrying ...
May 18 23:51:52 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:52 could not connect to broker netmaker-broker.xxx.com connect timeout
May 18 23:51:52 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:52 connection issue detected.. attempt connection with new certs
May 18 23:51:52 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:52 register at https://netmaker-api.xxx.com/api/server/register
May 18 23:51:52 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:52 certificates/key saved
May 18 23:51:53 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:53 restarting netclient.service
May 18 23:51:54 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:54 shutting down netclient daemon
May 18 23:51:54 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:54 checkin routine closed
May 18 23:51:54 10-25-241-158 netclient[965664]: [netclient] 2022-05-18 23:51:54 shutdown complete
May 18 23:51:54 10-25-241-158 systemd[1]: Stopping Netclient Daemon...
May 18 23:51:54 10-25-241-158 systemd[1]: netclient.service: Succeeded.
May 18 23:51:54 10-25-241-158 systemd[1]: Stopped Netclient Daemon.
May 18 23:51:54 10-25-241-158 systemd[1]: Started Netclient Daemon.

mattkasun commented 2 years ago

does netmaker-broker.xxx.com resolve? and is port 8883 publicly accessible?

raojinlin commented 2 years ago

Hi @mattkasun, netmaker-broker.xxx.com resolves; I have configured it in /etc/hosts. Port 8883 is publicly accessible.

simonxmau commented 2 years ago

@raojinlin I have the same problem. It happened at first, but then it became normal for no apparent reason. But the same problem appeared when I installed on another server.

You can try clearing out /etc/netclient and rejoining the network.

mattkasun commented 2 years ago

Have you tried everything in the MQ troubleshooting: https://gist.github.com/mattkasun/face2a7c1f32031a2126ff7243caad12

afeiszli commented 2 years ago

@raojinlin @simon-mao have you checked outbound firewall to see if it is blocking 8883? We have a suspicion this is causing an issue for some users, would be good to check.
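
The two checks raised in this thread, whether the broker name resolves and whether port 8883 can be reached from the client, can be scripted on the client machine. The following is a minimal sketch using only the Python standard library. The function name `check_broker` is hypothetical. A successful TCP connection only shows that no firewall blocks the port; it says nothing about whether the TLS handshake or certificate validation will succeed.

```python
import socket


def check_broker(host, port=8883, timeout=5.0):
    """Return (resolved_addresses, tcp_ok) for the given broker host and port."""
    try:
        # Uses the system resolver, like any other client program on the host.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return [], False
    addresses = sorted({info[4][0] for info in infos})
    try:
        # Open and immediately close a plain TCP connection (no TLS).
        with socket.create_connection((host, port), timeout=timeout):
            return addresses, True
    except OSError:
        return addresses, False
```

For example, `check_broker("netmaker-broker.xxx.com")` returning a non-empty address list with `False` would mean the name resolves but port 8883 is unreachable, for instance because of an outbound firewall rule.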

raojinlin commented 2 years ago

I just deployed a new environment. While debugging the netclient daemon, I found an error during the TLS handshake between the client and the server: network error: network Error : x509: cannot validate certificate for 192.168.122.87 because it doesn't contain any IP SANs.

/github.com/eclipse/paho.mqtt.golang@v1.3.5/client.go image

This is my docker-compose.yaml configuration:

image

Then I checked the certificate of netmaker:

image

I checked the certificate-generation code, servercfg Getserver(). According to the logic of this function, it should read the SERVER_NAME environment variable. However, the subject CN of the certificate is not the value of SERVER_NAME but an IP address.

image
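
The x509 error quoted in this comment follows from how Go's certificate verification treats IP addresses: when the client connects to an IP address, the certificate must list that address as an IP entry in its Subject Alternative Name extension, and the subject CN is not used as a fallback. The sketch below mimics that rule in Python for illustration only. The function names are hypothetical, the certificate is assumed to be a dict shaped like the output of `ssl.SSLSocket.getpeercert()`, and wildcard names are deliberately ignored.

```python
import ipaddress


def is_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def san_covers(cert, target):
    """Check whether the certificate's SAN entries cover the target.

    An IP target is matched only against "IP Address" entries and a
    host name only against "DNS" entries. The subject CN is never used.
    """
    sans = cert.get("subjectAltName", ())
    if is_ip(target):
        wanted = ipaddress.ip_address(target)
        return any(
            kind == "IP Address" and is_ip(value)
            and ipaddress.ip_address(value) == wanted
            for kind, value in sans
        )
    return any(
        kind == "DNS" and value.lower() == target.lower()
        for kind, value in sans
    )
```

Under this rule, a certificate whose only mention of 192.168.122.87 is in the subject CN fails for that address, which matches the error message above.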

raojinlin commented 2 years ago

Sorry, I just checked this certificate. It was generated earlier, when SERVER_NAME was still set to 192.168.122.87.

afeiszli commented 2 years ago

Sorry, I just checked this certificate. It was generated before. The previous s was 192.168.122.87.

@raojinlin so does it currently have the correct CA or no?

raojinlin commented 2 years ago

Yes, it can now be successfully connected.

image

afeiszli commented 2 years ago

Okay, I want to confirm the steps it took to fix. Was it fixed by just restarting the server?

raojinlin commented 2 years ago

I deleted the /root/certs directory and restarted netmaker.
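
For anyone repeating this fix, a slightly more cautious variant is to move the stale certificate directory aside instead of deleting it, so it can still be inspected afterwards. The sketch below is one way to do that. The function name `set_aside` and the backup naming scheme are hypothetical, and whether /root/certs lives on the host or in a container volume depends on the deployment and is not stated in this thread. The server still has to be restarted afterwards, as described above.

```python
import shutil
import time
from pathlib import Path


def set_aside(cert_dir, stamp=None):
    """Rename cert_dir to '<name>.bak-<stamp>' and return the new path.

    Returns None when cert_dir does not exist, so the call is safe to repeat.
    """
    src = Path(cert_dir)
    if not src.is_dir():
        return None
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.bak-{stamp}")
    shutil.move(str(src), str(dst))
    return dst
```

For example, `set_aside("/root/certs")` run with sufficient privileges leaves no /root/certs directory behind, which has the same effect on the server as deleting it.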

afeiszli commented 2 years ago

So these troubleshooting instructions are correct then: https://gist.github.com/mattkasun/face2a7c1f32031a2126ff7243caad12

Can we close this issue?

We will need to determine how this issue started and potentially put in something to auto-heal.

raojinlin commented 2 years ago

OK thanks

afeiszli commented 2 years ago

Closing. For those who find this issue, PLEASE follow the above Gist, in particular the part about deleting /root/certs