Closed cambrosch closed 3 months ago
@cambrosch I think this comes from new SSL requirements in OTP 26 (which is used for `2.0.0`, while `1.13.0` is based on OTP 25).
But it's a good catch. It seems in 26, SSL wants to explicitly know about the CA chain.
Can you try setting `vmq_diversity.postgres.cafile` in `vernemq.conf`?
Maybe pointing to the system CA certs is enough (`/etc/ssl/certs/ca-certificates.crt`), maybe not...
Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.
That sadly doesn't work. I can't add it via `DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__CAFILE`, as that throws an "Error generating config with cuttlefish". I also can't manually override the config file in the Docker container: I tried several configurations, but if I change the file manually, it gets overwritten as soon as I restart VerneMQ, and if I mount a drive to persist the config file, it wipes the Docker container and refuses to work for one reason or another. That's a separate issue, but probably not one I can quickly fix :/
I think you can mount a `conf.local` file; when the Docker image finds it, it uses that conf file as a full replacement: `/etc/vernemq/vernemq.conf.local`.
But this will not solve the issue here. An error generating the config is usually a wrong setting name. But yours looks correct :(
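For readers checking their own variable names, here is a minimal Python sketch of how a `DOCKER_VERNEMQ_*` variable is assumed to map onto a `vernemq.conf` key. The convention it encodes (strip the prefix, lowercase the name, turn each double underscore into a dot) is an assumption about the Docker image's start script, and `env_to_setting` and `conf_line` are hypothetical helper names, not part of VerneMQ.

```python
PREFIX = "DOCKER_VERNEMQ_"


def env_to_setting(name: str) -> str:
    """Translate a DOCKER_VERNEMQ_* variable name into a vernemq.conf key.

    Assumed convention: drop the prefix, lowercase the rest,
    and replace each double underscore with a dot.
    """
    if not name.startswith(PREFIX):
        raise ValueError(f"{name!r} does not start with {PREFIX!r}")
    return name[len(PREFIX):].lower().replace("__", ".")


def conf_line(name: str, value: str) -> str:
    """Render the config line that the variable would produce."""
    return f"{env_to_setting(name)} = {value}"


print(conf_line(
    "DOCKER_VERNEMQ_VMQ_DIVERSITY__POSTGRES__CAFILE",
    "/etc/ssl/certs/ca-certificates.crt",
))
# prints: vmq_diversity.postgres.cafile = /etc/ssl/certs/ca-certificates.crt
```

If the printed key differs from the setting name you intended, the variable name is the likely culprit; if it matches, the config error probably has another cause.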
@cambrosch do you see the Cuttlefish config error printed to your console when you run the Docker image in the foreground?
2024-05-02T10:15:08.68071 Connecting to the container 'vernemq'...
2024-05-02T10:15:08.70573 Successfully Connected to container: 'vernemq' [Revision: 'vernemq--jamr5ej-5dfd4d78dc-4khc5', Replica: 'vernemq--jamr5ej']
2024-05-02T10:15:10.703798696Z Error generating config with cuttlefish
2024-05-02T10:15:10.703850738Z run `vernemq config generate -l debug` for more information.
@cambrosch are you able to attach to the container and run `vernemq config generate -l debug`? This should print out the actual config problem.
Sadly, the container immediately crashes upon getting this message, so I cannot attach a console :/
I just tested this with a `docker run`, feeding it an example.env file with Postgres configs similar to yours. This initially complained about whitespace around the `=` signs in the env file, but other than that it seems to work; at least there were no complaints when generating the config. I'm not sure how you run the Docker image, though.
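The whitespace complaint can be caught before starting the container. Below is a small Python sketch that flags lines in an env file where the `=` has whitespace directly before or after it. The function name `find_spaced_assignments` and the sample variable names are hypothetical, and the check is an approximation of what the env-file parser objects to, not a reimplementation of it.

```python
def find_spaced_assignments(env_text: str) -> list[int]:
    """Return 1-based line numbers where '=' has whitespace directly around it."""
    bad = []
    for lineno, line in enumerate(env_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, _, value = stripped.partition("=")
        if key != key.rstrip() or value != value.lstrip():
            bad.append(lineno)
    return bad


# Hypothetical env file content, for illustration only.
sample = "EXAMPLE_HOST=db.example\nEXAMPLE_PORT = 5432\n# note = ignored\nEXAMPLE_USER= psql\n"
print(find_spaced_assignments(sample))  # prints: [2, 4]
```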
Ah, I messed that up. I had /etc/ssl mounted for the MQTT TLS certs, so /etc/ssl/certs/ca-certificates.crt didn't even exist. I re-created that now, and now the config at least boots again. Alas, now I'm on to a new error:
2024-05-02T14:49:19.698008699Z 2024-05-02T14:49:19.697685+00:00 [notice] <0.3694.0> ssl_handshake:path_validation_alert/1:2127: TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure, - {bad_cert,hostname_check_failed}
2024-05-02T14:49:19.698101996Z 2024-05-02T14:49:19.697896+00:00 [warning] <0.616.0> vmq_diversity_worker_wrapper:handle_info/2:181: Could not connect to postgresql due to {ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}}
2024-05-02T14:49:19.699001414Z 2024-05-02T14:49:19.697927+00:00 [error] <0.3689.0> gen_server:error_info/8:1391: Generic server <0.3689.0> terminating. Reason: {ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}}. Last message: {command,epgsql_cmd_connect,#{port => 5432,ssl => true,host => "hostname-removed.postgres.database.azure.com",password => #Fun<epgsql_cmd_connect.0.87005817>,database => "removed",username => "psql",ssl_opts => [{cacertfile,"/etc/ssl/certs/ca-certificates.crt"}]}}. State: {state,undefined,undefined,<<>>,undefined,on_message,undefined,{[],[]},undefined,undefined,undefined,undefined,[],information_redacted,[],undefined,undefined,undefined,undefined,undefined}. Client <0.616.0> stacktrace: [{gen,do_call,4,[{file,"gen.erl"},{line,240}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,415}]},{epgsql,call_connect,2,[{file,"/opt/vernemq/_build/default/lib/epgsql/src/epgsql.erl"},{line,207}]},{vmq_diversity_worker_wrapper,handle_info,2,[{file,"/opt/vernemq/apps/vmq_diversity/src/vmq_diversity_worker_wrapper.erl"},{line,176}]}].
2024-05-02T14:49:19.699475025Z 2024-05-02T14:49:19.698600+00:00 [error] <0.3689.0> proc_lib:crash_report/4:584: crasher: initial call: epgsql_sock:init/1, pid: <0.3689.0>, registered_name: [], exit: {{ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}},[{gen_server,handle_common_reply,8,[{file,"gen_server.erl"},{line,1226}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.616.0>,<0.615.0>,auth_postgres,vmq_diversity_sup,<0.595.0>], message_queue_len: 0, messages: [], links: [<0.616.0>], dictionary: [], trap_exit: false, status: running, heap_size: 10958, stack_size: 28, reductions: 33545; neighbours:
2024-05-02T14:49:19.713975185Z 2024-05-02T14:49:19.713598+00:00 [notice] <0.3698.0> ssl_handshake:path_validation_alert/1:2127: TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure, - {bad_cert,hostname_check_failed}
2024-05-02T14:49:19.714218930Z 2024-05-02T14:49:19.713791+00:00 [warning] <0.620.0> vmq_diversity_worker_wrapper:handle_info/2:181: Could not connect to postgresql due to {ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}}
2024-05-02T14:49:19.714381329Z 2024-05-02T14:49:19.713796+00:00 [error] <0.3690.0> gen_server:error_info/8:1391: Generic server <0.3690.0> terminating. Reason: {ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}}. Last message: {command,epgsql_cmd_connect,#{port => 5432,ssl => true,host => "hostname-removed.postgres.database.azure.com",password => #Fun<epgsql_cmd_connect.0.87005817>,database => "removed",username => "psql",ssl_opts => [{cacertfile,"/etc/ssl/certs/ca-certificates.crt"}]}}. State: {state,undefined,undefined,<<>>,undefined,on_message,undefined,{[],[]},undefined,undefined,undefined,undefined,[],information_redacted,[],undefined,undefined,undefined,undefined,undefined}. Client <0.620.0> stacktrace: [{gen,do_call,4,[{file,"gen.erl"},{line,240}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,415}]},{epgsql,call_connect,2,[{file,"/opt/vernemq/_build/default/lib/epgsql/src/epgsql.erl"},{line,207}]},{vmq_diversity_worker_wrapper,handle_info,2,[{file,"/opt/vernemq/apps/vmq_diversity/src/vmq_diversity_worker_wrapper.erl"},{line,176}]}].
2024-05-02T14:49:19.714946319Z 2024-05-02T14:49:19.714359+00:00 [error] <0.3690.0> proc_lib:crash_report/4:584: crasher: initial call: epgsql_sock:init/1, pid: <0.3690.0>, registered_name: [], exit: {{ssl_negotiation_failed,{tls_alert,{handshake_failure,"TLS client: In state wait_cert at ssl_handshake.erl:2127 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,hostname_check_failed}"}}},[{gen_server,handle_common_reply,8,[{file,"gen_server.erl"},{line,1226}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.620.0>,<0.615.0>,auth_postgres,vmq_diversity_sup,<0.595.0>], message_queue_len: 0, messages: [], links: [<0.620.0>], dictionary: [], trap_exit: false, status: running, heap_size: 10958, stack_size: 28, reductions: 33544; neighbours:
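As an aside on the shadowed CA bundle described before these logs: a quick sanity check is to confirm that the file exists inside the container and actually contains certificates. The following Python sketch does that by counting PEM headers; `count_pem_certs` is a hypothetical helper, and counting headers says nothing about whether the certificates are valid or trusted.

```python
import os

PEM_MARKER = "-----BEGIN CERTIFICATE-----"


def count_pem_certs(path: str) -> int:
    """Return the number of PEM certificate headers in the file, or 0 if it is missing."""
    if not os.path.isfile(path):
        return 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read().count(PEM_MARKER)


if __name__ == "__main__":
    bundle = "/etc/ssl/certs/ca-certificates.crt"
    print(f"{bundle}: {count_pem_certs(bundle)} certificate(s) found")
```

A count of 0 for the configured path would point at a missing or empty bundle, for example because a volume mounted over `/etc/ssl` hides the system certificates.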
Argh, now it's a verification error (the client tries to verify the peer), on the level of Erlang SSL. Need to research this but cannot do it immediately. Maybe also some sort of wildcard server name is the issue.
I'm now suspecting this is the same as https://github.com/vernemq/vernemq/issues/1485 that we had to fix in the MQTT bridge. Are those wildcard certs?
@cambrosch are you still looking into this? Is the public cert of the Postgres server a wildcard cert? https://en.wikipedia.org/wiki/Wildcard_certificate
The certificate is using:
Common Name: removedhash.database.azure.com
Subject Alternative Names: removedhash.database.azure.com, dev-removed-psql.postgres.database.azure.com
Organization: Microsoft Corporation
I don't see any wildcard, but the common name is not the domain name in use; that one is only listed in the alternative names.
Also:
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
verify return:1
depth=1 C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA
verify return:1
depth=0 C = US, ST = Washington, L = Redmond, O = Microsoft Corporation, CN = removedhash.database.azure.com
verify return:1
---
Certificate chain
0 s:C = US, ST = Washington, L = Redmond, O = Microsoft Corporation, CN = removedhash.database.azure.com
i:C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA
1 s:C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA
i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
2 s:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:ECDSA+SHA1:RSA+SHA224:RSA+SHA1
Shared Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 8913 bytes and written 839 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
Protocol : TLSv1.3
Cipher : TLS_AES_256_GCM_SHA384
Session-ID: 441A89869FA67AE2B6E730907FB563C4103DA580AA1CD249445439FD6652CF19
Session-ID-ctx:
Resumption PSK: EC984194F66930E86B393A88C7E5C7EA7BC32C0D8D12743AF40E8E67285EE6F0845B1799FFCDB24AB3096D42AAF9AE5F
PSK identity: None
PSK identity hint: None
SRP username: None
TLS session ticket lifetime hint: 7200 (seconds)
TLS session ticket:
0000 - 33 e5 9b d1 be 3d ee 94-79 33 c0 fd 7d 7f 63 34 3....=..y3..}.c4
0010 - 62 ca 74 ab a6 bb 76 52-52 2a 6f 63 79 36 95 e1 b.t...vRR*ocy6..
Start Time: 1715759464
Timeout : 7200 (sec)
Verify return code: 0 (ok)
Extended master secret: no
Max Early Data: 0
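To illustrate what a hostname check does with the certificate names quoted in this comment, here is a simplified Python sketch. It is not Erlang's implementation: `label_match` and `hostname_matches` are hypothetical names, and the wildcard handling (only a whole left-most `*` label) is an assumption made for illustration. The point is only that the name handed to the TLS client has to appear among the Subject Alternative Names, so an IP address or a different alias will not match even when the chain itself verifies.

```python
def label_match(pattern: str, hostname: str) -> bool:
    """Compare one certificate name with the hostname, case-insensitively.

    A '*' is honoured only when it is the entire left-most label.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def hostname_matches(hostname: str, san_dns_names: list[str]) -> bool:
    """Return True if any SAN entry matches the hostname."""
    return any(label_match(name, hostname) for name in san_dns_names)


# SAN entries as quoted (redacted) in the comment above.
sans = [
    "removedhash.database.azure.com",
    "dev-removed-psql.postgres.database.azure.com",
]
print(hostname_matches("dev-removed-psql.postgres.database.azure.com", sans))  # prints: True
print(hostname_matches("10.0.0.4", sans))  # prints: False (illustrative IP address)
```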
We'll need to bite the bullet and implement more options for all plugins that need outgoing SSL.
Those are:
The reason is that OTP 26 defaults to `verify_peer` for clients. Surprisingly, there's no way to configure this via application environment.
Another option would be to fall back to OTP 25.
@cambrosch one thing I wonder though: what happens when you set the Postgres host to an IP address instead of a name (if that's possible for your Azure env)?
EDIT: just to be clear: it's of course not a bad thing to harden requirements with `verify_peer`. It will require the client to have access to a CA file so that it can verify the server. But I think the hostname_check (SNI) is then also triggered by that.
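The distinction between verifying the chain and checking the hostname can be shown with an analogy in Python's standard `ssl` module. This is only an analogy, not Erlang's `ssl` API and not how VerneMQ configures its plugins: a default client context in Python also verifies the peer and checks the hostname, and the two can be toggled separately. `describe` is a hypothetical helper.

```python
import ssl


def describe(ctx: ssl.SSLContext) -> dict:
    """Summarise the two independent client-side checks of a context."""
    return {
        "verifies_chain": ctx.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": ctx.check_hostname,
    }


# Default client context: the chain is verified and the hostname is checked.
strict = ssl.create_default_context()

# Same defaults, but with the hostname check switched off;
# the certificate chain is still verified against the trusted CAs.
chain_only = ssl.create_default_context()
chain_only.check_hostname = False

print(describe(strict))      # prints: {'verifies_chain': True, 'checks_hostname': True}
print(describe(chain_only))  # prints: {'verifies_chain': True, 'checks_hostname': False}
```

Disabling the hostname check weakens the protection against a server presenting someone else's valid certificate, so passing the correct server name is usually the better fix.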
@ioolkos: I can reproduce this. Azure DB with the default Microsoft certificates fails as described. Using an IP didn't make any difference.
@mths1 Thanks for testing! Something like https://github.com/vernemq/vernemq/pull/2288 (untested) is needed for any outgoing SSL then, to be fully OTP 26 compliant.
@cambrosch just FYI, this should be addressed by https://github.com/vernemq/vernemq/pull/2284
Environment
Current Behavior
Running with exactly the same Docker parameters as on 1.13.0, after upgrading to 2.0.0 vmq_diversity cannot connect to our PostgreSQL server (hosted in Azure) via SSL; see the error in the log.
A downgrade back to 1.13.0 with the same parameters fixed the issue. Validating the certificate chain using pgAdmin (mode: verify-full) showed no issues with SSL.
Expected behaviour
Connecting to this PostgreSQL server should not result in a validation error.
Configuration, logs, error output, etc.
Postgres-related Docker environment parameters:
The PostgreSQL server is set to: min SSL version TLS 1.2, max SSL version TLS 1.3.