rabbitmq / rabbitmq-website

RabbitMQ website
https://www.rabbitmq.com
Apache License 2.0

TLS troubleshooting guide: document a common widely known case leading to unsupported_record_type X errors #1333

Open acammack opened 2 years ago

acammack commented 2 years ago

This probably belongs in some sort of operations document besides the (already quite long) existing site/ssl.md page.

When an Erlang TLS client or server is connected to an unencrypted server or client and the handshake is invalid, it does not report that the other side is probably unencrypted. Instead, it generates a stacktrace and a SASL error in the RabbitMQ logs like:

TLS server: In state hello at tls_record.erl:564 generated SERVER ALERT: Fatal - Unexpected Message
 - {unsupported_record_type,65}

Errors from an outbound TLS connection generate CLIENT ALERT messages instead of the SERVER ALERT above, but the format is otherwise the same. Packet capture is the best way to diagnose which ports are not behaving as expected, but there is also a clue in the error message itself. The number given for unsupported_record_type is the decimal value of the first byte received over the connection, which for a TLS connection would be the record (message) type identifier. Valid TLS record types are small values (20 to 23; a handshake record is 22), so a printable ASCII value points to a plaintext protocol. In the example above, 65 is the ASCII letter A, the first byte of the AMQP protocol header, indicating that the incoming connection is likely an AMQP client that does not have TLS enabled. Another common code is 71, the ASCII letter G, indicating a likely HTTP GET request.
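The decoding described here can be sketched in a few lines of Python. This is an illustration, not part of RabbitMQ or Erlang: `explain_record_type` and the hint table are hypothetical names, and the hints only cover the two cases mentioned in this issue (A for AMQP, G for HTTP GET). The record type numbers 20 to 23 are the TLS content types (change_cipher_spec, alert, handshake, application_data).

```python
# TLS record content types a genuine TLS peer would send as its first byte.
TLS_RECORD_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}

# Hypothetical hints for first bytes of common plaintext protocols.
PLAINTEXT_HINTS = {
    "A": "likely a plain (non-TLS) AMQP client; the AMQP header starts with 'AMQP'",
    "G": "likely a plain HTTP GET request",
}


def explain_record_type(value: int) -> str:
    """Interpret the N in {unsupported_record_type,N} as the first byte the peer sent."""
    if value in TLS_RECORD_TYPES:
        return f"genuine TLS record type ({TLS_RECORD_TYPES[value]})"
    if 32 <= value < 127:  # printable ASCII range
        char = chr(value)
        hint = PLAINTEXT_HINTS.get(char, "likely a plaintext protocol")
        return f"ASCII {char!r}: {hint}"
    return "not a TLS record type and not printable ASCII"


print(explain_record_type(65))
print(explain_record_type(71))
```

Anything outside the table falls back to a generic "plaintext protocol" hint, since the byte alone cannot identify every protocol.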

These sorts of errors are somewhat expected on internet-facing RabbitMQ installations due to the prevalence of internet scanning tools that use TLS inconsistently, even on common TLS ports. If none of the expected clients are having issues connecting, the errors can be ignored and do not reflect an issue with the RabbitMQ server itself (though you may want to block noisy IP addresses).
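To check whether the noise is mostly scanners, one option is to tally which first bytes appear in the logs. A minimal Python sketch, assuming the log text contains terms formatted exactly as in the excerpt quoted earlier (`{unsupported_record_type,N}`); `tally_first_bytes`, `summarize`, and the sample lines are hypothetical, and in practice you would pass lines read from your own log file.

```python
import re
from collections import Counter

# Matches the term printed by Erlang's TLS implementation, e.g. "{unsupported_record_type,65}".
RECORD_TYPE_RE = re.compile(r"\{unsupported_record_type,(\d+)\}")


def tally_first_bytes(log_lines):
    """Count how often each unsupported_record_type value appears in the given lines."""
    counts = Counter()
    for line in log_lines:
        for match in RECORD_TYPE_RE.finditer(line):
            counts[int(match.group(1))] += 1
    return counts


def summarize(counts):
    """Print each value with its ASCII character (or '?' if not printable), most frequent first."""
    for value, n in counts.most_common():
        char = chr(value) if 32 <= value < 127 else "?"
        print(f"{value:>3} ({char}): {n}")


sample = [
    "TLS server: In state hello ... SERVER ALERT: Fatal - Unexpected Message",
    " - {unsupported_record_type,65}",
    " - {unsupported_record_type,71}",
    " - {unsupported_record_type,65}",
]
summarize(tally_first_bytes(sample))
```

If the counts are dominated by values like 71 and your own clients connect fine, that fits the scanner explanation.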

michaelklishin commented 2 years ago

We have a guide on Troubleshooting TLS, so these would be a good fit there.

gsmith-sas commented 1 year ago

In my environment (Kubernetes running RabbitMQ 3.10), these log messages are emitted with the log level "notice", which is not one of the documented logging levels. I'm not sure whether that's an issue with the log message itself (should it use one of the documented levels?) or with the documentation (should 'notice' be a documented level?). Given the explanation above, should we consider "notice" comparable to "debug"?

michaelklishin commented 1 year ago

Consider using GitHub Discussions for questions.

Those messages are logged by Erlang's TLS implementation, not RabbitMQ. RabbitMQ itself very rarely, if ever, logs notice messages.

notice sits between warning and info on the list of log severity levels supported by the runtime logger.
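For reference, the eight severity levels of Erlang's logger (the syslog levels from RFC 5424) can be ordered as in this small Python sketch. The list and the `is_at_least` helper are illustrative only, not a RabbitMQ or Erlang API. The ordering shows that notice is more severe than info and is therefore not comparable to debug. Assuming a handler filters purely by this ordering, one configured for info would also let notice messages through.

```python
# Severity levels of Erlang's logger, most severe first (same eight levels as syslog, RFC 5424).
ERLANG_LOGGER_LEVELS = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "info", "debug",
]


def is_at_least(level: str, threshold: str) -> bool:
    """Return True if `level` is as severe as, or more severe than, `threshold`."""
    return ERLANG_LOGGER_LEVELS.index(level) <= ERLANG_LOGGER_LEVELS.index(threshold)


print(is_at_least("notice", "info"))     # notice passes an info threshold
print(is_at_least("notice", "warning"))  # but not a warning threshold
```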

danielschnetler commented 1 year ago

I also received this error when connecting to a TLS endpoint from a client without TLS enabled. If you are using the C# client, try something like this:

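using RabbitMQ.Client;

var factory = new ConnectionFactory
{
    HostName = "hostname",
    UserName = "user",
    Password = "password",
    VirtualHost = "/",
    // 5671 is the conventional port for AMQP over TLS (AMQPS); 5672 is plain AMQP
    Port = 5671,
    Ssl = new SslOption
    {
        // Without this the client sends plain AMQP to the TLS listener,
        // which then logs {unsupported_record_type,65}
        Enabled = true,
        // Should match a name in the server certificate
        ServerName = "hostname"
    }
};
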
michaelklishin commented 1 month ago

This issue was never meant to be a support forum.

michaelklishin commented 1 month ago

There is only one place in tls_record where this message is used.

It is a generic catch-all function clause, so the number of potential causes is high. Some cases are well known, the most common being a non-TLS ("plain TCP") client that connects to a TLS-enabled port and sends something other than a TLS handshake.

Therefore, the scope for improving the docs is, in my view, much more limited than it may appear at first. Of course, even documenting the most common case is worth doing.
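To illustrate that most common case: a real TLS client opens with a handshake record, whose first byte is 22, while a plaintext AMQP or HTTP client opens with printable ASCII. The Python sketch below generates a genuine ClientHello in memory with the standard `ssl` module (no network involved) and compares its first byte with the AMQP 0-9-1 protocol header and an HTTP request line. The host name `rabbitmq.example` and the helper name are placeholders.

```python
import ssl

# Opening bytes of two plaintext protocols: the AMQP 0-9-1 protocol header and an HTTP request line.
AMQP_0_9_1_HEADER = b"AMQP\x00\x00\x09\x01"
HTTP_GET = b"GET / HTTP/1.1\r\n"


def client_hello_first_byte() -> int:
    """Generate a TLS ClientHello in memory (no network) and return its first byte."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname="rabbitmq.example")
    try:
        tls.do_handshake()  # writes the ClientHello into `outgoing`, then needs a reply
    except ssl.SSLWantReadError:
        pass  # expected: no server is answering
    return outgoing.read()[0]


print("TLS ClientHello:", client_hello_first_byte())  # 22 (handshake record)
print("plain AMQP:     ", AMQP_0_9_1_HEADER[0])       # 65 ('A')
print("plain HTTP GET: ", HTTP_GET[0])                # 71 ('G')
```

A TLS listener expects the 22; receiving 65 or 71 instead is what ends up in the catch-all clause.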