sysown / proxysql

High-performance MySQL proxy with a GPL license.
http://www.proxysql.com
GNU General Public License v3.0

ProxySQL .NET client libraries reporting "Received an unexpected EOF or 0 bytes from the transport stream" when attempting to connect #4419

Open sleipper opened 10 months ago

sleipper commented 10 months ago

I get the error "Received an unexpected EOF or 0 bytes from the transport stream" when using ProxySQL with .NET. I can reproduce the bug when there is a large number of async connections. We found it by running our integration tests with increased parallelism, plus some load testing.

Message: 
System.IO.IOException : Received an unexpected EOF or 0 bytes from the transport stream.

  Stack Trace: 
SslStream.ReceiveHandshakeFrameAsync[TIOAdapter](CancellationToken cancellationToken)
SslStream.ForceAuthenticationAsync[TIOAdapter](Boolean receiveFirst, Byte[] reAuthenticationData, CancellationToken cancellationToken)
Ssl.StartSSLAsync(Stream baseStream, Encoding encoding, String connectionString, CancellationToken cancellationToken, Boolean execAsync)
NativeDriver.OpenAsync(Boolean execAsync, CancellationToken cancellationToken)
Driver.OpenAsync(Boolean execAsync, CancellationToken cancellationToken)
Driver.CreateAsync(MySqlConnectionStringBuilder settings, Boolean execAsync, CancellationToken cancellationToken)
Driver.CreateAsync(MySqlConnectionStringBuilder settings, Boolean execAsync, CancellationToken cancellationToken)
MySqlPool.CreateNewPooledConnectionAsync(Boolean execAsync, CancellationToken cancellationToken)
MySqlPool.GetPooledConnectionAsync(Boolean execAsync, CancellationToken cancellationToken)
MySqlPool.TryToGetDriverAsync(Boolean execAsync, CancellationToken cancellationToken)
MySqlPool.GetConnectionAsync(Boolean execAsync, CancellationToken cancellationToken)
MySqlConnection.OpenAsync(Boolean execAsync, CancellationToken cancellationToken)
MySqlDataSelect.ExecuteSelectAsync(String connectionString) line 23
MySqlDataSelect.Select(String proxySQLconnectionString) line 16
ProxySqlBugTests.Select_ViaProxy_Using_MySqlData_ExpectPass() line 51

ProxySQL version 2.5.5 using your docker image: proxysql/proxysql:2.5.5.

The uploaded zip file has 3 tests, all of which run a "select * from table" query 25,000 times using the async/TPL features of .NET. It does not have to be 25,000: I've seen it occur with 1,000 or 5,000, but because the failure is intermittent, it shows up more reliably at that volume.

  1. Select_DirectConnection_Using_MySqlData_ExpectPass - This connects directly to the MySQL 8 server and executes the test and passes.
  2. Select_ViaProxy_Using_MySqlData_ExpectPass - This connects to ProxySQL executing the same tests as (1) but fails with the message outlined. It uses the MySQL.Data package.
  3. Select_ViaProxy_Using_MySqlConnector_ExpectPass - This connects to ProxySQL executing the same tests as (1) but fails with the message outlined. It uses the MySqlConnector package.

The zip also contains:

ProxySqlBug.zip

Thank you!

jbirtley88 commented 7 months ago

The underlying pathology of this took some digging. All of the following need to be true:

This shows up on the Windows side as a warning in the System event log (see attached):

The remote server has requested TLS client authentication, but no suitable client certificate could be found. An anonymous connection will be attempted. This TLS connection request may succeed or fail, depending on the server's policy settings.

[Screenshot: image-20240326-202647]

I suspect that this very esoteric bug started when this was merged: https://github.com/sysown/proxysql/commit/28f09bfb7c809a94a9b09b25704bb9c3a3624723

which causes this code branch to execute: https://github.com/sysown/proxysql/blob/1f78c9e7f01f1b30afd50b13cdea9db7310a119c/src/proxy_tls.cpp#L423

and the Microsoft client-side TLS caching behaves very badly when something about the TLS has changed (perhaps it gets the tmp context instead of the global one, or something similar; it's impossible to tell without looking at the source code).

After several days of soak-testing, I have a one-line fix for this:

--- a/src/proxy_tls.cpp
+++ b/src/proxy_tls.cpp
@@ -477,7 +477,8 @@ int ProxySQL_create_or_load_TLS(bool bootstrap, std::string& msg) {
                }
        }
        if (ret == 0) {
-               SSL_CTX_set_verify(GloVars.global.ssl_ctx, SSL_VERIFY_PEER|SSL_VERIFY_CLIENT_ONCE, callback_ssl_verify_peer);
+               // https://github.com/sysown/proxysql/issues/4419
+               SSL_CTX_set_verify(GloVars.global.ssl_ctx, SSL_VERIFY_NONE, callback_ssl_verify_peer);
        }
        X509_free(x509);
        EVP_PKEY_free(pkey);

In other words, do not perform any peer verification at all (https://www.openssl.org/docs/man1.0.2/man3/SSL_CTX_set_verify.html).
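To illustrate why this change would stop the certificate request that Windows logs, here is a minimal sketch of how a server-side verify mode is interpreted, based on the OpenSSL `SSL_CTX_set_verify` documentation. It is a model, not ProxySQL or OpenSSL code: the constants are copies of the flag values from `<openssl/ssl.h>` so the sketch builds without OpenSSL, and the helper names (`server_requests_client_cert`, `rejects_client_without_cert`, `describe_server_mode`) are hypothetical.

```cpp
#include <string>

// Flag values copied from <openssl/ssl.h> (assumption: standard OpenSSL values).
constexpr int kVerifyNone = 0x00;              // SSL_VERIFY_NONE
constexpr int kVerifyPeer = 0x01;              // SSL_VERIFY_PEER
constexpr int kVerifyFailIfNoPeerCert = 0x02;  // SSL_VERIFY_FAIL_IF_NO_PEER_CERT
constexpr int kVerifyClientOnce = 0x04;        // SSL_VERIFY_CLIENT_ONCE

// Server mode: a client certificate request is sent only when
// SSL_VERIFY_PEER is set.
constexpr bool server_requests_client_cert(int mode) {
    return (mode & kVerifyPeer) != 0;
}

// Server mode: a client that returns no certificate is rejected only when
// SSL_VERIFY_FAIL_IF_NO_PEER_CERT is set together with SSL_VERIFY_PEER.
constexpr bool rejects_client_without_cert(int mode) {
    return (mode & kVerifyPeer) != 0 && (mode & kVerifyFailIfNoPeerCert) != 0;
}

// Hypothetical helper: one-line summary of what the client will see.
std::string describe_server_mode(int mode) {
    if (!server_requests_client_cert(mode)) {
        return "no client certificate requested";
    }
    if (rejects_client_without_cert(mode)) {
        return "client certificate requested and required";
    }
    return "client certificate requested but optional";
}
```

Under this model, the old mode (`kVerifyPeer | kVerifyClientOnce`) requests a certificate but treats it as optional, which matches the "An anonymous connection will be attempted" wording of the Windows warning, while `kVerifyNone` sends no request at all.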

This is benign for now, because the current implementation of callback_ssl_verify_peer() accepts every certificate anyway: https://github.com/sysown/proxysql/blob/1f78c9e7f01f1b30afd50b13cdea9db7310a119c/src/proxy_tls.cpp#L70

int callback_ssl_verify_peer(int ok, X509_STORE_CTX* ctx) {
        // for now only return 1
        return 1;
}

This fix has withstood 48 hours of soak testing under significant load (Ubuntu 20.04 and Ubuntu 22.04).

I'll raise a PR.