pingidentity / ldapsdk

UnboundID LDAP SDK for Java
Other
327 stars 79 forks source link

doStartTLS() occasionally fails due to closed connection #126

Open githubert opened 2 years ago

githubert commented 2 years ago

Summary

When trying to connect to an instance of InMemoryDirectoryServer, using STARTTLS with host name verification, sometimes an exception is thrown indicating that SO_TIMEOUT could not be set because the connection is closed.

The circumstances seem a bit special, since this came from a test that checks if the connection via STARTTLS fails with a host name verification error. It did not seem to happen with TLS. If the host name verifier is not present, the problem also isn't present.

The error message reads as follows:

LDAPException(resultCode=82 (local error), errorMessage='An error occurred while attempting to set the value of the SO_TIMEOUT socket option for connection LDAPConnection(connected to localhost:39679) to 50ms:  SocketException(Socket is closed), ldapSDKVersion=6.0.4, revision=6580b9a0e188fedfe4c67776aabf1f0b61c06a28')

It seems to stem from the second call in the finally of the else block of LDAPConnectionReader.doStartTLS(), which tries to set the socket timeout back to the original value. That is as far as I could check, I didn't see the first call (InternalSDKHelper.setSoTimeout(connection, 50)) failing, but my sample size wasn't that big.

Affected Version

SDK version: 6.0.4

Tests were done with OpenJDK 11.0.14.1 (2022-02-08).

Expected outcome

I'd expect that, no matter how often the test is run, it always results in the message Hostname verification failed because the expected hostname 'localhost' was not found in peer certificate 'subject='CN=foo.example.com.

How to reproduce

The test depends on a certificate that is valid in itself, but presents the wrong host. The connection is made to "localhost". The following Bash script can be used to generate the key store for this test in the current folder (on Linux).

PASSWORD="changeit"
ALIAS="test"

generate() {
    BASENAME="$1"
    SUBJECT="$2"
    DAYS="$3"

    openssl req -new -subj "$SUBJECT" -newkey rsa:2048 -keyout "$BASENAME".key -out "$BASENAME".crt -nodes -x509 -days "$DAYS"
    openssl pkcs12 -export -inkey "$BASENAME".key -in "$BASENAME".crt -name test -out "$BASENAME".p12 -passout pass:"$PASSWORD"
    keytool -importkeystore -srckeystore "$BASENAME".p12 -srcstoretype pkcs12 -srcstorepass "$PASSWORD" -srcalias "$ALIAS" -destkeystore "$BASENAME".jks -deststoretype jks -deststorepass "$PASSWORD" -destalias "$ALIAS"
}

generate foo.example.com '/CN=foo.example.com' 28

There are multiple resulting files matching foo.example.com.*, of which foo.example.com.jks needs to be placed in /tmp (or the test needs to be amended).

import com.unboundid.ldap.listener.InMemoryDirectoryServer;
import com.unboundid.ldap.listener.InMemoryDirectoryServerConfig;
import com.unboundid.ldap.listener.InMemoryListenerConfig;
import com.unboundid.ldap.sdk.LDAPConnectionOptions;
import com.unboundid.ldap.sdk.LDAPException;
import com.unboundid.ldap.sdk.SingleServerSet;
import com.unboundid.ldap.sdk.StartTLSPostConnectProcessor;
import com.unboundid.util.ssl.HostNameSSLSocketVerifier;
import com.unboundid.util.ssl.KeyStoreKeyManager;
import com.unboundid.util.ssl.SSLUtil;
import com.unboundid.util.ssl.TrustStoreTrustManager;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import java.io.File;
import java.security.GeneralSecurityException;

import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class SocketTimeoutTest {
    private static String pathToCustomTruststore = "/tmp/foo.example.com.jks";
    private InMemoryDirectoryServer directoryServer;

    @BeforeEach
    void setUp() throws LDAPException, GeneralSecurityException {
        SSLUtil sslUtilServer = new SSLUtil(new KeyStoreKeyManager(new File(pathToCustomTruststore),
                "changeit".toCharArray(),   "JKS", "test"), null);

        InMemoryDirectoryServerConfig config = new InMemoryDirectoryServerConfig("dc=example,dc=com");

        config.setListenerConfigs(
                InMemoryListenerConfig.createLDAPConfig("STARTTLS", null, 0, sslUtilServer.createSSLSocketFactory())
        );

        directoryServer = new InMemoryDirectoryServer(config);
        directoryServer.startListening();
    }

    @AfterEach
    void tearDown() {
        directoryServer.shutDown(true);
    }

    @Test
    void setSocketTimeOutFailure() {
        for (int i = 0; i < 25; i++) {
            LDAPException e = assertThrows(LDAPException.class, this::getConnection);

            System.out.println(e.getExceptionMessage());

            assertTrue(e.getExceptionMessage().contains("verification failed"));
        }
    }

    private void getConnection() throws GeneralSecurityException, LDAPException {
        final var customTrustStore = new File(pathToCustomTruststore);
        final var sslUtilClient = new SSLUtil(new TrustStoreTrustManager(customTrustStore));
        final var ctx = sslUtilClient.createSSLContext();

        // Configuration
        LDAPConnectionOptions options = new LDAPConnectionOptions();
        options.setSSLSocketVerifier(new HostNameSSLSocketVerifier(true));

        // ServerSet
        var serverSet = new SingleServerSet("localhost", directoryServer.getListenPort(), null, options, null,
                new StartTLSPostConnectProcessor(ctx));

        serverSet.getConnection().close();
    }
}

Currently the test runs this 25 times in rapid succession. It usually fails within the first ten tries, but since this is probably some race condition, it may vary with machine speed. I hope it is reproducible for others!

dirmgr commented 2 years ago

Thank you for reporting this. I've been able to reproduce the problem in my own test, and I've found the underlying cause.

When you're using a connection in non-synchronous mode (which is the mode it uses by default), then it sets a short SO_TIMEOUT on the connection when beginning StartTLS processing, and then resets it after StartTLS processing completes. The shorter SO_TIMEOUT is set in a try block with all of the other processing, and the SO_TIMEOUT is reset in a finally block that is attached to that try block.

Because you had the hostname verifier in place, and because the hostname you're using doesn't match the expected value, then TLS negotiation is failing, and there's an exception being thrown in that try block. The failed negotiation also initiates the process of closing the connection, and in some cases, the closure has happened before the finally block gets around to trying to resetting the SO_TIMEOUT, which causes that reset attempt to fail, and that throws another exception that hides the exception that was originally thrown as a result of the TLS negotiation.

I've committed a change to address the problem by not attempting to reset the SO_TIMEOUT if there was a failure in StartTLS processing. Before this change, my test program ran into the problem you described after only a few attempts. With the fix in place, I've successfully completed a couple of 10,000-pass runs without the SO_TIMEOUT issue raising its head.

The fix will be included in the next release of the LDAP SDK, which is currently scheduled for sometime around June. You can get the fix sooner than that if you check out and build the LDAP SDK for yourself.

As a workaround, you can try using the LDAP SDK in synchronous mode, by using LDAPConnectionOptions.setUseSynchronousMode(true) before establishing the connection. Connections that operate in synchronous mode send requests and wait for responses on the same thread, which eliminates the need for background reader threads. Connections in synchronous mode tend to be slightly faster and have a little less overhead, but you can't use asynchronous operations or abandon/cancel operations in progress. If you don't need support for asynchronous operations or abandons/cancels, then synchronous mode may be a better option than the default asynchronous mode.

githubert commented 2 years ago

Nice! Many thanks for the details and suggestions, I'm going to check if synchronous mode works for us, but I think it might :)