Open mar1u50 opened 2 years ago
It seems unlikely that the StartTLS change you pointed out is responsible for the problem, as that only affects client connections that the LDAP SDK establishes and wouldn't apply to incoming connections accepted by the in-memory directory server.
However, commit 200248cae936aa5761997aff6e5b31459b5eef4e does directly relate to support for TLSv1.3, and it would apply to both client and server usage. But the only real meaningful changes in that commit were to use a default protocol of TLSv1.3 when it's supported by the JVM. The changes in that commit include:
So really, the only change that should affect anything is changing the default protocol to TLSv1.3 and the set of enabled protocols to include TLSv1.3 if the JVM indicates that it supports it.
Note that in the commit message, I did note that my testing encountered an OpenJDK issue that could arise under certain conditions, but that should be fixed in a later JDK release. I know that with earlier Java 11 builds, there were definitely JVM issues with TLSv1.3. Are you testing with an up-to-date JVM?
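To see what a given JVM reports, a minimal sketch along these lines can help. It uses only the standard JSSE API and is not taken from the LDAP SDK; the class name `TlsSupportCheck` and the helper `jvmSupports` are hypothetical, and the SDK's own detection logic may differ.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;

public class TlsSupportCheck {
    /** Returns true if the JVM's default SSLContext lists the protocol as supported. */
    public static boolean jvmSupports(String protocol) throws NoSuchAlgorithmException {
        String[] supported =
            SSLContext.getDefault().getSupportedSSLParameters().getProtocols();
        return Arrays.asList(supported).contains(protocol);
    }

    public static void main(String[] args) throws Exception {
        // Print the JVM version alongside the result so reports are comparable.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("TLSv1.3 supported: " + jvmSupports("TLSv1.3"));
        System.out.println("TLSv1.2 supported: " + jvmSupports("TLSv1.2"));
    }
}
```

Running this on the JVM that executes the failing test shows whether TLSv1.3 is in play at all.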
The LDAP SDK gets very heavy use with TLSv1.3, and I've not heard of anyone else having timeout problems that turned out to be issues in the LDAP SDK itself. We have occasionally seen TLS negotiation take a long time to complete on certain systems, but that has almost always turned out to be entropy exhaustion that causes the JVM to block while waiting on the secure random number generator. This is especially likely in a virtual machine or container that might not have access to the underlying system's entropy pool. The best ways to address it have been either to reconfigure the JVM to use /dev/urandom instead of /dev/random (at least on Linux systems; it prevents blocking and, contrary to popular opinion, does not actually introduce any meaningful reduction in security) or to use an entropy-augmenting daemon like rngd.
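One rough way to check whether entropy is a factor is to time how long the JVM takes to produce seed material. The sketch below is an illustration only, using the standard `SecureRandom` API; the class name `EntropyTimer` and the helper `timeSeedMillis` are hypothetical. Whether `generateSeed` actually reads a blocking source depends on the platform and the JVM's security configuration.

```java
import java.security.SecureRandom;
import java.security.Security;

public class EntropyTimer {
    /** Returns the milliseconds taken to obtain numBytes of seed material. */
    public static long timeSeedMillis(int numBytes) {
        long start = System.nanoTime();
        // generateSeed may block if the configured seed source runs out of entropy.
        byte[] seed = new SecureRandom().generateSeed(numBytes);
        long elapsedNanos = System.nanoTime() - start;
        if (seed.length != numBytes) {
            throw new IllegalStateException("unexpected seed length " + seed.length);
        }
        return elapsedNanos / 1_000_000L;
    }

    public static void main(String[] args) {
        // Show which seed source the JVM is configured to use, if any is set.
        System.out.println("java.security.egd = " + System.getProperty("java.security.egd"));
        System.out.println("securerandom.source = " + Security.getProperty("securerandom.source"));
        System.out.println("seed time (ms) = " + timeSeedMillis(16));
    }
}
```

If the reported time is consistently large, starting the JVM with `-Djava.security.egd=file:/dev/./urandom` is the usual way to point it at /dev/urandom on Linux. Comparing timings with and without that option would indicate whether entropy is involved.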
Since you indicate that you can consistently reproduce the problem, is there any way that you could provide me with code or instructions that I could use to try to observe the problem for myself?
In the meantime, you could try disabling support for TLSv1.3 by calling SSLUtil.setDefaultSSLProtocol("TLSv1.2") and SSLUtil.setEnabledSSLProtocols(Collections.singleton("TLSv1.2")) and see if that causes the problem to go away. That should basically eliminate the effect of the changes in the commit that I referenced, since LDAP SDK versions prior to 4.0.9 wouldn't enable TLSv1.3 by default. But ultimately, the LDAP SDK relies entirely on the JVM's support for TLS, so if there actually is an issue, then it may be in the JVM itself.
If I call SSLUtil.setEnabledSSLProtocols(Collections.singleton("TLSv1.2")), the problem disappears.
If that's the case, then it's almost certainly related to the JVM's implementation of TLSv1.3 rather than something in the LDAP SDK itself. The LDAP SDK doesn't implement TLS logic itself, but instead relies entirely on the underlying JVM. The only real difference is in the TLS protocol string that the LDAP SDK uses when obtaining an SSLContext, and in the set of enabled TLS protocol versions when creating an SSLSocket or SSLServerSocket.
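The two knobs described above map onto plain JSSE calls. The following is a sketch of that idea using only the standard API, not the LDAP SDK's actual code; the class name `ProtocolPinning` and the helper `enabledProtocolsFor` are hypothetical. It obtains an `SSLContext` for a given protocol string, creates an unbound `SSLServerSocket`, restricts its enabled protocols, and returns what the socket reports.

```java
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLServerSocket;

public class ProtocolPinning {
    /**
     * Creates an SSLContext for contextProtocol, restricts an unbound server
     * socket to the given protocols, and returns the protocols it reports.
     */
    public static String[] enabledProtocolsFor(String contextProtocol, String... enabled)
            throws Exception {
        SSLContext ctx = SSLContext.getInstance(contextProtocol);
        // Default key managers, trust managers, and RNG. A real server would
        // need a key manager with a certificate; no handshake happens here.
        ctx.init(null, null, null);
        try (SSLServerSocket server =
                 (SSLServerSocket) ctx.getServerSocketFactory().createServerSocket()) {
            server.setEnabledProtocols(enabled);
            return server.getEnabledProtocols();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(enabledProtocolsFor("TLSv1.2", "TLSv1.2")));
    }
}
```

Because everything past this point is handled by the JVM's TLS implementation, reproducing a stall with a plain JSSE client and server pinned to TLSv1.3 would point at the JVM rather than at the library on top of it.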
I'd recommend making sure that you're on the latest release of the JVM for whatever version of Java you're using. For example, as I noted above, there are known issues with TLSv1.3 in early versions of Java 11, so if you're on Java 11, then make sure you're using the latest release of the Java 11 VM. If you're using some other Java version and there's a newer release of the JVM available for that version, then try it.
I'm using the latest Java, 17.0.2.8.1 (Corretto).
I have a test that sporadically fails (about 1 in 100 runs). My application uses the Apache Directory API, but for testing I spawn an InMemoryDirectoryServer.
The test connects to an InMemoryDirectoryServer, uses StartTLS, does a bind, and then disconnects from LDAP.
The test never failed when I used com.unboundid:unboundid-ldapsdk:4.0.6.
I upgraded to com.unboundid:unboundid-ldapsdk:6.0.3 and it started to fail as explained.
I execute the test 1000 times in a loop now to make the problem more reproducible and it fails every time.
I tried to identify the version that introduced the problem and it seems to be 4.0.9.
The failure is:
I looked at the commits between 4.0.9 and 4.0.10, and the only one that appears relevant to StartTLS is https://github.com/pingidentity/ldapsdk/commit/9a19a281039942e79cc994d14923f3cedd6106b0