pingidentity / ldapsdk

UnboundID LDAP SDK for Java
Other
327 stars 79 forks source link

Server down on freshly estabilshed connection #131

Open david0 opened 2 years ago

david0 commented 2 years ago

Hi, we get the following error on an connection that has been freshly connected before:

Caused by: LDAPException(resultCode=81 (server down), errorMessage='The connection is not established.', ldapSDKVersion=5.0.0-verboselogging-SNAPSHOT, revision=286d9a21386aee242dfaa6cc1052a606f11dedc8)
    at com.unboundid.ldap.sdk.LDAPConnection.registerResponseAcceptor(LDAPConnection.java:4725)
    at com.unboundid.ldap.sdk.SimpleBindRequest.process(SimpleBindRequest.java:561)
    at com.unboundid.ldap.sdk.LDAPConnection.processBindOperation(LDAPConnection.java:4400)
    at com.unboundid.ldap.sdk.LDAPConnection.bind(LDAPConnection.java:2311)
    at com.unboundid.ldap.sdk.LDAPConnection.bind(LDAPConnection.java:2274)

This is part of our tests where we switch connections a lot between two servers. The LDAPConnection was used before on a completely different target system.

Looking at the logs, to me there seems to be a race condition that the old LDAPConnectionReader is shut down after the new connection was established (I added some additional logs):


[07:22:28,457]  INFO - Change LDAP operations connection to runner:39807

...
(LDAPConnection.connect)
INFO: { "timestamp":"2022-06-14T07:22:28.458Z", "debug-type":"connect", "level":"INFO", "thread-id":1, "thread-name":"Test worker", "connection-id":2, "disconnected-from-address":"runner", "disconnected-from-port":58875, "disconnect-type":"RECONNECT", "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }
WARNING: { "timestamp":"2022-06-14T07:22:28.459Z", "debug-type":"connect", "level":"WARNING", "thread-id":915, "thread-name":"Background connect thread for runner/172.17.0.1:39807", "message":"Connecting to 39807", "caught-exception":{ "message":"Connecting", "stack-trace":[ "run(ConnectThread.java:165)" ] }, "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }
WARNING: { "timestamp":"2022-06-14T07:22:28.458Z", "debug-type":"connect", "level":"WARNING", "thread-id":914, "thread-name":"Connection reader for connection 2 to runner:58875 (closed)", "message":"Closing connection to 58875", "caught-exception":{ "message":"Closing connection to 58875", "stack-trace":[ "logClosingConnection(Debug.java:1651)", "close(LDAPConnectionInternals.java:697)", "setClosed(LDAPConnection.java:4681)", "closeInternal(LDAPConnectionReader.java:1017)", "run(LDAPConnectionReader.java:419)" ] }, "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }
java.lang.Exception: Closing connection to 58875
    at com.unboundid.util.Debug.logClosingConnection(Debug.java:1651)
    at com.unboundid.ldap.sdk.LDAPConnectionInternals.close(LDAPConnectionInternals.java:697)
    at com.unboundid.ldap.sdk.LDAPConnection.setClosed(LDAPConnection.java:4681)
    at com.unboundid.ldap.sdk.LDAPConnectionReader.closeInternal(LDAPConnectionReader.java:1017)
    at com.unboundid.ldap.sdk.LDAPConnectionReader.run(LDAPConnectionReader.java:419)

INFO: { "timestamp":"2022-06-14T07:22:28.459Z", "debug-type":"connect", "level":"INFO", "thread-id":1, "thread-name":"Test worker", "connected-to-address":"runner", "connected-to-port":39807, "connection-id":2, "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }

INFO: { "timestamp":"2022-06-14T07:22:28.460Z", "debug-type":"connect", "level":"INFO", "thread-id":1, "thread-name":"Test worker", "message":"Setting the SO_TIMEOUT value for connection LDAPConnection(not connected) to 300000ms.", "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }

INFO: { "timestamp":"2022-06-14T07:22:28.460Z", "debug-type":"connect", "level":"INFO", "thread-id":914, "thread-name":"Connection reader for connection 2 to runner:58875 (closed)", "connection-id":2, "disconnected-from-address":"runner", "disconnected-from-port":58875, "disconnect-type":"RECONNECT", "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }

[07:22:28,461] (LDAPConnection.bind fails with error 81, error shown)

Imho it is not to be guaranteed that LDAPConnectionReader.close will always wait for the thread shut down. If the shut down takes longer than 500ms, it will return anyway which could lead to this situation I suppose (That seems to be not the case here, but there at least it proves the point).

dirmgr commented 2 years ago

I'm currently looking into this. I have thus far not been able to see the problem for myself. Do you have a reproducible test case that seems to work?

As an alternative, you could try using the LDAP SDK in synchronous mode rather than the default asynchronous mode. When run in synchronous mode, it doesn't maintain a separate listener thread per connection, which actually yields slightly better performance and less overhead, but you can't process asynchronous operations or use the same connection for multiple concurrent operations on separate threads. To use synchronous mode, create an LDAPConnectionOptions object, call setUseSynchronousMode(true) on that object, and then pass that object as an argument when creating the LDAPConnection object.

david0 commented 2 years ago

I was not yet successful reproducing that problem with a smaller test.

I changed our code to always use a fresh LDAPConnection and -since then- it is stable.

Will it help you if I try to use synchronous mode? We are using async operations at some places and using new LDAPConnection objects all the time is an acceptable workaround. So I would only dare to try sync mode if that helps you somehow.

dirmgr commented 2 years ago

Thanks for the update. It may well have something to do with the use of asynchronous operations, especially if there are still any outstanding operations in progress when attempting to close the connection. I'll continue looking into this.

For now, your approach of simply creating a new connection object is probably the best one to use. There's no need to test with synchronous mode if you need asynchronous operations.

dirmgr commented 2 years ago

When testing with asynchronous operations, I was able to reproduce an occasional failure every few thousand attempts or so. I've just committed a couple of changes, and with those changes in place, I've not been able to reproduce that failure.

But if you've got the changes in place to use new connections rather than to keep using the same connection object, I'd recommend keeping the new behavior.

david0 commented 2 years ago

I tested your commit from master with out old code again, unfortunately still I get the server down.

Looking deeper at our code I can now say that we only use async*-operations at one special place with a dedicated connection. Its does not share LDAPConnection objects with the place where we get the server downs.

totaljfb commented 2 years ago

I got the same error message and this is 100% replicable in my test case. LDAPException(resultCode=81 (server down), numEntries=0, numReferences=0, errorMessage='The connection is not established.', ldapSDKVersion=6.0.5, revision=b50104702e7cc07334a9c64953a91f3e6dbdb27c')

The workaround is to establish a new LDAP connection, then everything works.

@dirmgr I may have a chance to verify it if you can let me know how to set LDAP connection in synchronous mode.

dirmgr commented 2 years ago

@totaljfb Could you describe your test case? Is it something that could be converted into a standalone program for testing?

To use the LDAP SDK in synchronous mode, you need to use the LDAPConnectionOptions.setUseSynchronousMode method, and then provide that LDAPConnectionOptions object to the connection when creating it (or at least set it before the connection is established). For example:

LDAPConnectionOptions options = new LDAPConnectionOptions();
options.setUseSynchronousMode(true);

LDAPConnection conn = new LDAPConnection(
     sslUtil.createSSLSocketFactory(),
     options, host, port);
totaljfb commented 2 years ago

@dirmgr Thank you.

In my test case, the LDAPConnection is created at the beginning of the process, then it gets some user groups info from the LDAP Server, then it executes some other tasks and then uses the same connection object to retrieve some user info from the LDAP Server. It calls the LDAPConnection search funtion and errors happens right after that line.

Unfortunately the I cannot replicate the issue today, everything is working. I will see if have another chance to test it in the furture. I have updated the code to use a new connection though.

Thank you for your help!

dirmgr commented 2 years ago

@totaljfb are you intentionally closing and re-establishing the connection at any point in the process, or are you just trying to continue using a connection that had previously been used and finding that it is no longer valid? If it's the latter, then that could be attributed to a few other things, like a networking issue or the server closing the connection after being idle for too long or after a maximum connection duration had been reached.

In any case, I would actually recommend using a connection pool rather than an individual connection. Even if you only need one connection, using a connection pool can provide substantial benefits. Unlike a lot of APIs, a connection pool in the UnboundID LDAP SDK for Java can be used like an individual connection (for example, you can just call pool.search to perform a search on a pooled connection rather than manually checking out a connnection, issuing a search, and releasing the connection back to the pool). Doing this has several benefits, including:

totaljfb commented 2 years ago

@dirmgr The answer to your question, it's the latter. I just wanted to reuse the LDAPConenction object which was created at the beginning of the process. Your analysis of the root cause makes sense to me. And, thanks for the suggestion for using connection pool as a best practice, I appreciate it!