Open MichalCho opened 5 years ago
The write stack trace shows that the LDAP SDK is blocked on a call to OutputStream.write, which indicates that the socket's send buffer is full. This generally means one of two things:
The stack trace from the connection reader shows that it's waiting for any data to be received from the server. There's really nothing more that can be determined from that. All we know is that, at the instant you got the stack trace, it wasn't actively receiving any data or processing any data that it had received.
When the LDAP SDK sends data, it does use the SO_TIMEOUT socket option to place a limit on how long the operation should be allowed to block. The value set for the SO_TIMEOUT socket option comes from the response timeout for the request that you're sending. By default, it comes from LDAPConnectionOptions.getResponseTimeoutMillis(OperationType), but you can override it on a per-request basis by calling the request's setResponseTimeoutMilis method. If an attempt to write to the socket output stream takes longer than this, then the JVM should throw a SocketTimeoutException, which is a type of IOException. The LDAP SDK will catch that and throw an LDAPException with a SERVER_DOWN result code and a message like "An error occurred while attempting to send the LDAP message to server {host:port}: {cause}".
Hi Neil, as usual thanks a lot for a very fast answer. Unfortunately it does not seems to work like this. In case of regular request timeout I receive a timeout exception. But in a case where all my threads are BLOCKED in
at com.unboundid.util.ByteStringBuffer.write(ByteStringBuffer.java:1661)
no exception is thrown.
You're right. Unfortunately, it looks like SO_TIMEOUT only applies to reads and not to writes, and it doesn't look like Java provides a way for a call to SocketOutputStream.write to time out. It would be necessary to switch to NIO for that, which would be a complete overhaul of the LDAP SDK.
I think that it should be possible to work around this with a custom output stream that uses a background thread to close a connection if it's been blocked on a write attempt for too long, but it'll take some doing.
I've just committed an update that changes the way that the LDAP SDK guards against indefinitely blocked writes. It should be included in the next release, but you can use it now if you check out and build the LDAP SDK for yourself.
Hi Neil, thanks a lot for your commit. I've checked it and it helped. Unfortunately I've found another issue - in case when a ldap server for example gets killed or a network connection gets lost -and tcp keepalive settings are set to a big values - it may lead to deadlock anyway - when all connections try to write and timer thread can not flush
Thread: WriteWithTimeoutHandler Timer BLOCKED
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at com.unboundid.util.WriteWithTimeoutTimerTask.run(WriteWithTimeoutTimerTask.java:74)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
and
Thread: pool-8-thread-2 BLOCKED
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:117)
at com.unboundid.util.WriteWithTimeout.write(WriteWithTimeout.java:166)
at com.unboundid.util.ByteStringBuffer.write(ByteStringBuffer.java:1689)
at com.unboundid.asn1.ASN1Buffer.writeTo(ASN1Buffer.java:1043)
at com.unboundid.ldap.sdk.LDAPConnectionInternals.sendMessage(LDAPConnectionInternals.java:562)
at com.unboundid.ldap.sdk.LDAPConnection.sendMessage(LDAPConnection.java:4452)
at com.unboundid.ldap.sdk.SearchRequest.processAsync(SearchRequest.java:1413)
at com.unboundid.ldap.sdk.LDAPConnection.asyncSearch(LDAPConnection.java:4255)
I've played with
net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_probes
net.ipv4.tcp_retries2
it helps - if a socket timeouts before its buffer get full then it's correctly handled. Unfortunately we can not play with those parameters too much
Sorry it took a while to get to this, but I just committed a second change that I think should address this. It now attempts to close the socket rather than the output stream (which may be wrapped in another type of output stream, like a BufferedStream, that could attempt to flush buffered as part of the close).
Hi Neil,
thank you! That was exactly my idea - to try to close sockets. I was trying to implement it myself but you were faster. We will test it and provide you a feedback. Br Michal
Hi! I have observed a disturbing issue and I wonder if it is related to the way how I use ldapsdk or there is a bug in the connection handling.
In my application I use a connection pool that is shared among many threads. When i receive a new input data I checkout connection from the pool and perform an asynch operation. I also store it in my own queue so I can use it again before the answer arrives - as I return connection to the pool after receiving a response to my requests (when ActiveOperation counter is <1). However during network problem (some packets lost) I can see that all connections are stuck with
while listeners are up and ok:
Is there a way to abort the sending the request in case it takes too much time? As I can see that send-q in netstat are being full while recv-q are empty- it means that sending requests is not really working.
Thanks a lot for your answer. Br Michal