romanblachman closed this issue 4 years ago
The LDAP SDK doesn't really operate in terms of packets. It treats each connection as a continuous stream. As such, whether or not something is broken up across multiple packets is irrelevant. It could come all in one packet, in a separate packet per byte, or something in between, and it wouldn't really affect the way that the LDAP SDK consumes it.
In this case, the error message is because the LDAP SDK read the header for a BER set indicating that the combined length for all of the elements in the set was 79 bytes. It then read an element with 61 bytes of data (and presumably one byte for the BER type and one byte for the length, for a total of 63 bytes). That's less than 79 bytes, so there must be at least one more element in the set. Then, it read the next element, which had 51 bytes of data (and again, two more bytes for the type and length). So the LDAP SDK read 116 bytes of elements within the set when it only expected there to be 79 bytes. It can't recover from that, so it threw an exception and terminated the connection.
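The arithmetic in the comment above can be checked with a small sketch. This is not LDAP SDK code: `BerSetLengthCheck`, `encodedSize`, and `consume` are hypothetical names, and the sketch assumes, as the comment does, one byte for the BER type and one byte for the length of each element.

```java
// Hypothetical helper for illustration only; not part of the LDAP SDK.
final class BerSetLengthCheck {
    // Encoded size of one element, assuming a one-byte BER type and a
    // one-byte (short-form) length, which holds for values under 128 bytes.
    static int encodedSize(int valueLength) {
        return 1 + 1 + valueLength;
    }

    // Adds up element sizes until the declared set length is reached.
    // Throws if the elements overrun the length from the set header.
    static int consume(int declaredSetLength, int... elementValueLengths) {
        int consumed = 0;
        for (int valueLength : elementValueLengths) {
            if (consumed >= declaredSetLength) {
                break; // the set is complete; nothing more to read
            }
            consumed += encodedSize(valueLength);
            if (consumed > declaredSetLength) {
                throw new IllegalStateException("read " + consumed
                    + " bytes of elements in a set declared as "
                    + declaredSetLength + " bytes");
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        try {
            // Values from the comment: a 79-byte set, then elements
            // carrying 61 and 51 bytes of data.
            consume(79, 61, 51);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the values from the report, the first element accounts for 63 bytes and the second for 53, so the running total of 116 exceeds the declared 79 and the check fails, which matches the point at which the SDK gave up on the connection.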
From looking at the data, it is clear that somehow the data that the client read got out of alignment with what the server was sending. The data is in the clear, but it does include GSSAPI integrity checking. We are about 40 megabytes into the stream, though, and there's no indication as to how long ago the bind was that created the integrity session. If it happened near the beginning of the connection, then it seems unusual that it would have gotten this long before running into problems. If it just happened, then it seems weird that it would have gotten that far into the connection before performing the bind that would have started the integrity session. In either case, it may be that we need more data to investigate the issue. I'll look into it on my side, but it may be a few days before I can dig into it in detail.
Thanks for the quick response!
If I understand this correctly, the ASN.1 is streamed on top of SASL messages (because integrity is enabled), meaning that SASL messages are discrete in size and not streamed. This means that if the SASL buffer ends in the middle of the ASN.1, a new SASL header will appear out of nowhere with the size of the next SASL message, and then the rest of the ASN.1 data will appear:
The SASL header may appear anywhere inside the ASN.1 message if the ASN.1 message is longer than the SASL size (which appears in the first header).
In the attached image, you can see an ASN.1 message starting (30 84 0000065d 02 01 5c), and then out of nowhere a SASL header appears (0000ffb3050405ff000c000c000000004b255e026f7a1a7053e22f8677333058), and in the new SASL message the ASN.1 stream continues (64 84 00000654...)
I have the PCAP attached with the issue reproduced: https://drive.google.com/file/d/1dcoi_ppHl9d2RZxUqNT3qVyLyxA7TDbO/view?usp=sharing
Sorry for taking a while to get back to you on this, but I've been unsuccessful in reproducing the problem. I'm not an expert with Active Directory by any means, and the only instance I have access to doesn't seem to be set up to handle GSSAPI authentication properly. I'm able to use the LDAP SDK to get a Kerberos ticket as the user that's trying to authenticate, but then AD is returning an error in response to the GSSAPI bind attempt with an indication that it's not finding a principal in the KDC. I get this error regardless of the quality of protection, so it's not related to an attempt to use integrity checking.
Rather than continuing to bash my head against AD, I decided to switch to a different non-Ping/UnboundID server, so I chose OpenLDAP. After some effort, I was able to get an OpenLDAP instance up and running and accepting GSSAPI authentication using SASL integrity. I've had a test running successfully with hundreds of millions of requests over connections protected with SASL integrity and have not encountered any errors.
You are correct in that the SASL integrity processing is injected between the TCP transport and the LDAP communication. Basically, whenever either side wants to send data to the other side, it uses the previously negotiated settings to encode the data in some way (for integrity checking, this is basically putting a signature before the clear-text data; for confidentiality, it's encrypting the entire set of data). Once it has that encoded data, it first writes four bytes with the two's complement encoding of the number of bytes in that encoded blob (for example, in the packet capture that you provided, the 16th packet is the first one that has SASL integrity protection, and the "00 00 00 bf" sequence in that packet says that there are 191 bytes of SASL data), and then it follows that with the encoded data. On the other end of the connection, it first reads those four bytes to figure out how many bytes of encoded data there are, then it reads that many bytes of encoded data, and then it uses the negotiated settings to unwrap it and end up with the clear-text data.
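The framing described above can be sketched in a few lines. This is an illustration under assumptions, not the LDAP SDK's code: `SaslFramingSketch` and its methods are hypothetical names, and the payload stands in for the output of `SaslClient.wrap`, which is not called here so that the example runs without a Kerberos setup.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical sketch of the length-prefixed framing; not LDAP SDK code.
final class SaslFramingSketch {
    // Prefixes already-wrapped bytes with a four-byte big-endian length.
    // In real code, "wrapped" would come from SaslClient.wrap(...).
    static byte[] encodeFrame(byte[] wrapped) {
        return ByteBuffer.allocate(4 + wrapped.length)
            .putInt(wrapped.length) // ByteBuffer is big-endian by default
            .put(wrapped)
            .array();
    }

    // Reads one frame: the four length bytes, then exactly that many bytes.
    // In real code, the result would be passed to SaslClient.unwrap(...).
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] wrapped = new byte[length];
        data.readFully(wrapped); // blocks until all bytes have arrived
        return wrapped;
    }

    // Convenience for the demo: encode a frame and read it back.
    static byte[] roundTrip(byte[] wrapped) {
        try {
            return readFrame(new ByteArrayInputStream(encodeFrame(wrapped)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] frame = encodeFrame(new byte[191]);
        // The first four bytes are the length prefix for 191 bytes of data.
        System.out.printf("%02x %02x %02x %02x%n",
            frame[0], frame[1], frame[2], frame[3]);
    }
}
```

A 191-byte payload yields the same `00 00 00 bf` prefix that appears in the 16th packet of the capture.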
In the LDAP SDK, this is handled in two places:
For sending requests to the server, it happens in LDAPConnectionInternals.sendMessage. We have the LDAP message to write, and if we don't have a SaslClient instance, then we'll just write the bytes of that message to the socket as-is. But if we do have a SaslClient, then we first use its wrap method to encode the data to be written. We then write a four-byte representation of the number of bytes of that encoded data, and finally the encoded request itself.
For reading responses from the server, it happens in ASN1StreamReader.readAndDecodeSASLData. We read four bytes of data to figure out how long the encoded blob is, then we read that many bytes of data, and then we use the SaslClient.unwrap method to get the clear-text LDAP data. Elsewhere in the class, we do differentiate between whether we're reading data directly from a socket or from a buffer holding SASL-decoded data, but that's primarily done in the read methods, and I've been through that code without seeing an obvious problem.
Ultimately, the LDAP SDK isn't really doing packet-based reading. It's using blocking sockets to read the amount of data that it needs, and it doesn't care whether it's split into multiple packets. It also doesn't care whether a single LDAP message (or an ASN.1 element within a message) is split across multiple SASL wrapped messages. I've done some additional testing with really big entries (both an entry with a ton of attributes, and an entry with an attribute with a huge value) and verified that the SDK can handle big messages split into lots of SASL integrity chunks.
At this point, it's tough to say what the problem might be. The packet capture you provided is helpful, but it's hard to use in any meaningful way without the keys. I may be able to do a bit of analysis from the capture, but it's hard to say.
The easiest solution at this time would probably be to just switch to using authentication only from GSSAPI, and rely on TLS if you want integrity or confidentiality.
I did a little more testing around this by creating a very simple proxy that breaks up packets into small chunks, and I've run it for a long duration with both a fixed size of one byte (in which each byte of the request and response is sent in its own packet), and with random packet sizes between 1 and 5 bytes. Note that this chunking was applied after SASL integrity processing, so the SASL integrity data and also the wrapped data length bytes were split up across multiple packets. I didn't encounter any issues at all over the course of this testing, so I'm really not seeing any evidence that there's a problem in the LDAP SDK's handling of communication protected with SASL integrity or confidentiality.
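The idea behind that test can be approximated without a proxy. The sketch below is not the proxy used in the testing described above: `TrickleInputStream` is a hypothetical wrapper that hands back at most a few bytes per bulk read, to show that a blocking reader built on `DataInputStream.readFully` still reassembles a length-prefixed frame.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a chunking proxy: each bulk read returns at
// most maxChunk bytes, as if the data arrived in tiny packets.
final class TrickleInputStream extends FilterInputStream {
    private final int maxChunk;

    TrickleInputStream(InputStream in, int maxChunk) {
        super(in);
        this.maxChunk = maxChunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return super.read(b, off, Math.min(len, maxChunk));
    }

    // Reads a four-byte length and then that many bytes through the
    // trickling stream; readFully loops until the frame is complete.
    static byte[] readLengthPrefixed(byte[] framed, int maxChunk) {
        try {
            DataInputStream in = new DataInputStream(
                new TrickleInputStream(new ByteArrayInputStream(framed), maxChunk));
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] framed = {0, 0, 0, 3, 'a', 'b', 'c'};
        // One byte per read, like the fixed one-byte proxy setting.
        System.out.println(new String(readLengthPrefixed(framed, 1)));
    }
}
```

The reader never needs to know where the chunk boundaries fall, which is consistent with the observation that splitting the stream at the TCP level did not trigger the failure.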
I can try to create standalone code that reproduces the issue against our LDAP server and then share the server IP and credentials with you (it requires a Kerberos ticket, so it might be a bit tricky). We can also jump on a call where I can demo it and see how I can help reproduce the issue for further investigation.
Meanwhile, I did notice something interesting in the traffic captures. When running exactly the same LDAP query with the ldapsearch utility (which had no errors), I noticed that the SASL message size returned by the LDAP server differs from the one seen with the ldapsdk.
The ldapsdk SASL length is always about 65K while the ones that show in the ldapsearch traffic capture are much higher since it’s a DWORD and it’s not bound to the TCP max packet size. Do you know if the SASL length is negotiated or configured somewhere?
Again, thank you very much for the investigation. I will do my best to help figure out why GSSAPI integrity with MS Active Directory fails. Switching to LDAPS is an option, but I assume this issue is still relevant since if MS LDAP Signing is enforced GSSAPI over LDAPS will still be required in secure enterprise environments.
Hello again, we are still struggling with the problem, which reproduces consistently on one of our development domains. We can expose the relevant DC LDAP to the internet; will that help you find a possible solution?
It looks like in January 2020 Microsoft will enforce LDAP signing by default (https://support.microsoft.com/en-us/help/4520412/2020-ldap-channel-binding-and-ldap-signing-requirement-for-windows) and we want to provide our customers the option to use LDAP Signing with Active Directory using the ldapsdk.
If you wish to talk further offline please reach out @ rblachman@preempt.com
Thank you!
Unfortunately, I'm not sure that I'll be able to make any additional progress on this. I don't have access to an Active Directory instance that I can use for this testing, and all of my testing with other types of directory servers has not identified any problems.
Hi @dirmgr, we can provide you with access to an Active Directory server where the problem reproduces persistently. Would that help?
I’m sorry to say that at present, we do not have the time or resources to devote to investigating this issue. I have created an internal issue to track it, and may be able to give it some attention in the future.
I did spend a considerable amount of time testing with OpenLDAP under various conditions, and I was not able to reproduce the problem that you describe. Further, the LDAP SDK does not implement its own support for GSSAPI, but instead relies on the GSSAPI support in the underlying JVM. It is possible that the issue may lie there, or that there may be a problem in the way that Active Directory has implemented its support. Have you tried creating a test program with another Java-based API (especially JNDI, as it would definitely rely on the same underlying JVM support) to determine whether the problem is evident there?
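For the JNDI comparison suggested above, a starting point might look like the following. This is a sketch under assumptions, not a tested reproduction: `JndiGssapiIntegritySketch` and the URL are placeholders, and the Kerberos login that must surround the connection attempt is only described in a comment.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Hypothetical sketch: builds a JNDI environment requesting a GSSAPI bind
// with integrity protection (quality of protection "auth-int").
final class JndiGssapiIntegritySketch {
    static Hashtable<String, Object> environment(String ldapUrl) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, ldapUrl);
        env.put(Context.SECURITY_AUTHENTICATION, "GSSAPI");
        env.put("javax.security.sasl.qop", "auth-int");
        return env;
    }

    public static void main(String[] args) {
        // Placeholder URL; replace with the domain controller under test.
        Hashtable<String, Object> env = environment("ldap://dc.example.com:389");
        // Assumption: the actual connection, new InitialDirContext(env), would
        // be made inside Subject.doAs(...) after a JAAS Kerberos login, and
        // the same large search would then be repeated to compare behavior.
        System.out.println(env);
    }
}
```

If the same search fails through JNDI, the JVM's GSSAPI layer or the server becomes the more likely suspect; if it succeeds, attention shifts back to how the SDK reads the unwrapped data.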
I’m sorry that we’re not able to provide more assistance on this at the moment, but I’m simply not in a position to devote the time and effort that may be required to investigate it.
Thank you very much @dirmgr. We will investigate this further and update here with our findings. Your help is highly appreciated.
@dirmgr I believe I have found the bug.
The problem happens when ASN1StreamReader.peek() is called at the end of a SASL buffer. Please see the problematic code - I have added inline comments to explain.
public int peek()
       throws IOException
{
  final InputStream is;
  if (saslClient == null)
  {
    is = inputStream;
  }
  else
  {
    if ((saslInputStream == null)) // || (saslInputStream.available() == 0) is the fix I used in my lab
    {
      readAndDecodeSASLData(-1);
    }
    is = saslInputStream; // saslInputStream is loaded to is
  }
  is.mark(1); // Point in buffer is marked
  final int byteRead = read(true); // Read is called just before new SASL frame. New buffer put in saslInputStream
  is.reset(); // reset to marked point is called on the old buffer. The new buffer remains one byte into its reading, causing the unwanted shift
  return byteRead;
}
Please confirm whether I am correct.
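The effect described in the inline comments can be reproduced with a toy model that has no dependency on the LDAP SDK. Everything here is hypothetical: `PeekAcrossFramesSketch` replaces the SASL buffer with a queue of already-unwrapped frames, and the `refillBeforeMark` flag switches between the original check and the proposed `available() == 0` check.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model of peeking at a frame boundary; not the SDK's ASN1StreamReader.
final class PeekAcrossFramesSketch {
    private final Deque<byte[]> frames = new ArrayDeque<>();
    private final boolean refillBeforeMark;
    private ByteArrayInputStream current; // plays the role of saslInputStream

    PeekAcrossFramesSketch(boolean refillBeforeMark, byte[]... unwrappedFrames) {
        this.refillBeforeMark = refillBeforeMark;
        frames.addAll(Arrays.asList(unwrappedFrames));
    }

    private void loadNextFrame() {
        byte[] next = frames.poll();
        if (next == null) {
            throw new IllegalStateException("no more frames");
        }
        current = new ByteArrayInputStream(next);
    }

    int read() {
        if (current == null || current.available() == 0) {
            loadNextFrame(); // swaps in a new buffer object
        }
        return current.read();
    }

    int peek() {
        if (current == null || (refillBeforeMark && current.available() == 0)) {
            loadNextFrame();
        }
        ByteArrayInputStream is = current;
        is.mark(1);
        int b = read(); // without the refill, this may replace "current"
        is.reset();     // ...so the reset lands on the old, exhausted buffer
        return b;
    }

    // Consumes the first frame, peeks, then reads; returns {peeked, read}.
    static int[] peekThenRead(boolean refillBeforeMark) {
        PeekAcrossFramesSketch reader = new PeekAcrossFramesSketch(
            refillBeforeMark, new byte[] {0x30}, new byte[] {0x64, 0x02});
        reader.read(); // first frame is now exhausted
        return new int[] {reader.peek(), reader.read()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(peekThenRead(false)));
        System.out.println(Arrays.toString(peekThenRead(true)));
    }
}
```

Without the extra check, the byte returned by `peek()` is consumed from the new frame and never restored, so the following `read()` returns the next byte instead. With the check, the new frame is loaded before the mark is set and the peeked byte is read again as expected.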
This definitely looks promising. I'll need to look into it more carefully, and I probably won't be able to do that until next week, but I think that you're probably right.
After taking a more thorough look at it, I do think that you've identified the problem. I went ahead and committed a change that is very similar to the one that you proposed. Let me know whether you can confirm that this indeed fixes the problem.
Thanks for your help with this!
Thanks @dirmgr! This works well.
When can we expect this fix to make it into a release? We would rather keep on getting the package from Gradle.
Unfortunately, it just missed last week's 4.0.13 release. I'm not sure when the next one will be, but it'll probably be at least a couple of months.
@borisdan awesome job finding the issue!
@dirmgr Thank you for merging to master! Expect that when Microsoft switches to LDAP Signing enforcement by default starting from Jan. 2020, applications using ldapsdk might have random failures with Active Directory deployments if SASL is used. Should I close this ticket?
FYI, I just released version 4.0.14 and it includes this fix.
Thank you @dirmgr, I'm closing this ticket since the issue has been resolved.
Hello,
The issue reproduces with the latest ldapsdk (4.0.11) when using an LDAP connection on port 389 with SASL GSSAPI integrity enabled against a Microsoft Active Directory domain; tested with Windows Server 2012 R2 and the latest Windows Server 2016.
The following exception is thrown on every search:
Looking at the traffic capture, you can see a SASL message ending and a new one starting in the same reassembled TCP packet (which could be related to the issue).
I can provide any information needed to reproduce this; I'm just not sure what would be best to share.
Thanks, Roman