LDAP Search never completes due to SASL stream ASN.1 parsing error

romanblachman commented 5 years ago

Hello,

The issue reproduces with latest ldapsdk (4.0.11) when using LDAP connection with port 389 with SASL GSSAPI integrity enabled against Microsoft Active Directory domain, tested with Windows Server 2012 R2 and Windows Server 2016 latest.

The following exception is thrown on every search:

2019-07-14 19:51:18,095  WARN 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="WARNING" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.p
reempt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" caughtException="ASN1Exception(message='The ASN.1 stream reader has already read beyond the en
d of this set (expected set of length 79 to end at 40,875,654 bytes into the stream, but 40,875,691 bytes have already been read from the stream).', trace='hasMoreElements(ASN1StreamReaderSe
t.java:131) / readFrom(Attribute.java:1490) / readSearchEntryFrom(SearchResultEntry.java:278) / readSearchResultEntryFrom(InternalSDKHelper.java:361) / readLDAPResponseFrom(LDAPMessage.java:
1205) / run(LDAPConnectionReader.java:225)', ldapSDKVersion=4.0.11, revision=34e39aab27ea4fb92659a6888933db08099c7e41)"

com.unboundid.asn1.ASN1Exception: The ASN.1 stream reader has already read beyond the end of this set (expected set of length 79 to end at 40,875,654 bytes into the stream, but 40,875,691 bytes have already been read from the stream).
        at com.unboundid.asn1.ASN1StreamReaderSet.hasMoreElements(ASN1StreamReaderSet.java:131)
        at com.unboundid.ldap.sdk.Attribute.readFrom(Attribute.java:1490)
        at com.unboundid.ldap.sdk.SearchResultEntry.readSearchEntryFrom(SearchResultEntry.java:278)
        at com.unboundid.ldap.sdk.InternalSDKHelper.readSearchResultEntryFrom(InternalSDKHelper.java:361)
        at com.unboundid.ldap.protocol.LDAPMessage.readLDAPResponseFrom(LDAPMessage.java:1205)
        at com.unboundid.ldap.sdk.LDAPConnectionReader.run(LDAPConnectionReader.java:225)

When looking in the traffic capture, you can see that there is SASL message ending and a new one starting in the same TCP reassembled packet (which could be related to the issue occurring).

2019-07-14 19:51:18,086  INFO 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="INFO" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.pree
mpt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" readASN1Element="dataType='Sequence Header' berType='30'' valueLength=1629"
2019-07-14 19:51:18,086  INFO 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="INFO" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.pree
mpt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" readASN1Element="dataType='Integer' berType='02'' valueLength=1 value='92'"
2019-07-14 19:51:18,088  INFO 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="INFO" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.pree
mpt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" readASN1Element="dataType='Sequence Header' berType='84'' valueLength=0"
2019-07-14 19:51:18,088  INFO 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="INFO" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.pree
mpt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" readASN1Element="dataType='String' berType='00'' valueLength=6 value='T^DZCN='"
2019-07-14 19:51:18,088  INFO 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="INFO" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.pree
mpt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" readASN1Element="dataType='Sequence Header' berType='48'' valueLength=97"
2019-07-14 19:51:18,088  INFO 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="INFO" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.pree
mpt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" readASN1Element="dataType='Sequence Header' berType='6c'' valueLength=101"
2019-07-14 19:51:18,088  INFO 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="INFO" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.pree
mpt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" readASN1Element="dataType='String' berType='79'' valueLength=45 value='PC23637,OU=Manager,OU=Immi
gration_officer_642'"
2019-07-14 19:51:18,088  INFO 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="INFO" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.pree
mpt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" readASN1Element="dataType='Set Header' berType='2c'' valueLength=79"
2019-07-14 19:51:18,088  INFO 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="INFO" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.pree
mpt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" readASN1Element="dataType='byte[]' berType='55'' valueLength=61 value='4e696768746c794f72672c4443
3d707265656d70742c44433d6e696768746c793084000005f2308400000046040b6f626a656374436c61737331840000'"
2019-07-14 19:51:18,088  INFO 14586 [mpt.nightly:389] com.unboundid.ldap.sdk                   : level="INFO" threadID=120 threadName="Connection reader for connection 0 to nightly-dc01.pree
mpt.nightly:389" ldapSDKVersion="4.0.11" revision="34e39aab27ea4fb92659a6888933db08099c7e41" readASN1Element="dataType='byte[]' berType='00'' valueLength=51 value='0403746f700406706572736f6e
04146f7267616e697a6174696f6e616c506572736f6e0404757365720408636f6d7075746572'"

I can provide any type of information for reproducing, just not sure what will be best to share.

Thanks, Roman

dirmgr commented 5 years ago

The LDAP SDK doesn't really operate in terms of packets. It treats each connection as a continuous stream. As such, whether or not something is broken up across multiple packets is irrelevant. It could come all in one packet, in a separate packet per byte, or something in between, and it wouldn't really affect the way that the LDAP SDK consumes it.

In this case, the error message is because the LDAP SDK read the header for a BER set indicating that the combined length for all of the elements in the set was 79 bytes. It then read an element with 61 bytes of data (and presumably one byte for the BER type and one byte for the length, for a total of 63 bytes). That's less than 79 bytes, so there must be at least one more element in the set. Then, it read the next element, which had 51 bytes of data (and again, two more bytes for the type and length). So the LDAP SDK read 116 bytes of elements within the set when it only expected there to be 79 bytes. It can't recover from that, so it threw an exception and terminated the connection.

From looking at the data, it is clear that somehow the data that the client read got out of alignment with what the server was sending. The data is in the clear, but it does include GSSAPI integrity checking. We are about 40 megabytes into the stream, though, and there's no indication as to how long ago the bind was that created the integrity session. If it happened near the beginning of the connection, then it seems unusual that it would have gotten this long before running into problems. If it just happened, then it seems weird that it would have gotten that far into the connection before performing the bind that would have started the integrity session. In either case, it may be that we need more data to investigate the issue. I'll look into it on my side, but it may be a few days before I can dig into it in detail.

romanblachman commented 5 years ago

Thanks for the quick response!

If I understand this correctly, the ASN.1 is streamed on-top of SASL messages (due to integrity enabled) meaning that SASL messages are discrete in size and not streamed. This means that if the SASL buffer ends in the middle of the ASN.1 a new SASL header will appear out of nowhere with the new size for the next SASL message and then the rest of the ASN.1 data will appear:

| SASL header (length) |

| ASN.1 data |

| SASL header (length) | <------- Does not care if ASN.1 stream breaks

| ASN.1 data continued |

The SASL header may appear anywhere inside the ASN.1 message if the ASN.1 message is longer than the SASL size (which appears in the first header).

In the attached image, you can see ASN.1 message starting (30 84 0000065d 02 01 5c) and then out of nowhere SASL header appears (0000ffb3050405ff000c000c000000004b255e026f7a1a7053e22f8677333058) and in the new SASL message the ASN.1 stream continues (64 84 00000654...)

I have the PCAP attached with the issue reproduced: https://drive.google.com/file/d/1dcoi_ppHl9d2RZxUqNT3qVyLyxA7TDbO/view?usp=sharing

dirmgr commented 5 years ago

Sorry for taking a while to get back to you on this, but I've been unsuccessful in reproducing the problem. I'm not an expert with Active Directory by any means, and the only instance I have access to doesn't seem to be set up to handle GSSAPI authentication properly. I'm able to use the LDAP SDK to get a Kerberos ticket as the user that's trying to authenticate, but then AD is returning an error in response to the GSSAPI bind attempt with an indication that it's not finding a principal in the KDC. I get this error regardless of the quality of protection, so it's not related to an attempt to use integrity checking.

Rather than continuing to bash my head against AD, I decided to switch to a different non-Ping/UnboundID server, so I chose OpenLDAP. After some effort, I was able to get an OpenLDAP instance up and running and accepting GSSAPI authentication using SASL integrity. I've had a test running successfully with hundreds of millions of requests over connections protected with SASL integrity and have not encountered any errors.

You are correct in that the SASL integrity processing is injected between the TCP transport and the LDAP communication. Basically, whenever either side wants to send data to the other side, it uses the previously negotiated settings to encode the data in some way (for integrity checking, this is basically putting a signature before the clear-text data; for confidentiality, it's encrypting the entire set of data). Once it has that encoded data, it first writes four bytes with the two's complement encoding of the number of bytes in that encoded blob (for example, in the packet capture that you provided, the 16th packet is the first one that has SASL integrity protection, and the "00 00 00 bf" sequence in that packet says that there are 191 bytes of SASL data), and then it follows that with the encoded data. On the other end of the connection, it first reads those four bytes to figure out how many bytes of encoded data there are, then it reads that many bytes of encoded data, and then it uses the negotiated settings to unwrap it and end up with the clear-text data.

In the LDAP SDK, this is handled in two places:

For writing sending requests to the server, it happens in LDAPConnectionInternals.sendMessage. We have the LDAP message to write, and if we don't have a SaslClient instance, then we'll just write the bytes of that message to the socket as-is. But if we do have a SaslClient, then we first use its wrap method to encode the data to be written. We then write a four-byte representation of the number of bytes of that encoded data, and finally the encoded request itself.
For reading responses from the server, it happens in ASN1StreamReader.readAndDecodeSASLData. We read four bytes of data to figure out how long the encoded blob is, then we read that many bytes of data, and then we use the SaslClient.unwrap method to get the clear-text LDAP data. Elsewhere in the class, we do differentiate between whether we're reading data directly from a socket or from a buffer holding SASL-decoded data, but that's primarily done in the read methods, and I've been through that code without seeing an obvious problem.

Ultimately, the LDAP SDK isn't really doing packet-based reading. It's using blocking sockets to read the amount of data that it needs, and it doesn't care whether it's split into multiple packets. It also doesn't care whether a single LDAP message (or an ASN.1 element within a message) is split across multiple SASL wrapped messages. I've done some additional testing with really big entries (both an entry with a ton of attributes, and an entry with an attribute with a huge value) and verified that the SDK can handle big messages split into lots of SASL integrity chunks.

At this point, it's tough to say what the problem might be. The packet capture you provided is helpful, but it's hard to use in any meaningful way without the keys. I may be able to do a bit of analysis from the capture, but it's hard to say.

The easiest solution at this time would probably be to just switch to using authentication only from GSSAPI, and rely on TLS if you want integrity or confidentiality.

dirmgr commented 5 years ago

I did a little more testing around this by creating a very simple proxy that breaks up packets into small chunks, and I've run it for a long duration with both a fixed size of one byte (in which each byte of the request and response is sent in its own packet), and with random packet sizes between 1 and 5 bytes. Note that this chunking was performed after SASL integrity processing had been performed, so that means that the SASL integrity data and also the wapped data length bytes were split up across multiple packets. I didn't encounter any issues at all over the course of this testing, so I'm really not seeing any evidence that there's a problem in the LDAP SDK's handling of communication protected with SASL integrity or confidentiality.

romanblachman commented 5 years ago

I can try to create a standalone code that reproduces the issue against our LDAP server and then share the server IP and credentials with you (requires Kerberos ticket so might be a bit tricky). We can also jump on a call where I can demo it and see how I can help reproduce the issue for further investigation.

Meanwhile, I did notice something interesting in the traffic captures. When running exactly the same LDAP query with the ldapsearch utility (that had no errors) I’ve noticed that the SASL message size that returns from the LDAP server is different between than the one in the ldapsdk.

The ldapsdk SASL length is always about 65K while the ones that show in the ldapsearch traffic capture are much higher since it’s a DWORD and it’s not bound to the TCP max packet size. Do you know if the SASL length is negotiated or configured somewhere?

Again, thank you very much for the investigation. I will do my best to help figure out why GSSAPI integrity with MS Active Directory fails. Switching to LDAPS is an option, but I assume this issue is still relevant since if MS LDAP Signing is enforced GSSAPI over LDAPS will still be required in secure enterprise environments.

romanblachman commented 5 years ago

Hello again, we are still struggling with the problem that keeps reproduces consistently on one of our development domains. We can expose the relevant DC LDAP to the internet, will it help you with finding a possible solution?

It looks like in January 2020 Microsoft will enforce LDAP signing by default (https://support.microsoft.com/en-us/help/4520412/2020-ldap-channel-binding-and-ldap-signing-requirement-for-windows) and we want to provide our customers the option to use LDAP Signing with Active Directory using the ldapsdk.

If you wish to talk further offline please reach out @ rblachman@preempt.com

Thank you!

dirmgr commented 5 years ago

Unfortunately, I'm not sure that I'll be able to make any additional progress on this. I don't have access to an Active Directory instance that I can use for this testing, and all of my testing with other types of directory servers has not identified any problems.

borisdan commented 4 years ago

Hi @dirmgr, we can provide you with access to an Active Directory server where the problem reproduces persistently. Would that help?

dirmgr commented 4 years ago

I’m sorry to say that at present, we do not have the time or resources to devote to investigating this issue. I have created an internal issue to track it, and may be able to give it some attention in the future.

I did spend a considerable amount of time testing with OpenLDAP under various conditions, and I was not able to reproduce the problem that you describe. Further, the LDAP SDK does not implement its own support for GSSAPI, but instead relies on the GSSAPI support in the underlying JVM. It is possible that the issue may lie there, or that there may be a problem in the way that Active Directory has implemented its support. Have you tried creating a test program with another Java-based API (especially JNDI, as it would definitely rely on the same underlying JVM support) to determine whether the problem is evident there?

I’m sorry that we’re not able to provide more assistance on this at the moment, but I’m simply not in a position to devote the time and effort that may be required to investigate it.

borisdan commented 4 years ago

Thank you very much @dirmgr. We will investigate this further and update here with our findings. Your help is highly appreciated.

borisdan commented 4 years ago

@dirmgr I believe to have found the bug.

The problem happens when ASN1StreamReader.peek() is called at the end of a SASL buffer. Please see the problematic code - I have added inline comments to explain.

public int peek()
         throws IOException
  {
    final InputStream is;
    if (saslClient == null)
    {
      is = inputStream;
    }
    else
    {
      if ((saslInputStream == null))  //  || (saslInputStream.available() == 0)    is the fix I used in my lab
      {
        readAndDecodeSASLData(-1);
      }

      is = saslInputStream;   // saslInputStream is loaded to is
    }

    is.mark(1); // Point in buffer is marked
    final int byteRead = read(true); // Read is called just before new SASL frame. New buffer put in saslInputStream
    is.reset(); // reset to marked point is called on the old buffer. The new buffer remains one byte into its reading, causing the unwanted shift

    return byteRead;
  }

Please confirm whether I am correct.

dirmgr commented 4 years ago

This definitely looks promising. I'll need to look into it more carefully, and I probably won't be able to do that until next week, but I think that you're probably right.

dirmgr commented 4 years ago

After taking a more thorough look at it, I do think that you've identified the problem. I went ahead and committed a change that is very similar to the one that you proposed. Let me know whether you can confirm that this indeed fixes the problem.

Thanks for your help with this!

borisdan commented 4 years ago

Thanks @dirmgr! This works well.

When can we expect this fix to make it into a release? We would rather keep on getting the package from Gradle.

dirmgr commented 4 years ago

Unfortunately, it just missed last week's 4.0.13 release. I'm not sure when the next one will be, but it'll probably be at least a couple of months.

romanblachman commented 4 years ago

@borisdan awesome job finding the issue!

@dirmgr Thank you for merging to master! Expect that when Microsoft switches to LDAP Signing enforcement by default starting from Jan. 2020, applications using ldapsdk might have random failures with Active Directory deployments if SASL is used. Should I close this ticket?

dirmgr commented 4 years ago

FYI, I just released version 4.0.14 and it includes this fix.

romanblachman commented 4 years ago

Thank you @dirmgr, I'm closing this ticket since the issue has been resolved.

pingidentity / ldapsdk