Filter contains =I{K when searching for a GUID

idsecurity commented 5 years ago

Hello,

I'm trying to create a filter from a string representation of a GUID for searching in AD/eDirectory.

For example if I have a GUID such as c8381f3d497b4bcca91564eadaee8b08 I can create a filter manually that looks like (objectGUID=\c8\38\1f\3d\49\7b\4b\cc\a9\15\64\ea\da\ee\8b\08).

If I then take that filter string and do a Filter.create, and then a toString() it looks fine.

But if I want to create an OR filter from several such filters and I add them to a List<Filter> first and then do Filter createORFilter = Filter.createORFilter(list); the filter is changed to contain the string =I{K.

I'm not sure why this happens. I can reproduce it with the code below:


import com.unboundid.ldap.sdk.Filter;
import java.util.ArrayList;
import java.util.List;

public class GUIDFilter {

    static String guid = "c8381f3d497b4bcca91564eadaee8b08";

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {
            Filter f = Filter.create("(objectGUID=" + GUIDtoSearchableString(guid) + ")");
            System.out.println(f.toString());//Prints (objectGUID=\c8\38\1f\3d\49\7b\4b\cc\a9\15\64\ea\da\ee\8b\08)
            List<Filter> list = new ArrayList<>();
            list.add(f);
            Filter createORFilter = Filter.createORFilter(list);
            System.out.println(createORFilter.toString());//Prints (|(objectGUID=\c8\38\1f=I{K\cc\a9\15d\ea\da\ee\8b\08))

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static String GUIDtoSearchableString(String guid) {

        assert guid.length() == 32;
        StringBuilder sb = new StringBuilder(64);
        sb.append("\\");
        char[] toCharArray = guid.toCharArray();
        for (int i=0; i<toCharArray.length; i++) {
            sb.append(toCharArray[i]);
            if (i % 2 == 1 && i != 31) {
                sb.append("\\");
            }
        }
        return sb.toString();
    }

}

dirmgr commented 5 years ago

Thanks for reporting this. Fortunately, though, this is actually not a problem. The value that the LDAP SDK is generating is logically equivalent to the one that you originally provided, so the server should treat it in exactly the same way.

The reason has to do with the way that escaping works in search filters, and how the LDAP SDK treats that escaping. RFC 4515 provides the specification for generating the string representation of search filters, and it states that you can escape any byte by prefixing the hex representation of that byte with a backslash. Most of the bytes that you have in your value aren’t for printable characters, so the LDAP SDK keeps them encoded. But some of them do represent printable characters. In particular:

\49 is the way to escape the capital letter I
\3d (or \3D) is the way to escape the equal sign (=)
\4b (or \4B) is the way to escape the capital letter K
\64 is the way to escape the lowercase letter d
\7b (or \7B) is the way to escape the left curly brace ({)
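
As a quick way to check mappings like the ones above, here is a minimal sketch in plain Java that turns a two-digit hex escape into the ASCII character it stands for. The class and method names are made up for illustration and are not part of the LDAP SDK.

```java
class EscapedByteDemo {

    /** Returns the ASCII character that a two-digit hex escape such as "3d" stands for. */
    static char decode(String hexPair) {
        return (char) Integer.parseInt(hexPair, 16);
    }

    public static void main(String[] args) {
        // The printable bytes that appear in the GUID from this issue.
        for (String hex : new String[] {"3d", "49", "7b", "4b", "64"}) {
            System.out.println("\\" + hex + " -> " + decode(hex));
        }
    }
}
```

Running it lists the characters =, I, {, K and d, which are exactly the ones that show up unescaped in the OR filter's string form.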

One thing that is less obvious, though, is why the “\38” (the second byte of the value) isn’t getting converted into the ASCII digit 8, because “\38” is the way to escape the number 8. The reason for this is that the byte that precedes it, “\c8”, is the lead byte of a two-byte UTF-8 sequence, so the LDAP SDK treats “\38” as part of that two-byte sequence and keeps both bytes escaped. Strictly speaking, “\c8\38” is not valid UTF-8, because the second byte of a two-byte sequence must fall between \80 and \bf. The valid sequences that start with \c8 are in the Latin Extended-B range, immediately after \c7\bf (“Latin small letter o with stroke and acute”, “ǿ”) and beginning with \c8\80 (“Latin capital letter A with double grave”, “Ȁ”). The SDK does not try to render such bytes as text and leaves them escaped.
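
The rule described above can be approximated in a few lines of plain Java. This is only a sketch of the idea and not the LDAP SDK's actual code. It assumes that a byte announcing a multi-byte UTF-8 sequence causes that many bytes to be kept escaped, and that single bytes are written literally only when they are printable ASCII and not special in a filter. All names are hypothetical.

```java
class FilterValueRenderSketch {

    /** Sequence length implied by a UTF-8 lead byte; 1 for ASCII and for stray bytes. */
    static int sequenceLength(int b) {
        if ((b & 0xE0) == 0xC0) return 2;
        if ((b & 0xF0) == 0xE0) return 3;
        if ((b & 0xF8) == 0xF0) return 4;
        return 1;
    }

    /** True for non-printable bytes and for the characters RFC 4515 requires to be escaped. */
    static boolean mustEscape(int b) {
        return b < 0x20 || b > 0x7E || b == '(' || b == ')' || b == '*' || b == '\\';
    }

    static void appendHex(StringBuilder sb, int b) {
        sb.append('\\').append(String.format("%02x", b));
    }

    /** Renders an assertion value, escaping only what the assumed rule says to escape. */
    static String render(byte[] value) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < value.length) {
            int b = value[i] & 0xFF;
            int len = sequenceLength(b);
            if (len == 1) {
                if (mustEscape(b)) appendHex(sb, b); else sb.append((char) b);
                i++;
            } else {
                // Keep the lead byte and the bytes it claims as one escaped unit.
                int end = Math.min(i + len, value.length);
                for (; i < end; i++) appendHex(sb, value[i] & 0xFF);
            }
        }
        return sb.toString();
    }

    /** Converts a hex string such as "c838" into raw bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints \c8\38\1f=I{K\cc\a9\15d\ea\da\ee\8b\08
        System.out.println(render(fromHex("c8381f3d497b4bcca91564eadaee8b08")));
    }
}
```

Under these assumptions the sketch reproduces the value seen in the OR filter: \c8 claims \38, \cc claims \a9, and \ea claims \da\ee, while =, I, {, K and d are written literally.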

Another thing that is less obvious about this is why the string representation changes based on the two ways that you’re creating the filter. When you use Filter.create to construct a Filter object from its string representation, then the LDAP SDK remembers that string representation and uses it as the value that gets returned when you call the filter’s toString method. But when you use other methods for constructing a filter object, like Filter.createORFilter, the LDAP SDK constructs the string representation itself and in the course of doing that, it tries to represent printable ASCII characters that don’t have to be escaped in a filter using their printable ASCII representations rather than their escaped versions. This is generally a good thing because it ensures that most string values with non-ASCII characters remain as readable as possible and only special characters get escaped, but in a corner case like this one, you can end up with unexpected (but still correct) results.

It’s also important to note that the LDAP protocol does not transfer filters as strings. Instead, it uses an ASN.1 BER encoding that transfers filter assertion values in binary form. That means that when you actually send a search request to the directory server, whether you have the filter “(|(objectGUID=\c8\38\1f\3d\49\7b\4b\cc\a9\15\64\ea\da\ee\8b\08))” or the filter “(|(objectGUID=\c8\38\1f=I{K\cc\a9\15d\ea\da\ee\8b\08))”, exactly the same bytes get transmitted. So this behavior won’t have any impact at all on the way the server processes the filter because it has no way of knowing how that filter was constructed on the client; it only sees the encoded result.
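
To see for yourself that the two string forms carry the same assertion value, you can unescape both and compare the raw bytes. The following is a minimal sketch in plain Java. It does not perform the BER encoding itself, it assumes the input contains only ASCII characters and well-formed two-digit escapes, and its names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class FilterValueBytesDemo {

    /** Converts an RFC 4515 style value string into the raw bytes it represents. */
    static byte[] unescape(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < value.length()) {
            char c = value.charAt(i);
            if (c == '\\') {
                // A backslash is followed by exactly two hex digits.
                out.write(Integer.parseInt(value.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.write(c); // assumption: unescaped characters are plain ASCII
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String escaped = "\\c8\\38\\1f\\3d\\49\\7b\\4b\\cc\\a9\\15\\64\\ea\\da\\ee\\8b\\08";
        String mixed = "\\c8\\38\\1f=I{K\\cc\\a9\\15d\\ea\\da\\ee\\8b\\08";
        // Prints true: both strings describe the same 16 bytes.
        System.out.println(Arrays.equals(unescape(escaped), unescape(mixed)));
    }
}
```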

idsecurity commented 5 years ago

Thank you for taking the time to answer my question in such great detail. I've learned a couple of new things today which is always nice. I'll close the issue then since it's not a problem.

pingidentity / ldapsdk

Filter contains =I{K when searching for a GUID #52