ldapjs / node-ldapjs

LDAP Client and Server API for node.js
http://ldapjs.org
MIT License
1.6k stars 424 forks source link

Support unicode DN #957

Open pyhedgehog opened 7 months ago

pyhedgehog commented 7 months ago

Actual result

> require('ldapjs').parseDN('cn=проба,ou=пера,dc=example,dc=com').toString()
'cn=\\d0\\bf\\d1\\80\\d0\\be\\d0\\b1\\d0\\b0,ou=\\d0\\bf\\d0\\b5\\d1\\80\\d0\\b0,dc=example,dc=com'

Expected result

> require('ldapjs').parseDN('cn=проба,ou=пера,dc=example,dc=com').toString()
'cn=проба,ou=пера,dc=example,dc=com'

In wild life

At least microsoft AD supports (and returns) utf8 DNs.

jsumners commented 7 months ago

The string is correct. The spec only allows ASCII characters. Any other characters must be escaped. See https://github.com/ldapjs/node-ldapjs/issues/860#issuecomment-1474479422.

pyhedgehog commented 7 months ago

Sorry for opening this issue without founding this comment... But then I've returned from ideal RFC world to real AD life. I've searched for some DN with non-ascii chars both in first and second RDN and done something like:

base1='cn=проба,ou=пера,dc=example,dc=com'
base2=String(ldapjs.parseDN(base1))
base3='ou=пера,dc=example,dc=com'
base4=String(ldapjs.parseDN(base3))
// check1
ldap.search(base1,{scope:'base'},cb)
// check2
ldap.search(base2,{scope:'base'},cb)
// check3
ldap.search(base3,{scope:'sub'},cb)
// check4
ldap.search(base4,{scope:'sub'},cb)

All searches got Uncaught LDAPError [NoSuchObjectError]: No Such Object error. Then I've returned to shell:

# ldapsearch $ldap_auth_args -b 'cn=проба,ou=пера,dc=example,dc=com' -s base -LLL '' dn
dn:: Y2490L/RgNC+0LHQsCxvdT3Qv9C10YDQsCxkYz1leGFtcGxlLGRjPWNvbQ==
# ldapsearch $ldap_auth_args -b 'cn=\d0\bf\d1\80\d0\be\d0\b1\d0\b0,ou=\d0\bf\d0\b5\d1\80\d0\b0,dc=example,dc=com' -s base -LLL '' dn
No such object (32)
Matched DN: dc=example,dc=com
Additional information: 0000208D: NameErr: DSID-0310028D, problem 2001 (NO_OBJECT), data 0, best match of:
        'dc=example,dc=com'

And same for OU/sub search — raw unicode argument works, but "escaped" version doesn't. In ldapjs both formats fails.

PS: Obviously I've replaced real DNs in examples, but sense is kept intact. PPS: How to get from returned "escaped" format of DN "raw" one?

jsumners commented 7 months ago

What version of ldapjs and @ldapjs/filter are you using?

pyhedgehog commented 7 months ago
$ npm ls --depth=2|grep ldap
@...
└─┬ ldapjs-promise@3.0.5
  ├─┬ @types/ldapjs@3.0.5
  └─┬ ldapjs@3.0.6
    ├── @ldapjs/asn1@2.0.0
    ├── @ldapjs/attribute@1.0.0
    ├── @ldapjs/change@1.0.0
    ├── @ldapjs/controls@2.1.0
    ├── @ldapjs/dn@1.1.0
    ├── @ldapjs/filter@2.1.1
    ├── @ldapjs/messages@1.3.0
    ├── @ldapjs/protocol@1.2.1
ncphins commented 6 months ago

This is the same error I have been seeing for quite some time. We have not moved to 3.x because of it. It does not reproduce with the openLDAP container. But it does with 2 production MSAD servers.

I am trying to get some logs from the servers. I have a theory, but I need logs to be able to support it.

ncphins commented 6 months ago

Here is what I am seeing. I cannot reproduce this with the openldap server. But let me see if I can explain what I am seeing to see if it helps. I really want to see if I can provide some assistance.

I have 2 production MS AD Servers. Using ldapsearch, both servers behave the same when searching for a DN. WORKS: ldapsearch -x -b "CN=Joe Glaßer,OU=users,OU=Germany,DC=FOO,DC=com" -H ldap://SERVER.COM:389 -D "CN=LDAP-Read,DC=FOO,DC=com" -W -s base "(objectclass=*)" dn cn DOES NOT WORK: ldapsearch -x -b "CN=Joe Gla\C3\9Fer,OU=users,OU=Germany,DC=FOO,DC=com" -H ldap://SERVER.COM:389 -D "CN=LDAP-Read,DC=FOO,DC=com" -W -s base "(objectclass=*)" dn cn

The second gives Object Not Found.

However, both of those commands work fine with the OpenLDAP server used for the unit tests. I have that running in a docker container and have the same user configured. I can search either with the escaped version and without the escaped version. One interesting thing is the logs for the container seem to show non-escaped for both but I am skeptical of the log formatting.

659da634 conn=1009 op=2 SRCH base="cn=Joe Glaßer,dc=adserver,dc=com" scope=0 deref=0 filter="(&(objectClass=*))"
659da634 conn=1009 op=2 SRCH attr=dn cn
659da634 conn=1009 op=2 SEARCH RESULT tag=101 err=0 nentries=1 text=

I did some tests on the DN itself.

This code will dump the parsed DN, then do thorough the DN popping the RDNs and dumping them using toString.

const parsedDN = parseDN('cn=Joe Glaßer,dc=adserver,dc=com');
console.log(parsedDN.toString());
let rdn = parsedDN.pop();
 while (rdn) {
     console.log(rdn.toString());
     rdn = parsedDN.pop();
 }

The output looks like:

cn=Joe Gla\c3\9fer,dc=adserver,dc=com
dc=com
dc=adserver
cn=Joe Gla\c3\9fer

However, if I add the {unescaped: true} option to the console.log(rdn.toString({unescaped: true})) then I get the "expected" output.

cn=Joe Gla\c3\9fer,dc=adserver,dc=com
dc=com
dc=adserver
cn=Joe Glaßer

When I look at the search code, the baseDN that is passed in is converted to a DN if a string is passed in (via the parseDN method) or just set if it is a DN object already.

My theory is that when the search is formatted to be sent to the server, a call is made on the DN using toString. And that results in the escaped value being sent to the server. That is fine for the test OpenLDAP server (as far as my testing goes) but that fails for both of my production MS AD servers as they do not like the escaped value. There is no {unescaped: true} option for the DN.toString() but even if here were, there would be no way to tell the search code to use it. But changing the DN.toString method to use the {unescaped: true} option when concatenating the RDNs makes it work great for both my servers and maintains compatibility with the OpenLDAP test container as well.

I wish I could generate a test that fails, but I cannot using the container.

jsumners commented 6 months ago

Thank you for the detailed diagnosis. It is appreciated. I am convinced this is the same problem under discussion in https://github.com/ldapjs/filter/pull/9#discussion_r1401459812. I need to find some time to re-learn a lot of this stuff and determine how to properly solve it. Basically, we need to replace \c3 with the byte 0xc3 but leave \\c3 alone when building the BER instance.

Foorack commented 5 months ago

Was just bitten by this as well. :slightly_smiling_face: For example: CN=Faxälv\, Max,OU=Users,DC=example,DC=com is impossible for ldapjs to fetch, as it encodes "ä" into byte sequence which Microsoft AD refuses to understand.

NOT RECOMMENDED/NOT A FIX: Replacing all contents of ./node_modules/@ldapjs/dn/lib/utils/escape-value.js to the following debug-contents to """properly""" (against spec, but Microsoft-compatible) handle Unicode; A.K.A escape those characters in the embeddedReservedChars list but leave everything else intact, temporarily solves the issue.

'use strict'
/// DO NOT USE IN PRODUCTION - NOT TESTED - PURELY FOR DEBUG
/// This file is for patching the ldapjs library at the following location:
/// @ldapjs/dn/lib/utils/escape-value.js
/// in order to """properly""" (against spec) handle Unicode. This is against RFC 4514,
/// but is required for Microsoft Active Directory to respond to queries with Unicode characters.

module.exports = function escapeValue(value) {
  if (typeof value !== 'string') {
    throw Error('value must be a string')
  }

  // 1. Split up input into individual UTF-8 characters.
  const toEscape = value.split('');
  // 2. Loop over all forbidden characters, and escape them.
  const embeddedReservedChars = [
      '"',
      '+',
      ',',
      ';',
      '<',
      '>'
  ];
  const escaped = [];
  for (let i = 0; i < toEscape.length; i++) {
    const char = toEscape[i];
    if (embeddedReservedChars.includes(char)) {
      escaped.push('\\' + char.charCodeAt(0).toString(16).padStart(2, '0'));
    } else {
      escaped.push(char);
    }
  }
  return escaped.join('');
}

Looking in Wireshark the query now looks like this: image and with this it successfully gives me a result. :partying_face:

Comparing this to the old/broken query: image

I do not know what the "fix" for this is; I just wanted to share my findings/experience.