Closed jborean93 closed 4 years ago
I think this is because UCS-2LE simply cannot represent surrogate pairs. I guess we'll have to move to UTF-16, which seems to be what Microsoft moved to in their OSs to handle Unicode once the character set grew past what UCS-2 could handle.
The difficulty is changing the code in various places, as a generated UTF-8 string can now be bigger than the character count * 2, since the UTF-8 representation can be longer than that ...
In fact I think (hope) the worst case is len(utf8 string) = 3 * len(utf16 string)
Actually, a UTF-8 string is never more than twice the size of a UTF-16 string when counting bytes. UTF-8 sometimes needs 3 bytes to represent a character that UTF-16 encodes in a single 16-bit code unit, but surrogate pairs (4 bytes in UTF-16) are always represented with at most 4 bytes in UTF-8 as well.
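The byte-length relationship described above can be checked quickly in Python; the sample characters below are illustrative (one each from the 1-, 2-, 3- and 4-byte UTF-8 ranges), not taken from the original report:

```python
# Compare UTF-8 vs UTF-16-LE byte lengths for a few sample characters.
samples = ["a", "\u00e9", "\u20ac", "\U00010437"]  # ASCII, 2-byte, 3-byte, surrogate pair
for ch in samples:
    u8 = len(ch.encode("utf-8"))
    u16 = len(ch.encode("utf-16-le"))
    print(f"U+{ord(ch):04X}: utf-8={u8} bytes, utf-16-le={u16} bytes")
    # UTF-8 never exceeds twice the UTF-16 byte length
    assert u8 <= 2 * u16
```

The worst ratio occurs for BMP characters in the U+0800–U+FFFF range (3 UTF-8 bytes vs 2 UTF-16 bytes); non-BMP characters are 4 bytes in both encodings.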
When trying to get a credential for a password that contains a UTF-16 surrogate pair char like `π`, it fails. This seems to be a problem with the `libunistring` library when trying to convert UTF-8 bytes to the `UCS-2LE` encoding. I have no idea if this can be fixed or whether we should really care, but technically I can create a username and password in Windows with these characters in the value and authenticate with them using NTLM. To replicate this problem run:
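The exact reproduction command is elided above, but the underlying encoding problem can be sketched in Python. U+10437 below is an arbitrary non-BMP example character, not necessarily the one from the report:

```python
# A character outside the Basic Multilingual Plane needs a surrogate
# pair in UTF-16, which has no representation in fixed-width UCS-2.
ch = "\U00010437"  # DESERET SMALL LETTER YEE, an arbitrary non-BMP example
utf16 = ch.encode("utf-16-le")
assert utf16 == b"\x01\xd8\x37\xdc"  # high surrogate D801, low surrogate DC37

# UCS-2 allots exactly one 16-bit unit per character, so a strict
# UTF-8 -> UCS-2 converter has no choice but to reject this input.
assert ord(ch) > 0xFFFF
print("U+10437 needs", len(utf16), "bytes in UTF-16-LE")
```

This is why switching the conversion target from `UCS-2LE` to `UTF-16LE` (as suggested in the comments) makes the character representable at all.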
It seems like this step, and the same for `NTOWFv1`, is where the problem occurs, but I don't fully understand how `libunistring` really works to see if there is a workaround or whether this should be raised there. Even if we get past this and don't use these types of chars for the password, the code also fails when generating or parsing an authenticate message with a username like this. I haven't looked into the code to see what this may be, but I would guess it's a similar situation to the password.
If you wish to try and fix this, I'm happy to supply a way to set up a local user with a char that becomes a surrogate pair on Windows, as I've tested this out with a Python NTLM implementation I have.