UTF8-normalize passwords before hashing

mitar commented 2 years ago

Preflight checklist

[X] I could not find a solution in the existing issues, docs, nor discussions.
[X] I agree to follow this project's Code of Conduct.
[X] I have read and am following this repository's Contribution Guidelines.
[ ] This issue affects my Ory Cloud project.
[X] I have joined the Ory Community Slack.
[ ] I am signed up to the Ory Security Patch Newsletter.

Describe the bug

Security doc cites NIST guidelines:

All printing ASCII [RFC 20] characters as well as the space character SHOULD be acceptable in memorized secrets. Unicode [ISO/ISC 10646] characters SHOULD be accepted as well.

The tricky point is that same unicode password can be represented in UTF8 in different ways. So one should first employ UTF8 normalization and only then hash the password. And different browsers do use different normalization. :-(

This has been reported to NIST as well and they made this patch. It seems there is another standard for what one could do here. But I think the solution here is to use some normalization and it does not really matter which one. In Go, you can use this package.

Reproducing the bug

See description.

Relevant log output

No response

Relevant configuration

No response

Version

v0.8.0-alpha.3

On which operating system are you observing this issue?

Linux

In which environment are you deploying?

Docker Compose

Additional Context

See this blog post about Go normalization for more information.

aeneasr commented 2 years ago

That’s a difficult problem to solve indeed! On the one hand we want to have passwords not confusing one char with another that might be normalized together incorrectly.

I would say we should look at how other big systems solve this and follow best practices? Or maybe some guidance from Microsoft or Troy Hunt?

mitar commented 2 years ago

I posted links to both NIST and RFC recommendations. So we can probably follow either. :-)

I think it is in fact simple, it is just a question of always re-normalizing with the same normalization approach before hashing (instead of relaying on normalization done by the browser, which can vary).

On the one hand we want to have passwords not confusing one char with another that might be normalized together incorrectly.

This is just about byte-representation of same Unicode string when encoded into UTF8. There is no confusion here.

aeneasr commented 2 years ago

Ok, nice! Do you k now what the difference is between NFKC and NFKD?

If Unicode characters are accepted in memorized secrets, the verifier SHOULD apply the Normalization Process for Stabilized Strings defined in Section 12.1 of Unicode Standard Annex 15 [UAX 15] using either the NFKC or NFKD normalization. Subscribers choosing memorized secrets containing Unicode characters SHOULD be advised that some characters may be represented differently by some endpoints, which can affect their ability to authenticate successfully. This process is applied prior to hashing of the byte string representing the memorized secret.

mitar commented 2 years ago

Now this is the only confusing part. I have no idea why they went with NFKC or NFKD normalization. Those are used just for compatibility (if you want to read a bit more. RFC goes with NFC and it is probably the most reasonable. Although NFD has more "bytes" to it, so maybe for hashing it is also sane.

It really does not matter. Just go with one. :-) NFC is probably what most passwords are already normalized (because Chrome uses it), so going with NFC will break the least amount of passwords. Probably you can just start doing NFC normalization and those few users who used Unicode characters and not Chrome will have to reset their password.

aeneasr commented 2 years ago

Is it possible that this i ssue was opened in the wrong repository? Ory Hydra does not handle passwords except for Client Secrets which should be auto-generated and not chosen by users (we will change this in Ory Hydra 2.0)

mitar commented 2 years ago

You are right. This should go to Kratos.

mitar commented 10 months ago

Looking at this again, to me rfc8265 seems reasonable to apply here.

mitar commented 10 months ago

There is this Go package: https://pkg.go.dev/golang.org/x/text/secure/precis

ory / kratos