Open mitar opened 2 years ago
That’s a difficult problem to solve indeed! On the one hand we want to have passwords not confusing one char with another that might be normalized together incorrectly.
I would say we should look at how other big systems solve this and follow best practices? Or maybe some guidance from Microsoft or Troy Hunt?
I posted links to both NIST and RFC recommendations. So we can probably follow either. :-)
I think it is in fact simple, it is just a question of always re-normalizing with the same normalization approach before hashing (instead of relaying on normalization done by the browser, which can vary).
On the one hand we want to have passwords not confusing one char with another that might be normalized together incorrectly.
This is just about byte-representation of same Unicode string when encoded into UTF8. There is no confusion here.
Ok, nice! Do you k now what the difference is between NFKC and NFKD?
If Unicode characters are accepted in memorized secrets, the verifier SHOULD apply the Normalization Process for Stabilized Strings defined in Section 12.1 of Unicode Standard Annex 15 [UAX 15] using either the NFKC or NFKD normalization. Subscribers choosing memorized secrets containing Unicode characters SHOULD be advised that some characters may be represented differently by some endpoints, which can affect their ability to authenticate successfully. This process is applied prior to hashing of the byte string representing the memorized secret.
Now this is the only confusing part. I have no idea why they went with NFKC or NFKD normalization. Those are used just for compatibility (if you want to read a bit more. RFC goes with NFC and it is probably the most reasonable. Although NFD has more "bytes" to it, so maybe for hashing it is also sane.
It really does not matter. Just go with one. :-) NFC is probably what most passwords are already normalized (because Chrome uses it), so going with NFC will break the least amount of passwords. Probably you can just start doing NFC normalization and those few users who used Unicode characters and not Chrome will have to reset their password.
Is it possible that this i ssue was opened in the wrong repository? Ory Hydra does not handle passwords except for Client Secrets which should be auto-generated and not chosen by users (we will change this in Ory Hydra 2.0)
You are right. This should go to Kratos.
There is this Go package: https://pkg.go.dev/golang.org/x/text/secure/precis
Preflight checklist
Describe the bug
Security doc cites NIST guidelines:
The tricky point is that same unicode password can be represented in UTF8 in different ways. So one should first employ UTF8 normalization and only then hash the password. And different browsers do use different normalization. :-(
This has been reported to NIST as well and they made this patch. It seems there is another standard for what one could do here. But I think the solution here is to use some normalization and it does not really matter which one. In Go, you can use this package.
Reproducing the bug
See description.
Relevant log output
No response
Relevant configuration
No response
Version
v0.8.0-alpha.3
On which operating system are you observing this issue?
Linux
In which environment are you deploying?
Docker Compose
Additional Context
See this blog post about Go normalization for more information.