w3c / webauthn

Web Authentication: An API for accessing Public Key Credentials
https://w3c.github.io/webauthn/

UTF-8 decode should not be required for response.clientDataJSON and cData #2100

Open · zacknewman opened 1 month ago

zacknewman commented 1 month ago

Currently the spec states:

Let JSONtext be the result of running UTF-8 decode on the value of response.clientDataJSON.

Note: Using any implementation of UTF-8 decode is acceptable as long as it yields the same result as that yielded by the UTF-8 decode algorithm. In particular, any leading byte order mark (BOM) MUST be stripped.

for step 5 in Registering a New Credential and

Let JSONtext be the result of running UTF-8 decode on the value of cData.

Note: Using any implementation of UTF-8 decode is acceptable as long as it yields the same result as that yielded by the UTF-8 decode algorithm. In particular, any leading byte order mark (BOM) MUST be stripped.

for step 8 in Verifying an Authentication Assertion.

This seems slightly too strict. While the notes only call out stripping a BOM, they also state "yields the *same result* …" (emphasis added); and UTF-8 decode is defined to use the "replacement" error handler as well, so "the same result" also implies replacing invalid byte sequences with U+FFFD.

According to the serialization of CollectedClientData, it is impossible for invalid UTF-8 to be generated. This means RPs should only have to worry about stripping a BOM, not about replacing invalid UTF-8 byte sequences with the "replacement character" (i.e., U+FFFD), since the existence of invalid UTF-8 implies the serialization algorithm was not adhered to as the spec mandates. One could even argue that prepending the "zero width no-break space" character (i.e., U+FEFF) also violates the serialization algorithm; thus interpreting its presence as a byte order mark (BOM) and subsequently stripping it seems bizarre too.
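
Purely to illustrate the difference, here is a minimal Rust sketch; the helper names are mine, not the spec's. The first function mirrors what the spec's steps literally require (the Encoding standard's UTF-8 decode: strip a leading BOM, then decode with the "replacement" error handler), while the second is the strict decode that, per the above, should suffice.

```rust
/// What the spec literally requires: strip a leading UTF-8 BOM, then
/// decode with the "replacement" error handler so that invalid byte
/// sequences become U+FFFD. (Illustrative name; not from the spec.)
fn utf8_decode_with_replacement(bytes: &[u8]) -> String {
    let bytes = bytes.strip_prefix(b"\xef\xbb\xbf").unwrap_or(bytes);
    String::from_utf8_lossy(bytes).into_owned()
}

/// What this issue argues should suffice: strip a leading UTF-8 BOM,
/// but treat invalid UTF-8 as an error, since a conforming
/// serialization can never produce it.
fn utf8_decode_strict(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    let bytes = bytes.strip_prefix(b"\xef\xbb\xbf").unwrap_or(bytes);
    std::str::from_utf8(bytes)
}
```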

zacknewman commented 1 month ago

To add to this, the limited verification algorithm doesn't rely on UTF-8 decode or any other decoding at all, so the BOM stripping isn't even applicable to it. Either steps need to be added to both the serialization and limited verification algorithms, or UTF-8 decode should not be required. If BOM stripping is necessary, then for the limited verification algorithm to be consistent, a step 2 should be added to the serialization section:

  1. Let result be an empty byte string.
  2. Optionally append 0xefbbbf (i.e., UTF-8-encoded U+FEFF) to result.
  3. Append 0x7b2274797065223a ({"type":) to result.

Similarly, steps 2–4 should be added to the limited verification algorithm (a sketch of the resulting check follows the list):

  1. Let expected be an empty byte string.
  2. If clientDataJSON is not at least 3 bytes long, fail.
  3. Let bom be the first three bytes of clientDataJSON.
  4. If bom is 0xefbbbf (i.e., UTF-8-encoded U+FEFF), then append 0xefbbbf to expected.
  5. Append 0x7b2274797065223a ({"type":) to expected.
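
For concreteness, a minimal Rust sketch of the proposed verification steps above (identifiers are illustrative; `None` stands in for "fail"):

```rust
/// UTF-8 encoding of U+FEFF.
const BOM: [u8; 3] = [0xef, 0xbb, 0xbf];

/// Build the `expected` prefix per proposed steps 1-5: start empty,
/// fail if there are fewer than three bytes, copy a leading BOM when
/// present, then append `{"type":`.
fn expected_prefix(client_data_json: &[u8]) -> Option<Vec<u8>> {
    let mut expected = Vec::new();
    // Step 2: fail when there aren't even enough bytes for a BOM.
    if client_data_json.len() < 3 {
        return None;
    }
    // Steps 3-4: copy a leading BOM, if any, into `expected`.
    if client_data_json[..3] == BOM {
        expected.extend_from_slice(&BOM);
    }
    // Step 5: append 0x7b2274797065223a ({"type":).
    expected.extend_from_slice(b"{\"type\":");
    Some(expected)
}
```
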
zacknewman commented 1 month ago

I hope this is not misconstrued as a "rant", but I think the real problem is the serialization algorithm itself: it is far too strict. Because this algorithm is technically required, many clients or user agents will likely fail to conform by accident, since they presumably use a general-purpose JSON library that guarantees neither the order of fields nor the absence of insignificant whitespace. The spec should be written so that clients and user agents have an easier time conforming, not a harder one, so long as there is no reduction in security.
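
As a hypothetical illustration (the use of serde_json and the placeholder values are mine; nothing here comes from the spec), serializing a map with a general-purpose JSON library yields valid JSON whose field order nonetheless violates the serialization algorithm:

```rust
use std::collections::BTreeMap;

fn main() {
    // A client building CollectedClientData as an ordinary map instead
    // of via the spec's byte-exact serialization algorithm.
    let mut client_data = BTreeMap::new();
    client_data.insert("type", "webauthn.create");
    client_data.insert("challenge", "placeholder-challenge");
    client_data.insert("origin", "https://example.com");
    // BTreeMap iterates keys alphabetically, so this prints
    // {"challenge":"placeholder-challenge","origin":"https://example.com","type":"webauthn.create"}
    // which is valid RFC 8259 JSON, yet "type" is not first as the
    // serialization algorithm requires.
    println!("{}", serde_json::to_string(&client_data).expect("string maps serialize"));
}
```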

The existence of the serialization algorithm seems to violate two big goals of WebAuthn: consistency among RPs, which mitigates the risks of a fragmented ecosystem that turns users away from WebAuthn altogether; and the spec's status as a web standard.

Personally, I'm sympathetic to libraries that don't kowtow to non-conforming clients/usage; and as an RP writer, I find it hard to "forgive" these non-conforming clients. It is obvious that accepting any valid JSON, even JSON that violates the serialization algorithm, is not a "security issue"; but it creates a slippery slope. Postel's law is still common practice, yet it can cause issues if one is not careful. A spec should state exactly what is necessary rather than rely on implementors' best judgment about when the spec can, and in this case seemingly should, be violated.

The existence of the serialization algorithm and the related limited verification algorithm is a not-so-subtle way to help non-web platforms—as confirmed by https://github.com/w3c/webauthn/issues/2102#issuecomment-2243256705. When questions come up about native applications and other non-web platforms, the usual (and appropriate) response is something to the effect of "WebAuthn is only a web standard, so concerns with native applications are not applicable"; yet here we have a required algorithm that is designed for non-web platforms.

Ideally, the solution would be to state that clients/user agents can send any payload that conforms to RFC 8259 (or whatever formal standard you wish to cite) and MAY optionally begin the payload with U+FEFF, with at most a non-normative note stating that they SHOULD conform to the serialization algorithm.
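
Under that rule, RP-side handling could be as simple as the following sketch (the struct and helper are illustrative and assume the serde/serde_json crates; only the field names come from CollectedClientData):

```rust
use serde::Deserialize;

/// Illustrative mirror of CollectedClientData; unknown members such as
/// tokenBinding are simply ignored by serde's defaults.
#[derive(Deserialize)]
struct CollectedClientData {
    #[serde(rename = "type")]
    ty: String,
    challenge: String,
    origin: String,
    #[serde(rename = "crossOrigin", default)]
    cross_origin: bool,
}

/// Accept an optional leading U+FEFF followed by any RFC 8259 payload;
/// field order and insignificant whitespace no longer matter, and
/// invalid UTF-8 is rejected outright rather than replaced.
fn parse_client_data(bytes: &[u8]) -> Result<CollectedClientData, Box<dyn std::error::Error>> {
    let text = std::str::from_utf8(bytes)?;
    let text = text.strip_prefix('\u{feff}').unwrap_or(text);
    Ok(serde_json::from_str(text)?)
}
```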

With the above, an RP writer that relies on the limited verification algorithm could not claim to be "WebAuthn-conforming", since conforming clients/user agents would be allowed to send any JSON (with an optional BOM).

Until then, the spec should at least be consistent and ensure the required serialization and limited verification algorithms align with the other areas of the standard that state BOM stripping MUST occur; as of now, the spec bizarrely requires RPs to accept payloads that, strictly speaking, should not be possible.

Related:

#2101

#2102