w3c / webauthn

Web Authentication: An API for accessing Public Key Credentials
https://w3c.github.io/webauthn/
Other
1.19k stars 178 forks source link

Remove `RegistrationResponseJSON.id` and `AuthenticationResponseJSON.id` #2119

Closed zacknewman closed 3 months ago

zacknewman commented 3 months ago

First, RegistrationResponseJSON.id and AuthenticationResponseJSON.id are modeled as Base64URLStrings, but Credential.id is already a USVString so it's a little bit weird to change the type to a Base64URLString since from my understanding only ArrayBuffers should need to have their types changed (emphasis added):

toJSON()

This operation returns RegistrationResponseJSON or AuthenticationResponseJSON, which are JSON type representations mirroring PublicKeyCredential, suitable for submission to a Relying Party server as an application/json payload. The client is in charge of serializing values to JSON types as usual, but MUST take additional steps to first encode any ArrayBuffer values to DOMString values using base64url encoding.

Are user agents expected to encode id again so that an RP receives a Credential ID that has been encoded twice or are user agents expected to know that the USVString is already a Base64URLString and send it as is? If the latter, it seems less confusing to just leave the type as a USVString.

Anyway, one issue I have as an RP library writer is deciding which Credential ID to use. Since my library is focused on strict conformance, I require all instances of Credential IDs to match instead of arbitrarily selecting one to use when they don't align. The more places Credential IDs exist the more likely there is disagreement. What is the motivation behind providing both id and rawId? They will be exactly the same, so why not remove one? I know at least one application that provides different values for id and rawId almost certainly by accident: one is double encoded as base64URL.

emlun commented 3 months ago

Looks like rawId first appeared in commit https://github.com/w3c/webauthn/pull/384/commits/ab8d74e50493fbe9e49ebb439a2d29337d63cdf1 of PR #384. I don't see any rationale for it in that thread, but this same discussion was brought up in issue #412 which was closed without action as the proposed changes were breaking changes.

Are user agents expected to encode id again so that an RP receives a Credential ID that has been encoded twice or are user agents expected to know that the USVString is already a Base64URLString and send it as is? If the latter, it seems less confusing to just leave the type as a USVString.

User agents are expected to not re-encode id, so yes, id and rawId are essentially aliases in the *ResponseJSON context. Fair point about the Base64URLString type perhaps making it seem like it would be doubly-encoded, but on the other hand it is a base64 encoded value. @MasterKale any thoughts on that?

one issue I have as an RP library writer is deciding which Credential ID to use.

I agree that libraries should not tolerate a mismatch between id and rawId. The server-side library I've worked on, java-webauthn-server, essentially treats the two as aliases, requires at least one of them to be present and rejects mismatches if both are present. That library predates these *ResponseJSON types, but the idea is the same.

MasterKale commented 3 months ago

Are user agents expected to encode id again so that an RP receives a Credential ID that has been encoded twice or are user agents expected to know that the USVString is already a Base64URLString and send it as is? If the latter, it seems less confusing to just leave the type as a USVString.

User agents are expected to not re-encode id, so yes, id and rawId are essentially aliases in the *ResponseJSON context. Fair point about the Base64URLString type perhaps making it seem like it would be doubly-encoded, but on the other hand it is a base64 encoded value. @MasterKale any thoughts on that?

Sorry for creating confusion with this. User agents are not expected to encode id again. Thinking back I probably used Base64URLString for id as well because "it's a base64url-encoded string." I agree it would make sense for it to share the type of Credential.id, though.

@emlun can I simply change RegistrationResponseJSON.id and AuthenticationResponseJSON.id to USVString without breaking anything?

zacknewman commented 3 months ago

User agents are expected to not re-encode id, so yes, id and rawId are essentially aliases in the *ResponseJSON context. Fair point about the Base64URLString type perhaps making it seem like it would be doubly-encoded, but on the other hand it is a base64 encoded value. @MasterKale any thoughts on that?

I don't find that argument convincing considering there are other fields that are something more specific but nonetheless retain the same more general type. In particular clientDataJSON is a USVString which is a DOMString; yet it's still modeled as an ArrayBuffer and USVStrings are often modeled as DOMStrings.

emlun commented 3 months ago

@MasterKale It could in theory break existing implementations of the WebIDL, but the feature hasn't been in a mature spec release yet, so I say go for it. Usages by RPs are unlikely to break since the actual types are the same.

We should prefer DOMString over USVString wherever we can, though, as recommended by the Web IDL spec:

Specifications should only use USVString for APIs that perform text processing and need a string of scalar values to operate on. Most APIs that use strings should instead be using DOMString, which does not make any interpretations of the code units in the string. When in doubt, use DOMString.

zacknewman commented 3 months ago

We should prefer DOMString over USVString wherever we can, though, as recommended by the Web IDL spec:

That is why I created the now-closed issue talking about Credential.id being a USVString. If we change it to a DOMString, does it warrant a comment on why the type changes? Luckily USVStrings are DOMStrings (I think), so it should be fine.

zacknewman commented 3 months ago

Looks like rawId first appeared in commit ab8d74e of PR #384. I don't see any rationale for it in that thread, but this same discussion was brought up in issue #412 which was closed without action as the proposed changes were breaking changes.

I'm confused about what a breaking change is. Changes are constantly happening to the WebAuthn Level 3 draft including some that are breaking. I assumed since it's a draft breaking changes are tolerated when reasonable. I think removing id or rawId would constitute as a "reasonable" breaking change since it reduces one more landmine that can exist: disagreement between the (should-be) identical id and rawId fields.

To be clear, I'm not advocating for id or rawId to be removed from PublicKeyCredential but only the JSON-motivated definitions which don't exist in any of the previous published specs. I assumed the reason for id and rawId to exist is to allow for simpler treatment of Credential IDs on the client side: id when you want a base64URL encoding of the Credential ID—likely to send to the server which is largely not needed anymore with the JSON-motivated definitions—and rawId when you want the actual Credential ID.

Essentially the JSON-motivated definitions provide a standardized solution of how clients should send data to the server which happens to include the likely motivation of id to begin with. Since it provides the more robust solution, I don't see why to retain any now-redundant fields even if PublicKeyCredential has them.

emlun commented 3 months ago

That is why I created the now-closed issue talking about Credential.id being a USVString. If we change it to a DOMString, [...]

"We" the WebAuthn working group cannot change it, because it's defined in a different spec. You're welcome to bring the issue to the Web Application Security Working Group as noted in #2118, but it's off topic for this working group.

I'm confused about what a breaking change is.

I haven't seen a hard definition, but broadly it's about whether existing implementations of the three participants: authenticators, clients and RPs, would remain compatible if one, but not all, participants make the change in question. For example, removing rawId would make L3 client implementations incompatible with existing RP implementations that use it. Indeed this is mostly concerned with compatibility between mature ("Candidate Recommendation" or higher) specifications, and breaking changes within "Working Draft" state are more allowable. But in reality, implementations sometimes launch before the spec is mature, so in practice it's a matter of consensus in the Working Group on how much would break and whether that's a problem.

I don't know how the decision was made in #412, as that was long before the L1 release, so there wasn't a mature spec to break. I would assume there may have been objections against breaking existing prototype implementations and that the benefit wasn't considered worth it.

I'm not advocating for id or rawId to be removed from PublicKeyCredential but only the JSON-motivated definitions which don't exist in any of the previous published specs.

Fair point. Though perhaps it's debatable which is worse under the principle of least surprise: *ResponseJSON duplicating id as rawId, or *ResponseJSON missing the rawId attribute from the *Response types they're supposed to be mirroring?

I think I agree that correctness should override that concern, though: removing *ResponseJSON.rawId is better because it eliminates a possible source of inconsistency.

@MasterKale What do you think?

zacknewman commented 3 months ago

"We" the WebAuthn working group cannot change it, because it's defined in a different spec. You're welcome to bring the issue to the Web Application Security Working Group as noted in https://github.com/w3c/webauthn/issues/2118, but it's off topic for this working group.

I should have been clearer: I was not against the closure of the issue but merely stating that was the motivation behind the issue and where there will be a slight disagreement if we model id as a DOMString. I used the modifier "now-closed" for informative reasons that would allow one to know where to search for this issue. I apologize for it coming off as being disgruntled.

Fair point. Though perhaps it's debatable which is worse under the principle of least surprise: *ResponseJSON duplicating id as rawId, or *ResponseJSON missing the rawId attribute from the *Response types they're supposed to be mirroring?

Definitely debatable. I think the primary reason for the JSON-motivated definitions is to supply all the necessary data a server needs and no redundant info. The latter for reasons of reducing areas where data can disagree—I realize authenticatorData, publicKey, etc. are redundant, but they make the job easier on the server while having both id and rawId do not. The secondary goal is to ensure the JSON-motivated definitions align closely to the IDL definitions in data and name (e.g., clientExtensionResults instead of extensions which I've seen more often "in the wild" from L2 clients). When there is a conflict between the two, the primary goal should win; but as you said, this is debatable.

MasterKale commented 3 months ago

Fair point. Though perhaps it's debatable which is worse under the principle of least surprise: *ResponseJSON duplicating id as rawId, or *ResponseJSON missing the rawId attribute from the *Response types they're supposed to be mirroring?

I think I agree that correctness should override that concern, though: removing *ResponseJSON.rawId is better because it eliminates a possible source of inconsistency.

@MasterKale What do you think?

It has always been weird to me that id and rawId are always the same value. FIDO conformance requires RPs (and libraries that aim to become conformant) to check that both are present and equal to each other, but beyond that there is no practical reason to keep both values around.

However, both values exist in PublicKeyCredential and afaik we're not considering trying to remove the redundancy here. I suggest it would become confusing for a JSON-serialized representation of a PublicKeyCredential to omit any value because there's nothing about the serialization process that suggests, "this value will be serialized and also optimized of redundancies."

The goal of the serialization methods was to reduce the complexity of transforming a response to .create() or .get() to help get it from the front end to the back end. It was never intended to address any of the issues higher up the type hierarchy about redundant values. I don't think it would help the adoption of the JSON serialization methods either if they became perceived as somehow "flawed" because people looked at the output and think, "this is missing rawId...what else might it not be serializing?"

zacknewman commented 3 months ago

It has always been weird to me that id and rawId are always the same value.

Technically they are not the same: one is base64url encoded and the other is not. For this reason I find it less weird that they both exist. When a client wants the actual Credential ID, it's nice rawId exists; otherwise you'd have to decode id. Similarly when a client wants a base64URL-encoded Credential ID (perhaps to send to a server), it's nice id exists; otherwise you'd have to encode rawId. Personally, I still wouldn't have included both; but that ship has sailed. I should add that id is inherited from Credential which was already typed as a USVString. This means PublicKeyCredential was essentially "forced" to define id as a base64URL-encoding of the Credential ID; when in reality, rawId "fits" better: methods like getPublicKey return an ArrayBuffer not a base64URL-encoding of the SubjectPublicKeyInfo. Consequently adding rawId could also be viewed as a "compromise" where you are adding a more appropriate field but are also still defining PublicKeyCredential as a Credential.

In the JSON-motivated definitions though, they are actually the same (i.e., bit-for-bit identical). This means the same minor benefit does not exist.

The goal of the serialization methods was to reduce the complexity of transforming a response to .create() or .get() to help get it from the front end to the back end.

Including both increases the complexity though therefore contradicting the goal: now RPs have to verify both fields exist and are identical. Hopefully once Level 3 is published and toJSON is exposed, fewer clients will send data that mismatch; so perhaps this issue won't matter.

Regardless, this is not a hill I'm willing to die on; but I do feel more strongly that id should be changed to DOMString/USVString.