Proposing alternate form to unify disclosure arrays

rohan-wire commented 9 months ago

Hi. Currently, an array which holds the salt and an unblinded claim has different forms depending on whether it contains array or non-array values. Nested disclosures potentially require different combinations of disclosures.

Instead, what if there were exactly one location for the _sd claim array (usually at the root of the document), and the disclosure array always had the following consistent form:

salt
claim path from the location of the parent of the _sd claim of the JWT
value

For example:

["r0mhd7LctwDCTXF_YfvV0g", ["family_name"], "Möbius"]
["uKp1B0jI2Ezmuo8EaNRG2Q", ["nationalities","..."], "FR"]
["CTCthBCyHMuDlXF9qtY6TA", ["address","postcode"], "EC1"]

Here family_name is a peer to the _sd claim array in the JWT; nationalities is also a peer to _sd, with "..." indicating that this is one value in the array; and postcode is an element inside address, which is a peer to _sd.
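A minimal sketch of how a verifier might place disclosures of this proposed form into a payload. It assumes (the proposal does not say this) that "..." appears only as the last path segment and means "append to the array", and that missing parent containers are created on demand. The function name apply_path_disclosure is hypothetical, and no digest check is performed here.

```python
from typing import Any


def apply_path_disclosure(payload: dict, disclosure: list) -> None:
    """Place the value of a [salt, path, value] disclosure into payload."""
    _salt, path, value = disclosure
    *parents, last = path
    node: Any = payload
    for idx, key in enumerate(parents):
        # Assumption: the container is a list when the next segment is "...".
        default: Any = [] if path[idx + 1] == "..." else {}
        node = node.setdefault(key, default)
    if last == "...":
        node.append(value)  # assumed meaning: one more value in the array
    else:
        node[last] = value


payload: dict = {}
for item in (
    ["r0mhd7LctwDCTXF_YfvV0g", ["family_name"], "Möbius"],
    ["uKp1B0jI2Ezmuo8EaNRG2Q", ["nationalities", "..."], "FR"],
    ["CTCthBCyHMuDlXF9qtY6TA", ["address", "postcode"], "EC1"],
):
    apply_path_disclosure(payload, item)
```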

danielfett commented 9 months ago

Thanks for your suggestion, Rohan.

Early on, we thought a lot about pointer-based approaches like the one you propose, but there are some drawbacks:

The Verifier can re-assemble the document without checking the hashes. This is dangerous (see the second paragraph of https://drafts.oauth.net/oauth-selective-disclosure-jwt/draft-ietf-oauth-selective-disclosure-jwt.html#name-manipulation-of-disclosures).
The current design in the draft allows sending smaller presentations when only a few claims are disclosed and recursive disclosures are used.
There is no need to discuss corner cases, for example, where a child element is disclosed, but not the parent element. From my experience, the design that we currently have is slightly easier to implement (fewer special cases).

In your proposal, how would you encode the order of elements in an array? Just by the order of digests in _sd?

How would you disambiguate the (at least two) potential meanings of encountering the two disclosures ["CTCthBCyHMuDlXF9qtY6TA", ["addresses","...","postcode"], "EC1"] and ["uKp1B0jI2Ezmuo8EaNRG2Q",["addresses","...","street_address"], "EC1"]?
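To make the ambiguity concrete, here is a small sketch that builds two of the possible readings of such a pair of disclosures. It assumes both paths have the shape [array name, "...", key]; the function name possible_readings is hypothetical.

```python
def possible_readings(first: list, second: list) -> tuple:
    """Return two documents that the same pair of disclosures could describe."""
    (_, path_a, value_a), (_, path_b, value_b) = first, second
    array_name = path_a[0]
    key_a, key_b = path_a[-1], path_b[-1]
    # Reading 1: both claims belong to the same array element.
    same_element = {array_name: [{key_a: value_a, key_b: value_b}]}
    # Reading 2: each claim belongs to its own array element.
    separate_elements = {array_name: [{key_a: value_a}, {key_b: value_b}]}
    return same_element, separate_elements


one, two = possible_readings(
    ["CTCthBCyHMuDlXF9qtY6TA", ["addresses", "...", "postcode"], "EC1"],
    ["uKp1B0jI2Ezmuo8EaNRG2Q", ["addresses", "...", "street_address"], "EC1"],
)
```

Nothing in the two disclosures themselves says which of the two documents was meant.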

rohan-wire commented 9 months ago

Hi Daniel,

Thanks for your quick response.

In all the examples I have seen using identity, I didn't see much of a compelling need for either ordered arrays or selective disclosure of sub-fields inside an array value. However, if the order inside an array were important, you could use an array index after "...", for example:

["CTCthBCyHMuDlXF9qtY6TA", ["addresses","...[0]","postcode"], "EC1"]

which would be the postcode element in the 1st array value under addresses:

"addresses": [ {"postcode": "EC1"}, {}, {}]

I suspect the "order is not important" case is probably more common, so you could still just use "..." to mean somewhere in the array.

Regarding your first three bullet points:

I don't think "the verifier can [construct the claims] without checking hashes" is very persuasive. If an implementer is going to completely disregard security in a security spec, they would be just as likely to have a finite list of expected claims and if they see one of these tokens assume they know where it goes under the current spec (still ignoring the hashes).
I also disagree that there are fewer corner cases. (I'll open another issue for the ones that I see for the current design). I think there are several corner cases in both designs and I suspect there are about the same number of them.
Regarding smaller presentations, I am curious here. If we are optimizing for specific use cases it would be good to understand what those are. Could you share a practical example or two?

The other advantage I see is you could have fewer decoy digests if all the values are under a single _sd.

danielfett commented 9 months ago

Hi Rohan,

I think the order of elements is important generally and especially in the case shown in my previous comment - omitting it can quickly lead to ambiguous encodings.

Regarding the hash check: The current design avoids a major footgun! Of course, anybody who writes an SD-JWT implementation for just one use case can skip the hash check. However, it is very hard (or impossible?) to write a generic library supporting the current design while skipping the hash check. This is the beauty of the approach. With the pointer-based approach, nobody needs to check any hash. (I know that it is easy to say "but this is a security spec, everybody should be cautious and whoever doesn't is just careless!" but that is not how the world works. Mistakes that can be made will be made; it is better to make making mistakes hard.)
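As a rough illustration of why a generic implementation of the current design ends up computing the hash: the only thing that ties a disclosure to a position in the payload is its digest in an _sd array. The sketch below covers object-property disclosures only and assumes sha-256; the helper names (b64url, disclosure_digest, place_disclosure) are hypothetical and not taken from any library.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def disclosure_digest(encoded_disclosure: str) -> str:
    """Hash the base64url-encoded disclosure string and encode the result."""
    return b64url(hashlib.sha256(encoded_disclosure.encode("ascii")).digest())


def place_disclosure(obj: dict, encoded_disclosure: str) -> bool:
    """Insert the claim into obj only if its digest is listed in obj["_sd"]."""
    if disclosure_digest(encoded_disclosure) not in obj.get("_sd", []):
        return False  # without a matching digest there is no place to put it
    padded = encoded_disclosure + "=" * (-len(encoded_disclosure) % 4)
    _salt, name, value = json.loads(base64.urlsafe_b64decode(padded))
    obj[name] = value
    return True
```

A disclosure whose digest is not present simply has nowhere to go, so skipping the hash computation is not an option for code that does not already know the document layout.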

Regarding the size of presentations, an example can be found in the specification in Section 7.3: If address is not to be disclosed, only this data needs to be sent:

{
  "_sd": [
    "HvrKX6fPV0v9K_yCVFBiLFHsMaxcD_114Em6VT8x1lg"
  ],
  "iss": "https://issuer.example.com",
  "iat": 1683000000,
  "exp": 1883000000,
  "sub": "6c5c0a49-b589-431d-bae7-219122a9ec2c",
  "_sd_alg": "sha-256"
}

Note that this contains only one disclosure digest. Only if address is to be disclosed does the respective disclosure, which contains digests of further disclosures, need to be sent, as shown in the spec.

bc-pi commented 9 months ago

There are undoubtedly at least a few different viable approaches and inevitably each will have its own set of advantages and disadvantages. There will always be some tradeoffs and there are bound to be varying preferences amongst different people. The general approach in the current WG draft has been stable for a while now, though, and is being used. So significant changes at this stage would need really compelling reasons and even working group consensus.

bifurcation commented 8 months ago

I agree that pointer-based approaches are unnecessary complexity.

But the fact that the same object can have two different syntaxes is a pain to implement. It would be nice if a disclosure could always be a [salt, key, value] array, and just have the key be equal to JSON null for array disclosures.

Basically, the current structure requires loose parsing with post-hoc validation, whereas with a uniform structure you could be more declarative. For example, in Rust with serde_json, you could just define a disclosure to be struct Disclosure(String, Option<String>, Value), so that the parser would do the work of validation.
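A rough Python analogue of that idea, as a sketch only: a single three-element shape in which the key is None for array-element disclosures. The Disclosure class and parse_uniform function are hypothetical names, and this uniform shape is the suggestion made in this comment, not the format in the current draft.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Disclosure:
    salt: str
    key: Optional[str]  # None would mark an array-element disclosure
    value: Any


def parse_uniform(raw: Any) -> Disclosure:
    """Validate a decoded [salt, key, value] array and return a Disclosure."""
    if not isinstance(raw, list) or len(raw) != 3:
        raise ValueError("disclosure must be a 3-element array")
    salt, key, value = raw
    if not isinstance(salt, str):
        raise ValueError("salt must be a string")
    if key is not None and not isinstance(key, str):
        raise ValueError("key must be a string or null")
    return Disclosure(salt, key, value)
```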

bc-pi commented 7 months ago

You'll know from context whether it's an array element or an object property at the point of parsing the JSON array in the disclosure, so you can certainly be more strict or declarative.
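A sketch of what context-driven strict parsing could look like for the current two-shape format, assuming the caller already knows whether the digest was found in an array or in an object's _sd. The function name parse_current and the in_array flag are hypothetical.

```python
from typing import Any, Optional, Tuple


def parse_current(raw: Any, in_array: bool) -> Tuple[str, Optional[str], Any]:
    """Strictly parse a decoded disclosure, given where its digest was found."""
    expected = 2 if in_array else 3  # [salt, value] vs. [salt, name, value]
    if not isinstance(raw, list) or len(raw) != expected:
        raise ValueError(f"expected a {expected}-element array")
    if not isinstance(raw[0], str):
        raise ValueError("salt must be a string")
    if in_array:
        salt, value = raw
        return salt, None, value
    salt, name, value = raw
    if not isinstance(name, str):
        raise ValueError("claim name must be a string")
    return salt, name, value
```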

oauth-wg / oauth-selective-disclosure-jwt

Proposing alternate form to unify disclosure arrays #357