Clarify algorithms to accommodate recursive redaction

bifurcation commented 10 months ago

Recursive redaction makes it much more complex to validate and handle SD-JWT objects.

It requires that disclosure be processed in a particular order, and since this order is known only to the redactor, the disclosures in the SD-JWT must be listed in a specific order to communicate the order.

It complicates the matching of disclosure objects to redacted fields, which you need to do in order to validate that the disclosures match the Issuer JWT, or to offer an API of the form "delete the disclosure for /address/street_address". Without recursion, you are matching the list of disclosures against the claims in the Issuer JWT. With recursion, you are matching the list of disclosures against the issuer JWT and all other disclosures.

It complicates selective disclosure. Now, if I opt not to disclose a field, I may have rendered some disclosure further down the line meaningless. So instead of just deleting one disclosure, I also have to search through the disclosed value for further SD hashes, and delete the corresponding disclosures.

In principle, we could add specification text that would address these issues. But it would be a significant amount of text for what seems like a fairly niche feature. Seems better just to delete the feature and keep things simple.

bifurcation commented 10 months ago

This is also not a difficult property to enforce: Simply invalidate any disclosure object that discloses a JSON object containing an SD hash (_sd or ...).

danielfett commented 10 months ago

It requires that disclosure be processed in a particular order, and since this order is known only to the redactor, the disclosures in the SD-JWT must be listed in a specific order to communicate the order.

I don't understand this point. The algorithm defined in the spec right now does not care for the order of disclosures (and it works with recursive disclosures).

It complicates the matching of disclosure objects to redacted fields, which you need to do in order to validate that the disclosures match the Issuer JWT, or to offer an API of the form "delete the disclosure for /address/street_address". Without recursion, you are matching the list of disclosures against the claims in the Issuer JWT. With recursion, you are matching the list of disclosures against the issuer JWT and all other disclosures.

As above - processing is fully specified, and it is not very complicated.

It complicates selective disclosure. Now, if I opt not to disclose a field, I may have rendered some disclosure further down the line meaningless. So instead of just deleting one disclosure, I also have to search through the disclosed value for further SD hashes, and delete the corresponding disclosures.

This is a valid point.

In principle, we could add specification text that would address these issues. But it would be a significant amount of text for what seems like a fairly niche feature. Seems better just to delete the feature and keep things simple.

This is also not a difficult property to enforce: Simply invalidate any disclosure object that discloses a JSON object containing an SD hash (_sd or ...).

We had this in a previous version of the spec, but opted to enable recursive disclosures since it is clear that it enables useful features while requiring only a little extra effort for implementers.

bifurcation commented 10 months ago

It requires that disclosure be processed in a particular order, and since this order is known only to the redactor, the disclosures in the SD-JWT must be listed in a specific order to communicate the order.

I don't understand this point. The algorithm defined in the spec right now does not care for the order of disclosures (and it works with recursive disclosures).

I looked again at the algorithm in the document, and I agree that it works as specified. But now I have a different complaint -- that the algorithm is overly complex because of the need for recursive redaction :) What I had in mind was an even simpler algorithm, where the verifier takes each disclosure, identifies its location in the JSON object, and fills in the corresponding value. No need for recursion or recursive searches for hashes.

In other words, without recursive redaction, you could replace steps 2 and 3 with something like the following:

For each disclosure object in the SD-JWT:
1. Compute the hash of the disclosure object using sd_alg
2. Identify where in the object this hash appears, as an ... entry in an array element or as a value in an _sd hash list.
3. Restore the redacted value from the disclosure object
  - If in an array, replace the { "...": ... } with the value
  - If in an object, add a field with the name and value in the disclosure and remove from _sd

danielfett commented 10 months ago

This is almost the same we have now, just with fewer details. The only difference is that you have to walk through the JSON tree instead of iterate on the digests... actually, you could just use your algorithm and repeat it until no further replacements can be made.

bifurcation commented 10 months ago

Sorry, in case it was unclear, by "for each disclosure object", I meant the ~-separated pieces (in code). You seem to be thinking of iterating through the digests in the JSON object.

This seems like a more natural algorithm to me, since you can only disclose what's in the disclosures. And it seems like you need a simpler API -- you only need to be able to look up where a specific SD hash is, as opposed to enumerating all the hashes in the JSON tree.

bc-pi commented 10 months ago

Honestly, I thought recursive redaction was kinda niche too but a nontrivial number of people have wanted/supported it. And the algorithm defined in the spec right now does accommodate it without a lot of extra complexity. Yes that algorithm works by traversing the JSON of the Issuer-singed JWT and pulling in the content of each referenced disclosure (and that content can also recursively reference other disclosures). That seems like a perfectly natural algorithm to me.

bifurcation commented 10 months ago

I think the thing that worries me most here is the algorithm for the Holder to choose which items to disclose. As with the validation algorithm discussed in #373, redaction makes it much harder to tell which disclosures go with which claims, and which disclosures can be dropped without invalidating the SD-JWT.

Without redaction, each disclosure stands alone, and you can tell what it represents by finding its hash in the Issuer JWT. With redaction, you have to work your way through all of the other disclosures, and make sure you end up back in the Issuer JWT.

You say this is not a lot of extra complexity, but it's not even obvious to me how to write an algorithm for the above whose correctness I would have strong confidence in. Hand it to the median programmer, and you're going to get bugs -- invalid JWTs/presentations or inadvertent disclosures.

bifurcation commented 10 months ago

Basically, the choice here is:

Eliminate recursive redaction, get simple, clear algorithms
Keep recursive redaction, make every algorithm recursive

It's not impossible, I just question whether it's worth the work.

bc-pi commented 10 months ago

The holder selecting which disclosures to include is indeed complicated by the recursive stuff. And I do think/agree that that bit of complication hasn't been considered as much. But I'd argue that's more of an implementation detail of a holder or API exposed to the holder. It could be noted in that Processing by the Holder section but I don't think that's changing the algorithm.

That is to say that I don't believe much is needed in the way of spec updates to accommodate recursive disclosures. But yes it adds some complexity to implementations - especially at the holder around choosing which disclosures to include.

Is it worth it? I honestly don't know. But the very rough consensus thus far has been in favor of allowing recursive disclosures. And I don't think it adds as much complexity as has been suggested here.

danielfett commented 10 months ago

We discussed this on the editor's call. Consensus is to keep the feature, in particular since there are some data structures that require recursive disclosures for SD-JWT to be useful for them.

Open question is whether there is any clarification we need to make in the text.

bifurcation commented 10 months ago

Updated title to reflect decision. I can do a review and propose some clarifications.

danielfett commented 4 months ago

@bifurcation Can you please propose a PR for this?

oauth-wg / oauth-selective-disclosure-jwt

Clarify algorithms to accommodate recursive redaction #372