Support for SD in arrays

danielfett commented 1 year ago

We currently do not support selective disclosure for elements in an array (we do support it for keys within objects within arrays, but not for arbitrary array elements).

Example:

The key foo in {"foo": [23, "bar", false]} can be SD, but we can't apply SD to the array elements, i.e., we cannot disclose 23 without disclosing "bar".

Proposal:

For object keys, we use the new key _sd to collect the disclosure hashes. For arrays, we would need to mark the array as selectively disclosable, e.g., by using a special first element in the array like this:

{"foo": ["_sd", "7pHe1uQ5uSClgAxXdG0E6dKnBgXcxEO1zvoQO9E5Lr4", "9-VdSnvRTZNDo-4Bxcp3X-V9VtLOCRUkR6oLWZQl81I", "nTzPZ3Q68z1Ko_9ao9LK0mSYXY5gY6UG6KEkQ_BdqU0"]}

The three hashes correspond to the three elements in the array. The verifier can resolve the disclosed elements into their original values and ignore/replace the other values. Order is preserved. This works for arbitrary element types.

This would not allow mixing SD and non-SD elements in an array, but it seems unlikely that this would be a problem. WDYT @bc-pi @Sakurann?
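To make the verifier side of this proposal concrete, here is a minimal Python sketch, not taken from any implementation. The function name `resolve_marked_array` and the digest-to-value mapping are hypothetical; the sketch assumes the verifier has already matched the received Disclosures to their digests and simply drops elements that were not disclosed.

```python
def resolve_marked_array(arr, disclosed):
    """Resolve an array that uses the special first element "_sd".

    `disclosed` maps a digest to the original value (hypothetical input
    that the verifier builds from the Disclosures it received).
    """
    # Arrays without the marker are ordinary, fully disclosed arrays.
    if not arr or arr[0] != "_sd":
        return list(arr)
    # Keep the original order; undisclosed elements are simply omitted.
    return [disclosed[digest] for digest in arr[1:] if digest in disclosed]


foo = [
    "_sd",
    "7pHe1uQ5uSClgAxXdG0E6dKnBgXcxEO1zvoQO9E5Lr4",
    "9-VdSnvRTZNDo-4Bxcp3X-V9VtLOCRUkR6oLWZQl81I",
    "nTzPZ3Q68z1Ko_9ao9LK0mSYXY5gY6UG6KEkQ_BdqU0",
]
# Assumed for illustration: the first and third digests belong to 23 and false.
disclosed = {
    "7pHe1uQ5uSClgAxXdG0E6dKnBgXcxEO1zvoQO9E5Lr4": 23,
    "nTzPZ3Q68z1Ko_9ao9LK0mSYXY5gY6UG6KEkQ_BdqU0": False,
}
result = resolve_marked_array(foo, disclosed)
```

Note that an ordinary array whose first element happens to be the string "_sd" would be misread by this check.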

bc-pi commented 1 year ago

IMHO the need to allow for selective disclosure of individual elements in an array doesn't warrant the additional complexity. The addition, by itself, isn't particularly complicated. But it is different from the general model we've got now, which covers a name and value (of any type) in the same way at any level. The model is consistent and conceptually simple. I might go so far as to say it's elegant. Is there a real, concrete requirement or use case for this that is likely to be commonly occurring, and that couldn't be achieved by modeling the data a little differently when/if necessary?

danielfett commented 1 year ago

There's a concrete use case in the eKYC syntax where the evidence element may contain multiple descriptions of documents etc., not all of which are necessary to disclose for all use cases. With the current syntax, either all documents need to be revealed or at least the structure of undisclosed documents is visible to a verifier. This is not ideal.

@tlodderstedt also had another use case where, for example, multiple types of holder binding are encoded into the SD-JWT and only one needs to be disclosed for using the credential (see https://github.com/oauth-wg/oauth-selective-disclosure-jwt/pull/193/files#diff-573098781f9a66e1e4eb42edf9799c0ea4dc69fed1db6b72805aec27563eafe7).

bc-pi commented 1 year ago

I must admit that I don't find those cases terribly compelling. But point taken nonetheless.

@tlodderstedt also had another use case where, for example, multiple types of holder binding are encoded into the SD-JWT and only one needs to be disclosed for using the credential (see https://github.com/oauth-wg/oauth-selective-disclosure-jwt/pull/193/files#diff-573098781f9a66e1e4eb42edf9799c0ea4dc69fed1db6b72805aec27563eafe7).

Is that holder binding syntax actually defined (or aspiring to be) somewhere? I couldn't find it looking through the VC WG publications and google/bing search for ClaimsBindinding2022 gives zero results.

bc-pi commented 1 year ago

The content of disclosures (https://datatracker.ietf.org/doc/html/draft-ietf-oauth-selective-disclosure-jwt-02#name-creating-disclosures) would need to conditionally omit the name when the Disclosure is for a value in an array.

We might also want to think about a prefix on the value vs. a special first element in the array, e.g., {"foo": ["_sd:7pHe1uQ5uSClgAxXdG0E6dKnBgXcxEO1zvoQO9E5Lr4", "_sd:9-VdSnvRTZNDo-4Bxcp3X-V9VtLOCRUkR6oLWZQl81I", "_sd:nTzPZ3Q68z1Ko_9ao9LK0mSYXY5gY6UG6KEkQ_BdqU0"]}. I dunno, it's a little bikesheddy, but it would allow mixing of SD and non-SD elements in an array.

With that said, I still really like the consistency and simplicity of the current model and don't love the idea of expanding it.

danielfett commented 1 year ago

The prefix solution has the advantage that the number of elements in the array is unchanged. For the other questions, it would be great if @tlodderstedt could chime in :-)

tlodderstedt commented 1 year ago

The most obvious example is the EBSI schema for diplomas.

https://ec.europa.eu/digital-building-blocks/code/projects/EBSI/repos/json-schema/browse/schemas/ebsi-muti-uni-pilot/verifiable-diploma/2022-11/examples/Ildiko%20Mazar_Graduate%20University%20Study%20of%20Civil%20Engineering_shortened.json

When I tried to transform this into an SD-JWT, I learned that it uses arrays on various levels for the different objects (e.g. achievements). I would assume it is desirable to be able to selectively disclose those.

I have also worked on an example of an eID (PID in eIDAS terms) VC that has different kinds of holder binding. The syntax is inspired by a proposal Oliver Terbu and Paul Bastian presented at IIW.

{
    "iss": "https://pid_issuer.memberstate.eu",
    "iat": 1541493724,
    "type": "PersonIdentificationData",
    "holder": {
        "binding": [
            {
                "type": "CryptographicBinding2022",
                "did": "did:example:1386147674571545"
            },
            {
                "type": "BiometricBinding2022",
                "template": "..."
            }
        ]
    },
    "credentialSubject": {
        "given_name": "Erika",
        "family_name": "Mustermann",
        "nationalities": ["DE"],
        "birth_family_name": "Schmidt",
        "birthdate": "1973-01-01",
        "place_of_birth": "Regensburg",
        "address": {
            "postal_code": "12345",
            "locality": "Irgendwo",
            "street_address": "Sonnenstrasse 23",
            "country_code": "DE"
        },
        "is_over_18": true,
        "is_over_21": true,
        "is_over_65": false
    }
}

Both entries define a certain binding (one to a cryptographically bound identifier, the other one has a biometric template). Only one of them is relevant for a certain verifier (online - crypto, offline/supervised - biometry). So I would like to disclose them individually.

danielfett commented 1 year ago

Another option would be to replace the array elements by something like this: {"_sd": "9-VdSnvRTZNDo-4Bxcp3X-V9VtLOCRUkR6oLWZQl81I"} (note that the value here is not an array). Also just bikeshedding here.

Sakurann commented 1 year ago

this claimset

"claims": {
      "given_name": "Max",
      "family_name": "Müller",
      "nationalities": [
        "DE",
        "JPN"
      ]

would be

"claims": {
        "_sd": [
          "1qb26tNg6OZuZyVDYwK4--mQxXbZqwcQbhUxGHrXeLM",
          "AHX0EgNpd_wak07lK8HX2izDNntsUZojuzyEWd2GJdk",
          "FwzTz0THaEOzexgEzLRXu-zsTPND7by3aBF57AwKCZI"
        ],
        "nationalities": [
          "_sd",
          "7pHe1uQ5uSClgAxXdG0E6dKnBgXcxEO1zvoQO9E5Lr4",
          "9-VdSnvRTZNDo-4Bxcp3X-V9VtLOCRUkR6oLWZQl81I"
        ]

or

"claims": {
        "_sd": [
          "1qb26tNg6OZuZyVDYwK4--mQxXbZqwcQbhUxGHrXeLM",
          "AHX0EgNpd_wak07lK8HX2izDNntsUZojuzyEWd2GJdk",
          "FwzTz0THaEOzexgEzLRXu-zsTPND7by3aBF57AwKCZI"
        ],
        "nationalities": [
          "_sd:7pHe1uQ5uSClgAxXdG0E6dKnBgXcxEO1zvoQO9E5Lr4",
          "_sd:9-VdSnvRTZNDo-4Bxcp3X-V9VtLOCRUkR6oLWZQl81I"
        ]

TakahikoKawasaki commented 1 year ago

As I'm a newbie in SD-JWT discussion, I don't know whether this issue has already reached an agreed solution or not. But if not yet, another approach that may be worth considering is to include an array index in the claim name when a disclosure is prepared. For example, ...

Plain Payload

{
  "array": [ "value0", "value1" ]
}

Disclosure for the first element

[ "{salt}", "array[0]", "value0" ]

Payload with _sd

{
  "_sd": [ "{digest of the Disclosure}" ],
  "array": [ null, "value1" ]
}

jogu commented 1 year ago

The nationalities example is kind of interesting. If I've understood the proposals, all of them mean that disclosing one of your nationalities also discloses that you have more than one nationality. I.e., if I'm French and Iranian, I think there's no way to selectively disclose "I'm a French national" without disclosing "I also have another nationality", because [even if the nationality claim was not previously disclosed] disclosing one element will disclose the number of elements in the array?

TakahikoKawasaki commented 1 year ago

@jogu I suppose that decoy digests can be used for the concern.

danielfett commented 1 year ago

Taka's idea would nicely work around two problems that my proposals above have:

There would be two very different types of Disclosures.
The number of elements would always be disclosed.

I would like to entertain a slightly modified version of Taka's proposal:

To avoid string manipulation, escaping problems, and a temptation to use jsonpath or similar, the disclosures would not encode the element index in the key name, but separately - like this:

[ "{salt}", ["array", 2], "value0" ]

Admittedly, this would introduce polymorphism, but it would not be much worse than with my approach above.

I would not allow non-sd plaintext values in the array in the SD-JWT, but for simplicity constrain arrays to "always disclosed" or "always sd". The array in the SD-JWT would just be omitted:

{
  "_sd": [ "{digest of the Disclosure}" ]
}
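As a rough illustration of this variant, the following Python sketch builds one Disclosure of the form [salt, [name, index], value] per array element and collects the digests in the parent object's _sd. The helper names (`b64url`, `indexed_disclosure`, `hide_array`) are made up for this example, and hashing the base64url-encoded Disclosure with SHA-256 is an assumption based on the Disclosure creation rules in the draft linked earlier.

```python
import base64
import hashlib
import json
import secrets


def b64url(data: bytes) -> str:
    """Base64url-encode without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def indexed_disclosure(name, index, value):
    """Build one Disclosure [salt, [name, index], value] and its digest."""
    salt = b64url(secrets.token_bytes(16))
    encoded = b64url(json.dumps([salt, [name, index], value]).encode("utf-8"))
    # Assumption: the digest is SHA-256 over the encoded Disclosure string.
    digest = b64url(hashlib.sha256(encoded.encode("ascii")).digest())
    return encoded, digest


def hide_array(payload, name):
    """Omit payload[name] and add one digest per element to the parent's _sd."""
    out = {key: value for key, value in payload.items() if key != name}
    digests = list(out.get("_sd", []))
    disclosures = []
    for index, value in enumerate(payload[name]):
        encoded, digest = indexed_disclosure(name, index, value)
        disclosures.append(encoded)
        digests.append(digest)
    out["_sd"] = digests
    return out, disclosures


sd_payload, disclosures = hide_array({"array": ["value0", "value1"]}, "array")
```

Because the digests are added to the object that contains the array, this sketch only works for arrays that are direct members of an object.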

TakahikoKawasaki commented 1 year ago

Encoding an array name and an index into an array like ["array", 2] seems a good idea.

bc-pi commented 1 year ago

Daniel's slightly modified version of Taka's proposal seems like the way to go.

danielfett commented 1 year ago

... and I'm just working on implementing this in the SD-JWT reference implementation.

danielfett commented 1 year ago

While implementing this I noticed a problem with the recently chosen approach: This approach relies on the fact that an array must be nested within an object. This means that

the approach does not work for top-level arrays (without defining another exception, like a "null" array name),
the approach does not work for arrays nested in arrays, and
the implementation for creating the SD-JWT becomes more complex: So far, a simple recursive function call was sufficient (each function call operating on a single key/value in an object at a time, creating the disclosure and returning the hash to add to the _sd element). Now if there's an array, we need to go back one level higher to find the _sd element and add more hashes there. The behavior is undefined for arrays-in-arrays.

@TakahikoKawasaki Did you encounter any of these problems? What are your thoughts?

I'm thinking about going back to the prefix variant, which suddenly seems much more attractive :-)

TakahikoKawasaki commented 1 year ago

@danielfett Good points. I didn't imagine the cases you mentioned. 😅

Top-level Array

If you mean the data like below,

[ "apple", "banana", "cherry" ]

the following JSON for disclosure may work by using null as the array name

[ "<salt>", [ null, 0 ], "apple" ]

and creating an outer JSON object like below.

{
  "_sd": [ "<digest0>" ]
}

However, this approach would make it difficult, if not impossible, to judge whether the outer JSON object existed from the beginning or was added just for the top-level anonymous array.

Therefore, my gut feeling tells me that "_sd" for array elements should live in the array itself, whichever approach (an additional "_sd" element or the _sd: prefix) is used.

Nested Array

This needs special considerations. At least, it seems difficult to create disclosures and "_sd" arrays for the following data with the ["salt",["array-name",index],value] approach...

[ [ "apple", "banana", "cherry" ] ]

A conclusion is that discussions need to continue. 😅

bc-pi commented 1 year ago

the approach does not work for top-level arrays (without defining another exception, like a "null" array name), the approach does not work for arrays nested in arrays

Offhand, I feel like those could be considered acceptable limitations and that allowing for SD in top-level arrays or inside arrays nested in arrays isn't necessary.

danielfett commented 1 year ago

Thanks @TakahikoKawasaki! I think I agree with your conclusions.

@bc-pi I'd like to add that I'm actually less concerned about the restrictions themselves but feel that they (and the need to explain them) are strong indicators of a less-than-ideal approach.

bc-pi commented 1 year ago

indicators of a less-than-ideal approach.

That's a fair/good point :) All the approaches thus far have had (in my mind anyway) some indicators of being less-than-ideal though. I'm not sure there's an obvious "best" one, so I'm not necessarily advocating for one over the other. I'm just trying to "contribute" to the discussions.

danielfett commented 1 year ago

I implemented the prefix solution now, but I introduced a mechanism to avoid one concern that I had. The basic idea is to use arrays like this:

  "nationalities": [
    "_sd:Q7R_-cBP9LWCq9At1XWNRZyLTFHOr0S9fLcXQjyBgH4",
    "_sd:o9qCZPD-_n0pa9nH_sBxtVKXuDyx1ALQjzYPrOJ3p4s",
    "DE"
  ],

(Here, the third element is non-SD.)

The main concern that I had with this solution is that there may be conflicts. If somebody has data where _sd can appear as part of the data (for example, user-supplied data), the processing will not be correct. Therefore I propose to use _sd: as the default prefix, but to allow definition of a different prefix by adding the top-level key _sd_arr_pfx. For example, the prefix can consist of a nonce or another string that is guaranteed to not appear in real data items. This approach is similar to boundaries in multipart MIME. (There is no need to use _sd_arr_pfx if there are no arrays with SD or if the default prefix is used.)
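To show how a verifier might handle the configurable prefix, here is a minimal Python sketch. It is not the code from the reference implementation; `resolve_arrays`, `process_payload`, and the `disclosed` mapping (digest to original value) are hypothetical names, and digests in object-level _sd arrays are deliberately left untouched to keep the example short.

```python
DEFAULT_PREFIX = "_sd:"


def resolve_arrays(node, disclosed, prefix):
    """Recursively replace prefixed digest strings inside arrays."""
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, str) and item.startswith(prefix):
                digest = item[len(prefix):]
                # Undisclosed elements are dropped from the result.
                if digest in disclosed:
                    out.append(resolve_arrays(disclosed[digest], disclosed, prefix))
            else:
                out.append(resolve_arrays(item, disclosed, prefix))
        return out
    if isinstance(node, dict):
        return {key: resolve_arrays(value, disclosed, prefix) for key, value in node.items()}
    return node


def process_payload(payload, disclosed):
    """Use _sd_arr_pfx if present, otherwise fall back to the default prefix."""
    prefix = payload.get("_sd_arr_pfx", DEFAULT_PREFIX)
    body = {key: value for key, value in payload.items() if key != "_sd_arr_pfx"}
    return resolve_arrays(body, disclosed, prefix)


payload = {
    "nationalities": ["_sdx:d1", "_sdx:d2", "DE"],
    "nested_array": [["_sdx:d3"], ["_sdx:d4"]],
    "user_data": ["_sd:looks-like-a-digest"],
    "_sd_arr_pfx": "_sdx:",
}
# Placeholder digests and values, for illustration only.
result = process_payload(payload, {"d1": "US", "d3": "foo"})
```

With the custom prefix in place, the string in user_data that starts with the default prefix is treated as ordinary data, which is the conflict the mechanism is meant to avoid.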

The following is a full SD-JWT payload using the new array feature and defining _sdx: as a prefix:

{
  "_sd": [
    "sGmV2tSLHmJScETevXgTQ-bM7O5ZnQuu-ypqI2vB-JU"
  ],
  "iss": "https://example.com/issuer",
  "iat": 1683000000,
  "exp": 1883000000,
  "sub": "john_doe_42",
  "nationalities": [
    "_sdx:Q7R_-cBP9LWCq9At1XWNRZyLTFHOr0S9fLcXQjyBgH4",
    "_sdx:o9qCZPD-_n0pa9nH_sBxtVKXuDyx1ALQjzYPrOJ3p4s",
    "DE"
  ],
  "is_over": {
    "_sd": [
      "NKZs2QqvniVtS3k-YXxMag_PiyUQizlgdsXgfIEWZcs",
      "hLJKFgko4IvkO_R8lbX3xNRcaEo0t0awFMnrO0dXdvg",
      "hN7ybNRpz_UIZAH4rPTNl_c07JyUQtzHlAwuyVrsQgs"
    ]
  },
  "addresses": [
    {
      "street": "123 Main St",
      "city": "Anytown",
      "state": "NY",
      "zip": "12345",
      "type": "main_address"
    },
    "_sdx:TDx4IHvi4gxmIGbEKZa4AM6PYRIHtP5VxjraME72Nh8"
  ],
  "null_values": [
    null,
    "_sdx:YImgtY5gfEpLKDA8PQ93hkUCeAL0lz-UKsnK1IGJHFo",
    "_sdx:duZZCCrTo-ROiWT8uEpPkgu_XnpsIWDtXhOqSOJ1EEo",
    null
  ],
  "data_types": [
    "_sdx:llHOoLZriL1NCNBOB3lLj5cFhZS2I4UvXeWIofOSyzU",
    "_sdx:5h5X-YP38eRr7yS1sydUGJzbTXQiYoZy2CELGTpy63w",
    "_sdx:ZcGhdemizhDvOKiM3huX69MO5MJ3k_6N4TDADYi1KI8",
    "_sdx:MtEhyiQLsysJR9x6XgGPo2AS_audhRVXEZ3GsNVM30o",
    "_sdx:auIGvdZGiGSzFmMXxM2ErtbN-5h-y0BIeFsl_aDEN48",
    "_sdx:HTw5A7z-pzJ8RI37pC9Z2-1IyM-ZjVYG-iUkhpV4Ahw",
    "_sdx:0MZET02ximXt6FxwBPOsEBUxuo_OBlNxQsfmRiatBeY"
  ],
  "nested_array": [
    [
      "_sdx:xlEWQP4kR9414Kyp5YqMNOBFMlcTa4zqR8ueeSVUCbs",
      "_sdx:hcy7DI4AiQaPPTw40V6NYGzllLikLeyifv43a7SaX6w"
    ],
    [
      "_sdx:czFUZij2d-W1nAOU68i6khwwphueeOyTwSJDCmh7gIk",
      "_sdx:nLEUS7NSV2EUExiiET2itDjWi6dzV7re0Btcf0omUso"
    ]
  ],
  "array_with_recursive_sd": [
    "boring",
    "_sdx:uaf1fEDM93C4zWX_PZlGCgcfkEgvMpEJUCCFozWCccY",
    [
      "_sdx:DWRkboVZ-cTya_WGt0-vaaQVmETozAtFip68mxU1Z0I",
      "_sdx:RfQ40cSzxPe494mtOEbWjgA-ymegpiBiPyGTz2dyS4M"
    ]
  ],
  "_sd_alg": "sha-256",
  "cnf": {
    "jwk": {
      "kty": "EC",
      "crv": "P-256",
      "x": "TCAER19Zvu3OHF4j4W4vfSVoHIP1ILilDls7vCeGemc",
      "y": "ZxjiWWbZMQGHVWKVQ4hbSIirsVfuecCE6t4jT9F2HZQ"
    }
  },
  "_sd_arr_pfx": "_sdx:"
}

The code is in the sd-jwt repo. It is not finished yet, but you should be able to play with the examples: https://github.com/danielfett/sd-jwt/pull/4

Let me know what you think!

TakahikoKawasaki commented 1 year ago

Where to embed "_sd_arr_pfx" (and "_sd_alg") in the case of a top-level anonymous array?

danielfett commented 1 year ago

That case is so far not covered by the spec. I think that might be acceptable.

bc-pi commented 1 year ago

Regular old JWT itself requires the JWS/JWE payload be a JSON object so that probably is acceptable (even with trying to be more accommodating to arbitrary JSON payloads).

TakahikoKawasaki commented 1 year ago

Another idea came up (brainstorming). Converting any element to a map which contains only "_sd" as a key. This approach conflicts with the current "_sd" array approach. I've not examined yet whether this approach would be able to work for all cases, though. Just showing an idea.

Example 1

["A", "B"]

⬇️

[
  {"_sd":"digest of [salt, 0, A]"},
  {"_sd":"digest of [salt, 1, B]"}
]

Example 2

["A", [ "B", "C" ]]

⬇️

[
  {"_sd":"digest of [salt, 0, A]"},
  [
    {"_sd":"digest of [salt, 0, B]"},
    {"_sd":"digest of [salt, 1, C]"}
  ]
]

Example 3

{
  "a": "A",
  "b": [
    "B",
    {"c": "C"},
    {"d": "D"},
    ["E"],
    ["F"]
  ]
}

⬇️

{
  {"_sd":"digest of [salt, a, A]"},
  "b": [
    {"_sd":"digest of [salt, 0, B]"},
    {
      {"_sd":"digest of [salt, c, C]"}
    },
    {"_sd":"digest of [salt, 2, {d:D}]"},
    [
      {"_sd":"digest of [salt, 0, E]"}
    ],
    {"_sd":"digest of [salt, 4, [F]]"}
  ]
}

TakahikoKawasaki commented 1 year ago

Sorry, the above examples are malformed as JSON, probably.

TakahikoKawasaki commented 1 year ago

I meant the following is wrong.

{
  {"_sd": "digest"}
}

In the case of a JSON object, an "_sd" array would work. In the case of array elements, a JSON object containing an "_sd" key would work.

{
  "a": "A",
  "b": ["B"]
}

{
  "_sd": [
    "digest of [salt, a, A]"
  ],
  "b": [
    {"_sd":"digest of [salt, 0, B]"}
  ]
}

TakahikoKawasaki commented 1 year ago

Or, it may be possible to make "_sd" be always an array.

{
  "_sd": [
    "digest of [salt, a, A]"
  ],
  "b": [
    {
      "_sd": [
        "digest of [salt, 0, B]"
      ]
    }
  ]
}

danielfett commented 1 year ago

I have now also implemented the solution using objects {"_sd": "digest"} instead of strings "_sd:digest", and I must say that both are very similar. When iterating through an array, it is slightly easier to check for a string prefix ("_sd:...") than to check that an entry is

an object
with exactly one element
which is not an array, but a string

in order to avoid confusion with an object containing SD'd keys. What I like about this solution is that it doesn't involve string handling and most likely doesn't need the _sd_arr_pfx construction to avoid conflicts. First of all, we're using the already defined "_sd", and second, we're only using it as a key (which is generally less likely to contain highly variable or user-supplied data).
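The three-part check described above can be sketched in a few lines of Python. The function name `is_array_element_digest` is hypothetical and the snippet is only meant to illustrate the conditions, not to mirror the reference implementation.

```python
def is_array_element_digest(entry):
    """True only for a placeholder of the form {"_sd": "<digest string>"}."""
    return (
        isinstance(entry, dict)                 # an object
        and len(entry) == 1                     # with exactly one element
        and isinstance(entry.get("_sd"), str)   # whose _sd is a string, not an array
    )


placeholder = {"_sd": "i7eKdHc_ZMOnhiyu3TJj5GVDQ7ZwJOMXFD3XgUbo8GQ"}
object_with_sd_keys = {"_sd": ["1R6ziZ1b4uvXf4-DuKx0JSDRoeVTGrzJldw7Jgqac3Q"]}
```

Here `placeholder` passes the check, while `object_with_sd_keys` (an object whose own keys are selectively disclosable) and plain values such as "DE" do not.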

For the disclosures, I think it makes sense to stick to a two-element format (["eluV5Og3gSNII8EYnsxA_A", "CA"]). As Brian pointed out yesterday, we should keep the Disclosures short. And we don't need the position in the array if the position is already encoded in the SD-JWT itself.
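For illustration, a two-element Disclosure can be decoded and hashed as in the following Python sketch. The helper names are hypothetical, and computing the digest as SHA-256 over the base64url-encoded Disclosure string is an assumption based on the draft linked earlier in this thread.

```python
import base64
import hashlib
import json


def b64url_decode(text: str) -> bytes:
    """Decode base64url text that has its padding stripped."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def read_array_disclosure(encoded: str):
    """Return (salt, value); raises ValueError unless there are exactly two elements."""
    salt, value = json.loads(b64url_decode(encoded))
    return salt, value


def digest_of(encoded: str) -> str:
    # Assumption: SHA-256 over the ASCII bytes of the encoded Disclosure.
    return b64url_encode(hashlib.sha256(encoded.encode("ascii")).digest())


encoded = "WyJlbHVWNU9nM2dTTklJOEVZbnN4QV9BIiwgIkNBIl0"
salt, value = read_array_disclosure(encoded)
digest = digest_of(encoded)
```

The array position never appears in the Disclosure; the verifier learns it from where the matching digest sits in the SD-JWT payload.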

Here is a full example:

{
  "_sd": [
    "sGmV2tSLHmJScETevXgTQ-bM7O5ZnQuu-ypqI2vB-JU"
  ],
  "iss": "https://example.com/issuer",
  "iat": 1683000000,
  "exp": 1883000000,
  "sub": "john_doe_42",
  "nationalities": [
    {
      "_sd": "i7eKdHc_ZMOnhiyu3TJj5GVDQ7ZwJOMXFD3XgUbo8GQ"
    },
    {
      "_sd": "usWXFPKaqKMreTrj72QD24wB8xc7lQ4zCnrnn8ZRVeo"
    },
    "DE"
  ],
  "is_over": {
    "_sd": [
      "2ovMJR_ZNMB6ngFK3SUQnRIgyM548DzR7tJFTO-ZzBM",
      "CeVqxVUVHpva5Xp0X-NeUvhixjDYp7PTZ4BaFWGXUek",
      "dg1pBJV-dABilqD2RYiG8z4gRtuDFdRBdlwHgdLFEx8"
    ]
  },
  "addresses": [
    {
      "street": "123 Main St",
      "city": "Anytown",
      "state": "NY",
      "zip": "12345",
      "type": "main_address"
    },
    {
      "_sd": "RNWcxPD8A1ZhAm6_wAiJSoSzIRb_w1QUaKGvS240K-Y"
    }
  ],
  "null_values": [
    null,
    {
      "_sd": "hhB5pziS4s0dSx0kql31vDtuo3JVDfB4VZ-YHcj2A9M"
    },
    {
      "_sd": "o_VFRluA190wrH5E1yr2r39UyTnx3-m3qPREikSr6Qo"
    },
    null
  ],
  "data_types": [
    {
      "_sd": "nY72P6V5uHQe-BYkwYj-paG2y3fmj614FKQQhhk6T1E"
    },
    {
      "_sd": "zt7kWPtZTpMYKPoaQd-L71L-aKYMYYNLOFOf-yH3uLY"
    },
    {
      "_sd": "K1yxHJ4z10JKd2jRmQuziCym3D1oXB0NaFVLHEOv8XM"
    },
    {
      "_sd": "yr-1NDhAaFYPvLrAzvdFfBwRJS_wn199JX0adDYa6Ak"
    },
    {
      "_sd": "NoTOTjWq1_cYu3kfQKh3jWrx9OLSIIdhYX0_92-RD-Y"
    },
    {
      "_sd": "biBLCP424CoDYTpBmden-zGmYOdE0GSHlerSaoYeQZ0"
    },
    {
      "_sd": "_z-We_gbvKo84jpuhBQS9v9yVhDo2--FCDNWMHMezUQ"
    }
  ],
  "nested_array": [
    [
      {
        "_sd": "FbJ_W_M-Gl9rMUR8fcsMFdiHV-qEabiT-u9eHvNKSAA"
      },
      {
        "_sd": "zo6muzFQJ9UCeFuy3Dq_YInQzLGimJVIztHGntWVxw0"
      }
    ],
    [
      {
        "_sd": "n-TOQDur9EA2k9G_VVqlvkOYCzIFb28LKA99IaQfFt8"
      },
      {
        "_sd": "xvJ7NwhRY93UqhcqVKF-Ap7HwZpKe1raEWZg_WozBBs"
      }
    ]
  ],
  "array_with_recursive_sd": [
    "boring",
    {
      "_sd": "KIK9FOQ3C-jLxGW9oRYTL-AETF3eGolP8lyVRVFOqX8"
    },
    [
      {
        "_sd": "uN8DYtT68Do3MAO9deTagWZx-akgd6DmzI4x9xFN7bs"
      },
      {
        "_sd": "F5STX6452Aw9VQyFh5vclX-SlUAuu_r_ax-ow35e4Jw"
      }
    ]
  ],
  "_sd_alg": "sha-256",
  "cnf": {
    "jwk": {
      "kty": "EC",
      "crv": "P-256",
      "x": "TCAER19Zvu3OHF4j4W4vfSVoHIP1ILilDls7vCeGemc",
      "y": "ZxjiWWbZMQGHVWKVQ4hbSIirsVfuecCE6t4jT9F2HZQ"
    }
  }
}

Array Entry:

SHA-256 Hash: i7eKdHc_ZMOnhiyu3TJj5GVDQ7ZwJOMXFD3XgUbo8GQ
Disclosure: WyIyR0xDNDJzS1F2ZUNmR2ZyeU5STjl3IiwgIlVTIl0
Contents: ["2GLC42sKQveCfGfryNRN9w", "US"]

Array Entry:

SHA-256 Hash: usWXFPKaqKMreTrj72QD24wB8xc7lQ4zCnrnn8ZRVeo
Disclosure: WyJlbHVWNU9nM2dTTklJOEVZbnN4QV9BIiwgIkNBIl0
Contents: ["eluV5Og3gSNII8EYnsxA_A", "CA"]

Claim 13:

SHA-256 Hash: dg1pBJV-dABilqD2RYiG8z4gRtuDFdRBdlwHgdLFEx8
Disclosure: WyI2SWo3dE0tYTVpVlBHYm9TNXRtdlZBIiwgIjEzIiwgdHJ1ZV0
Contents: ["6Ij7tM-a5iVPGboS5tmvVA", "13", true]

Claim 18:

SHA-256 Hash: CeVqxVUVHpva5Xp0X-NeUvhixjDYp7PTZ4BaFWGXUek
Disclosure: WyJlSThaV205UW5LUHBOUGVOZW5IZGhRIiwgIjE4IiwgZmFsc2Vd
Contents: ["eI8ZWm9QnKPpNPeNenHdhQ", "18", false]

Claim 21:

SHA-256 Hash: 2ovMJR_ZNMB6ngFK3SUQnRIgyM548DzR7tJFTO-ZzBM
Disclosure: WyJRZ19PNjR6cUF4ZTQxMmExMDhpcm9BIiwgIjIxIiwgZmFsc2Vd
Contents: ["Qg_O64zqAxe412a108iroA", "21", false]

Array Entry:

SHA-256 Hash: RNWcxPD8A1ZhAm6_wAiJSoSzIRb_w1QUaKGvS240K-Y
Disclosure: WyJBSngtMDk1VlBycFR0TjRRTU9xUk9BIiwgeyJzdHJlZXQiOiAiNDU2IE1haW4gU3QiLCAiY2l0eSI6ICJBbnl0b3duIiwgInN0YXRlIjogIk5ZIiwgInppcCI6ICIxMjM0NSIsICJ0eXBlIjogInNlY29uZGFyeV9hZGRyZXNzIn1d
Contents: ["AJx-095VPrpTtN4QMOqROA", {"street": "456 Main St", "city": "Anytown", "state": "NY", "zip": "12345", "type": "secondary_address"}]

Array Entry:

SHA-256 Hash: hhB5pziS4s0dSx0kql31vDtuo3JVDfB4VZ-YHcj2A9M
Disclosure: WyJQYzMzSk0yTGNoY1VfbEhnZ3ZfdWZRIiwgbnVsbF0
Contents: ["Pc33JM2LchcU_lHggv_ufQ", null]

Array Entry:

SHA-256 Hash: o_VFRluA190wrH5E1yr2r39UyTnx3-m3qPREikSr6Qo
Disclosure: WyJHMDJOU3JRZmpGWFE3SW8wOXN5YWpBIiwgbnVsbF0
Contents: ["G02NSrQfjFXQ7Io09syajA", null]

Array Entry:

SHA-256 Hash: nY72P6V5uHQe-BYkwYj-paG2y3fmj614FKQQhhk6T1E
Disclosure: WyJsa2x4RjVqTVlsR1RQVW92TU5JdkNBIiwgbnVsbF0
Contents: ["lklxF5jMYlGTPUovMNIvCA", null]

Array Entry:

SHA-256 Hash: zt7kWPtZTpMYKPoaQd-L71L-aKYMYYNLOFOf-yH3uLY
Disclosure: WyJuUHVvUW5rUkZxM0JJZUFtN0FuWEZBIiwgNDJd
Contents: ["nPuoQnkRFq3BIeAm7AnXFA", 42]

Array Entry:

SHA-256 Hash: K1yxHJ4z10JKd2jRmQuziCym3D1oXB0NaFVLHEOv8XM
Disclosure: WyI1YlBzMUlxdVpOYTBoa2FGenp6Wk53IiwgMy4xNF0
Contents: ["5bPs1IquZNa0hkaFzzzZNw", 3.14]

Array Entry:

SHA-256 Hash: yr-1NDhAaFYPvLrAzvdFfBwRJS_wn199JX0adDYa6Ak
Disclosure: WyI1YTJXMF9OcmxFWnpmcW1rXzdQcS13IiwgImZvbyJd
Contents: ["5a2W0_NrlEZzfqmk_7Pq-w", "foo"]

Array Entry:

SHA-256 Hash: NoTOTjWq1_cYu3kfQKh3jWrx9OLSIIdhYX0_92-RD-Y
Disclosure: WyJ5MXNWVTV3ZGZKYWhWZGd3UGdTN1JRIiwgdHJ1ZV0
Contents: ["y1sVU5wdfJahVdgwPgS7RQ", true]

Array Entry:

SHA-256 Hash: biBLCP424CoDYTpBmden-zGmYOdE0GSHlerSaoYeQZ0
Disclosure: WyJIYlE0WDhzclZXM1FEeG5JSmRxeU9BIiwgWyJUZXN0Il1d
Contents: ["HbQ4X8srVW3QDxnIJdqyOA", ["Test"]]

Array Entry:

SHA-256 Hash: _z-We_gbvKo84jpuhBQS9v9yVhDo2--FCDNWMHMezUQ
Disclosure: WyJDOUdTb3VqdmlKcXVFZ1lmb2pDYjFBIiwgeyJmb28iOiAiYmFyIn1d
Contents: ["C9GSoujviJquEgYfojCb1A", {"foo": "bar"}]

Array Entry:

SHA-256 Hash: FbJ_W_M-Gl9rMUR8fcsMFdiHV-qEabiT-u9eHvNKSAA
Disclosure: WyJreDVrRjE3Vi14MEptd1V4OXZndnR3IiwgImZvbyJd
Contents: ["kx5kF17V-x0JmwUx9vgvtw", "foo"]

Array Entry:

SHA-256 Hash: zo6muzFQJ9UCeFuy3Dq_YInQzLGimJVIztHGntWVxw0
Disclosure: WyJIM28xdXN3UDc2MEZpMnllR2RWQ0VRIiwgImJhciJd
Contents: ["H3o1uswP760Fi2yeGdVCEQ", "bar"]

Array Entry:

SHA-256 Hash: n-TOQDur9EA2k9G_VVqlvkOYCzIFb28LKA99IaQfFt8
Disclosure: WyJPQktsVFZsdkxnLUFkd3FZR2JQOFpBIiwgImJheiJd
Contents: ["OBKlTVlvLg-AdwqYGbP8ZA", "baz"]

Array Entry:

SHA-256 Hash: xvJ7NwhRY93UqhcqVKF-Ap7HwZpKe1raEWZg_WozBBs
Disclosure: WyJNMEpiNTd0NDF1YnJrU3V5ckRUM3hBIiwgInF1eCJd
Contents: ["M0Jb57t41ubrkSuyrDT3xA", "qux"]

Claim baz:

SHA-256 Hash: 6ZSZVDX4TeL5yplka7RIt1w_V_BA2ebI041AEod-IAI
Disclosure: WyJEc210S05ncFY0ZEFIcGpyY2Fvc0F3IiwgImJheiIsIHsicXV4IjogInF1dXgifV0
Contents: ["DsmtKNgpV4dAHpjrcaosAw", "baz", {"qux": "quux"}]

Array Entry:

SHA-256 Hash: KIK9FOQ3C-jLxGW9oRYTL-AETF3eGolP8lyVRVFOqX8
Disclosure: WyJlSzVvNXBIZmd1cFBwbHRqMXFoQUp3IiwgeyJfc2QiOiBbIjZaU1pWRFg0VGVMNXlwbGthN1JJdDF3X1ZfQkEyZWJJMDQxQUVvZC1JQUkiXSwgImZvbyI6ICJiYXIifV0
Contents: ["eK5o5pHfgupPpltj1qhAJw", {"_sd": ["6ZSZVDX4TeL5yplka7RIt1w_V_BA2ebI041AEod-IAI"], "foo": "bar"}]

Array Entry:

SHA-256 Hash: uN8DYtT68Do3MAO9deTagWZx-akgd6DmzI4x9xFN7bs
Disclosure: WyJqN0FEZGIwVVZiMExpMGNpUGNQMGV3IiwgImZvbyJd
Contents: ["j7ADdb0UVb0Li0ciPcP0ew", "foo"]

Array Entry:

SHA-256 Hash: F5STX6452Aw9VQyFh5vclX-SlUAuu_r_ax-ow35e4Jw
Disclosure: WyJXcHhKckZ1WDh1U2kycDRodDA5anZ3IiwgImJhciJd
Contents: ["WpxJrFuX8uSi2p4ht09jvw", "bar"]

Claim sd_array:

SHA-256 Hash: sGmV2tSLHmJScETevXgTQ-bM7O5ZnQuu-ypqI2vB-JU
Disclosure: WyJhdFNtRkFDWU1iSlZLRDA1bzNKZ3RRIiwgInNkX2FycmF5IiwgWzMyLCAyM11d
Contents: ["atSmFACYMbJVKD05o3JgtQ", "sd_array", [32, 23]]

danielfett commented 1 year ago

And here is another example showing both an SD'd array element and a single-key object with SD:

{
  "iss": "https://example.com/issuer",
  "iat": 1683000000,
  "exp": 1883000000,
  "addresses": [
    {
      "street": "123 Main St",
      "city": "Anytown",
      "state": "NY",
      "zip": "12345",
      "type": "main_address"
    },
    {
      "_sd": "k63-wMoGu03I9dCyrrNnB0ncOXLZhYaA_Q4lCKFIWcU"
    }
  ],
  "array_with_one_sd_object": [
    {
      "_sd": [
        "1R6ziZ1b4uvXf4-DuKx0JSDRoeVTGrzJldw7Jgqac3Q"
      ]
    }
  ],
  "_sd_alg": "sha-256"
}
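A verifier has to treat these two shapes differently. The following Python sketch shows one way this could look; `process` and the `disclosures` mapping (digest to decoded Disclosure) are hypothetical names, the digests and salts in the usage example are placeholders, and undisclosed array elements are simply dropped.

```python
def process(node, disclosures):
    """Recursively resolve SD'd object keys and SD'd array elements."""
    if isinstance(node, list):
        out = []
        for item in node:
            is_placeholder = (
                isinstance(item, dict)
                and len(item) == 1
                and isinstance(item.get("_sd"), str)
            )
            if is_placeholder:
                disclosure = disclosures.get(item["_sd"])
                # Array element Disclosures have two elements: [salt, value].
                if disclosure is not None and len(disclosure) == 2:
                    out.append(process(disclosure[1], disclosures))
            else:
                out.append(process(item, disclosures))
        return out
    if isinstance(node, dict):
        out = {key: process(value, disclosures) for key, value in node.items() if key != "_sd"}
        digests = node.get("_sd", [])
        if isinstance(digests, list):
            for digest in digests:
                disclosure = disclosures.get(digest)
                # Object key Disclosures have three elements: [salt, name, value].
                if disclosure is not None and len(disclosure) == 3:
                    out[disclosure[1]] = process(disclosure[2], disclosures)
        return out
    return node


payload = {
    "addresses": [{"street": "123 Main St"}, {"_sd": "digest-of-address"}],
    "array_with_one_sd_object": [{"_sd": ["digest-of-key"]}],
}
disclosures = {
    "digest-of-address": ["salt1", {"street": "456 Main St"}],
    "digest-of-key": ["salt2", "foo", "bar"],
}
resolved = process(payload, disclosures)
```

In this example the second address is restored in place, and the single-key object becomes {"foo": "bar"} rather than being mistaken for an array element placeholder.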

bc-pi commented 1 year ago

Would you say you prefer the object {"_sd": "digest"} based approach over the string "_sd:digest" based approach, @danielfett? It does seem "more correct" to me. And not needing the _sd_arr_pfx construction is a plus. Looking at the examples you have, I must admit that I don't love the aesthetics of it. But that shouldn't be part of the criteria for choosing.

using the already defined "_sd"

This is true and I agree that using a special key name is cleaner than a string prefix and much better avoids conflict. However, we'd need to be a little careful with its treatment in the text. There's a lot of current text that would need to be adjusted to allow for the _sd key to have different syntax and semantics based on where it is in the JSON. This sentence and these steps are just two examples. Which might be tricky and risky. We might want to use a new special key name. Maybe _sda or _sdae or something (roughly for selectively disclosable array element but I'm just throwing out ideas). A different key might also make it easier to see the difference when just looking at the JSON.

danielfett commented 1 year ago

The {"_sd": "digest"} feels cleaner, the "_sd:digest" looks a bit better. But I have a slight preference for the first one. What do @TakahikoKawasaki @Sakurann think?

I think that using a different key makes sense. It is easier to implement ("check for _sde" instead of "check for a single-element object, where _sd refers to a string not an array") and we avoid polymorphism (_sd is always an array, _sde is always a string).

Now for the bikeshedding part:

Ideally, we would use two or max three characters after the underscore.
_sdae is quite long
_sda or _sde (selective disclosure element) looks like a linux harddisk path, but would be fine
_sa or _se would be short, but what would that stand for? _se is at least alphabetically next after _sd.
_[] or [] would look array-ish, but readers might read too much into the meaning of the string ("Can I write _[2]?")
just _ is too short and might cause conflicts
[_] looks like a nice placeholder, but doesn't follow our own convention to start with _
... transports the meaning that something is elided at this position in the array (and this is an operator with a remotely related meaning both in JavaScript and Python)
__ or _* or _: or => are other options, but I don't think I like them

Any other ideas? I think I like ... and _se most.

bc-pi commented 1 year ago

Any other ideas?

just more bikeshedding but perhaps _sdi or _si where the i is loosely for item in the array

Sakurann commented 1 year ago

I vote for {"_sd": "digest"} because it avoids the concern we had earlier about collisions with the _sd: prefix and the potential need to introduce "_sd_arr_pfx". It also feels cleaner, and checking for _sd is easier, going by Daniel's experience.

Sakurann commented 1 year ago

I also like "...". I don't want to have to think every single time about what _sa, _si, _sde, etc. stand for, and the other characters proposed feel like they might cause conflicts. Though _: looks cute.

danielfett commented 1 year ago

Pull request for spec text: #283

Sakurann commented 1 year ago

PR merged

oauth-wg / oauth-selective-disclosure-jwt