openid / OpenID4VP

53 stars 18 forks source link

Explore alternatives to a bespoke selection syntax #286

Open decentralgabe opened 1 day ago

decentralgabe commented 1 day ago
          JsonPath is more complicated, requires complex string parsing, requires the implementation of many features that are not necessary to select a claim, and is a query language, not a pointer language (JsonPointer would be appropriate to use). It is also somewhat inviting to severe security problems (the 'expression' syntax). That's why we didn't use it. We already received early feedback from implementers that they appreciate us not using JsonPath.

_Originally posted by @danielfett in https://github.com/openid/OpenID4VP/pull/266#discussion_r1810044310_

https://datatracker.ietf.org/doc/html/rfc6901

paulbastian commented 1 day ago

Copying my arguments from the thread:

There has been extensive discussion on this. One of the main motivations for this was to get rid of JSON Path. We investigated JSON Pointer and identified things that could not be covered. Are you aware of other options? I think the proposed format is the simplest possible mechanism and was appreciated by early developer's feedback

decentralgabe commented 1 day ago

I have not been privy to what could not be covered, are you able to provide more detail?

What's written may be simple, but I have two concerns:

  1. It is a spec (selection syntax) within a spec (query language) within a spec (oid4vp) and should probably be broken out
  2. existing tools do exist (I am hearing they may have cases that won't work) - have we attempted to profile existing solutions? or contribute to them directly with our changes?
decentralgabe commented 1 day ago

OK after more thinking here is what I think needs to be addressed (at least) before I'm more comfortable with the query proposal:

  1. *Behavior for Arrays w/Wildcards (``)**:

    The use of * to select all elements in an array introduces ambiguity, especially when nested or applied across multiple levels. JSONPath’s wildcard selector addresses this with segment separation (..[*]), distinguishing recursive from flat selection (Section 2.3.2, RFC 9535).

    To mitigate, we can clarify how deeply nested array elements are addressed with * in mixed object-array structures to avoid ambiguity.

  2. Error Handling Guidance:

    JSON Pointer emphasizes the need for applications to define error handling when paths reference non-existent or invalid values (Section 7, RFC 6901). The current language lacks explicit guidance for handling invalid paths or out-of-bounds indices.

    We can address this by specifying error-handling behavior—should evaluation terminate on an invalid path, or return an empty result?

  3. Filtering Support:

    JSONPath supports conditional filters and recursive descent (e.g., $..[?(@.price < 10)]), which are useful for querying complex credential structures (Section 2.3.5, RFC 9535). This feature appears to be under discussion: https://github.com/openid/OpenID4VP/issues/288.

  4. *Query Injection via Wildcards (``)**:

    The proposed syntax allows wildcards (*) to retrieve all elements in an array, for example:

    ["degrees", "*", "type"]

    While intentional, this behavior could lead to over-broad queries, such as:

    ["degrees", "*", "secret"]

    This could expose sensitive fields if they exist within nested objects. JSONPath faced similar challenges with dynamic query building (Section 4.2, RFC 9535), as flexible selection mechanisms complicate access control.

    To mitigate:

    1. Implement query validation to restrict unauthorized combinations of wildcards and sensitive fields.
    2. Apply post-query filtering to make sure the data returned is consistent with the user's access policy.

    Listing restricted fields may help, or implementer guidance could suffice.

  5. Path Normalization:

    All query elements should be treated as complete values without parsing or processing inside quotes. For example:

    ["credentialSubject", "degree", "~metadata"]

    and

    ["credentialSubject", "degree", "field/major"]

    Both "~metadata" and "field/major" should be treated as valid keys without special interpretation. While this is likely implied, making it explicit would enhance clarity.

danielfett commented 20 hours ago

OK after more thinking here is what I think needs to be addressed (at least) before I'm more comfortable with the query proposal:

  1. *Behavior for Arrays w/Wildcards (`)**: The use ofto select all elements in an array introduces ambiguity, especially when nested or applied across multiple levels. JSONPath’s wildcard selector addresses this with segment separation (..[]), distinguishing recursive from flat selection ([Section 2.3.2, RFC 9535](https://www.rfc-editor.org/rfc/rfc9535.html#name-wildcard-selector)). To mitigate, we can clarify how deeply nested array elements are addressed with*` in mixed object-array structures to avoid ambiguity.

There is a clearly defined processing rule for the wildcard selector (now a null value to align with SD-JWT VC and avoid collisions with keys named *). That rule intends to define unambiguously how that selector is to be treated. I believe it is sufficient, but if it is not, please propose a PR.

  1. Error Handling Guidance: JSON Pointer emphasizes the need for applications to define error handling when paths reference non-existent or invalid values (Section 7, RFC 6901). The current language lacks explicit guidance for handling invalid paths or out-of-bounds indices. We can address this by specifying error-handling behavior—should evaluation terminate on an invalid path, or return an empty result?

This should be covered by the processing rules. We might want to add a rule saying "If the component is anything else, abort"; is there any other case missing?

  1. Filtering Support: JSONPath supports conditional filters and recursive descent (e.g., $..[?(@.price < 10)]), which are useful for querying complex credential structures (Section 2.3.5, RFC 9535). This feature appears to be under discussion: Filter on data in DCQL #288.

This is a different kind of filter. I think it is a feature, not a bug, that this kind of filtering is missing, as it may introduce an information leak side-channel, disclosing information without the user's consent. Additionally, it is complex to parse and can lead to insecure implementations.

The value filtering we want to introduce does none of this, and it would allow filtering on the claim value only. It would not be embedded in the path expression.

  1. *Query Injection via Wildcards (`)**: The proposed syntax allows wildcards (*`) to retrieve all elements in an array, for example:

       ["degrees", "*", "type"]

    While intentional, this behavior could lead to over-broad queries, such as:

       ["degrees", "*", "secret"]

That is a broad query, but how would that be over-broad? User consent would also still be required before disclosing anything.

   This could expose sensitive fields if they exist within nested objects. JSONPath faced similar challenges with dynamic query building ([Section 4.2, RFC 9535](https://www.rfc-editor.org/rfc/rfc9535.html#name-attack-vectors-on-how-jsonp)), as flexible selection mechanisms complicate access control.

Is that the right link? A very important design feature of this language is that we don't have to escape stuff (because we don't parse strings), so injection attacks in the sense described in Section 4.2 of RFC9535 are not applicable. Obviously, implementers using untrusted values to build the query don't know what the query looks like, but that is a triviality.

To mitigate:

  1. Implement query validation to restrict unauthorized combinations of wildcards and sensitive fields.
  2. Apply post-query filtering to make sure the data returned is consistent with the user's access policy.

Can you give an example of an unauthorized combination that would lead to insecure or unexpected behavior (beyond asking the user for more data than expected)?

Listing restricted fields may help, or implementer guidance could suffice.

What are restricted fields?

5. **Path Normalization**:
   All query elements should be treated as complete values without parsing or processing inside quotes. For example:
   ```json
   ["credentialSubject", "degree", "~metadata"]
   ```        

   and
   ```json
   ["credentialSubject", "degree", "field/major"]
   ```

   Both `"~metadata"` and `"field/major"` should be treated as valid keys without special interpretation. While this is likely implied, making it explicit would enhance clarity.

This is not only implied, it is clearly defined. Again, if not, please propose improved language.

decentralgabe commented 15 hours ago

Thanks for the thorough reply.

This should be covered by the processing rules. We might want to add a rule saying "If the component is anything else, abort"; is there any other case missing?

Yes that addition seems like the right step.

Can you give an example of an unauthorized combination that would lead to insecure or unexpected behavior (beyond asking the user for more data than expected)?

I am thinking of cases of how this is rendered in a user's wallet. They're asked fields from object X - "Field A" and click OK, "Field B" and click "OK" then the whole object X and it's confusing, they could accidentally over-disclose.

Similarly they might want a concept like "never disclose these fields even if they're cleverly requested."

It is a bit abstract, I am trying to be mindful of creating confusing cases for users responding to requests where they might over-disclose their data.

I agree with your other points.