Opportunities to improve the credential query/request syntax

tplooker commented 4 months ago

Over a period of time through implementation feedback on OpenID4VP, several items around the current credential query syntax (based on P.E v2) have been raised. To the best of my ability I have attempted to summarise some of those that I have heard below

Usage of JSON Path/Pointer

Presentation exchange v2 which is the current target of OpenID4VP makes use of JSON Path, which is a highly expressive DSL for reading JSON documents. The problems identified with it are that because of its broad expressivity and feature set, it creates certain security and use-ability challenges.

JSON pointer is a narrower expression syntax then JSON path which alleviates most of the security challenges and has been considered before as an alternative. But to some it still represents significant use-ability challenges and is still arguably too flexible/complex for most of the applications we have within OpenID4VP credential queries/request syntax where the path expressions we desire to be able to define are usually just simple traversals through nest maps and or arrays.

Furthermore some of the credential formats defined by OpenID4VP are not JSON based instead use technologies like CBOR which gives rise to other possible challenges.

The feature set of P.E is massive

Presentation Exchange itself beyond its usages of JSON Path/Pointer is highly expressive enabling all sorts of incredibly complex queries to be represented. However this complexity burdens both developers who need to write robust implementations of P.E and know how to create valid credential request/queries using P.E. Certainly the feedback overtime that I have heard is many of the features of P.E aren't needed for most usecases or the implementation complexity they create outweighs their value.

P.E protocol agnostic design goal complexity

Presentation exchange from the outset aimed to define a query/request syntax that was protocol agnostic. While this design goal does lead to the potential for re-use of P.E across different protocols. It has created duplication of features that are already inherently present in OpenID4VP (and OAuth2 and OpenID Connect for that matter) such as capability negotiation between the protocol participants. For example OpenID and OAuth2 of which OpenID4VP is built atop has always largely handled capability negotiation through metadata exchanged/registered between the protocol participants rather then doing this negotiation during execution of the protocol itself. Whereas presentation exchange due to its protocol agnostic design goal defines how to negotiate things such as cryptographic suites in a credential request/query.

There are several options we have to solve these challenges and the purpose of this issue is to first better collectively understand these concerns and then discuss what solutions we could apply.

selfissued commented 4 months ago

Notes from conversations from three open space sessions at OSW and IIW titled "What does Presentation Exchange do and what parts of it do we actually need?" can be found at https://self-issued.info/?p=2427 and https://self-issued.info/?p=2395. In all these conversations, many people expressed the viewpoint that Presentation Exchange is pretty complicated.

bc-pi commented 4 months ago

I'll go on record here as being not a fan of PE.

Sakurann commented 4 months ago

I think I understand the points outlined in this issue and agree we should discuss if/how we would address them. For the limitations of using JSON Path with CBOR-based mdoc format, this thread also might be useful: https://github.com/WICG/digital-identities/issues/80#issuecomment-1955444351.

Having said that, I also think it is important to discuss and understand what in PE has worked well. So I'll go on record here that PE does the job:

its flexibility allowed our implementation to meet customers' requirements without the need to go back to the WG and ask to add a new parameter.
as an implementer that need to support multiple credential formats, being able to reuse PE across multiple formats is helpful.
ISO has done multiple test events where multiple vendors were able to interoperate using OID4VP with PE for mdocs.
OID4VP has never mandated implementing the entire feature set of PE and so far we have taken the approach of limiting the feature set of PE i.e. as in HAIP and security considerations section in OID4VP.

Again, not saying PE is perfect, or the current approach is the best, just believe these points should be taken into account in this discussion.

David-Chadwick commented 4 months ago

You will see from comments that I made over a year ago that mandating the use of DIF PE was a bad design choice. Instead we should give flexibility to implementors to use whatever query language their community of users choose. By replacing DIF PE with a type and value extension mechanism, then those that want to use DIF PE can set the type to "DIF PE" and the value to the PE query. Those that want to use SQL, or the W3C presentation request (see https://w3c-ccg.github.io/vp-request-spec/) or any other query language can set the type to their chosen language and the value to the query in their language. The metadata of verifiers and wallets can say which query type(s) they support.

awoie commented 4 months ago

I also believe we need a better way than PE for expressing queries.

I have worked on and with several different credential formats, including fully JSON-based ones like IETF SD-JWT VC that do not have any special namespace considerations and have native support for selective disclosure, W3C VCDM 2.0 with a lot of optionality even in the proof format and the way claims use namespaces via @context/JSON-LD processing, and also ISO 18013-5 mdocs that are CBOR-native using COSE_Sign1 and have their own way to define namespaces. All of them are quite different, and I had troubles in applying PE because of differences in:

typing (JSON/JSON-LD)
structure (where in the credential claims are defined)
encoding (JSON/CBOR),
feature set (selective disclosure, policies, etc.).

I believe having a credential-format agnostic query language like PE is quite challenging for those reasons.

selfissued commented 4 months ago

As cited at https://github.com/openid/OpenID4VCI/issues/266#issuecomment-1946999918, a few of us have been working on a format-specific query language proposal as a strawman to replace PE. This was motivated in part by @tlodderstedt's request during the IIW session to "show me a specific counter-proposal".

See https://docs.google.com/document/d/10JT--pXWsfwC4QVu3XJpXGwcO08M6tnpkZ4PKdtCEWo .

tlodderstedt commented 4 months ago

I'm not a fan of PE. However, it is implemented now with OID4VP in a couple of places. So it is not a show stopper per se and people have invested money in it. That's why I'm hesitant to replace it.

I think most concerns can be addressed by defining a subset of PE. That's what we did in the HAIP spec. Please have a look

Having said that. I would be ok to add another way to request credentials (and deprecated PE) if the existing implementers and the SDOs that build standards on top of OID4VP w/ PE support that step.

Sakurann commented 4 months ago

Chair hat on, would suggest to proceed as following:

as a first step, agree on what are the problems with PE we want to address
than discuss how to solve those problems

Notes from Feb-22-2024 WG call,

Some requirements

need to work well with currently known credential formats, which include both JSON-based and CBOR-based formats
- could define credential format specific syntax, but sounded like there is some implementation feedback that ability to reuse the common syntax across credential formats is useful (@ve7jtb said it better)
need to clarify how verifier metadata negotiation happens: using traditional oauth style mechanisms, or using the query language
- seems to be the feedback that oauth style mechanism might not be the best fit for issuer-holder-verifier model and using query language has worked

Options in the solution space (does not have to be only one of these - doing 1 and 4, for example, seems to be an option)

define a (mandatory to implement?) minimum subset of PEv2.0
work with DIF on PEv3.0
define a credential format agnostic query syntax in OIDF. maybe something like the one proposed in https://github.com/openid/OpenID4VCI/pull/276
define a credential format specific query syntax in OIDF.
having the query language as an extension point

awoie commented 4 months ago

Chair hat on, would suggest to proceed as following:

define a (mandatory to implement?) minimum subset of PEv2.0

work with DIF on PEv3.0

define a credential format agnostic query syntax in OIDF. maybe something like the one proposed in Define claims display description and claims path query OpenID4VCI#276

define a credential format specific query syntax in OIDF.

In all cases, we need to define profiles for credential formats. Even if we had one query syntax that appears similar for all credential formats, it would have significantly different meanings per credential format, which is equivalent to defining a syntax per credential format. This applies to all four options.

Under this assumption:

Regarding option 1, I'm not convinced that having a profile for PE for each credential format is the best option. There is a risk that implementers will make mistakes, implement additional features not recommended, or become overwhelmed by the feature set that PEv2 offers. Note that they will need to refer to the PE spec anyway and find sections that apply.

Regarding option 2, it is similar to option 1.

Regarding option 3, I would be fine with that, but again, we would need to profile it anyway. I am uncertain if being credential format agnostic is a goal that can be achieved or one that we actually want to achieve. Reusing some components across credential formats does make sense, for example, how to encode constraints such as certain fields being required in the response. Note that https://github.com/openid/OpenID4VCI/pull/276 is actually more like option 4 since it uses the top-level namespace/credentialSubject object for mdocs/VCs, which is not common among the credential formats.

Regarding option 4, it is essentially the same as option 3.

Regarding options 3 and 4, I see them both as almost equivalent since you will need profiles anyway, and the interpretation of the query will differ depending on the format even it looks similar. I think the reuse of constraints (required, requiredIfPresent, intentToRetain, etc.) makes a lot of sense, but we should have the flexibility to cater to structural/encoding characteristics of formats, which has already been demonstrated in https://github.com/openid/OpenID4VCI/pull/276. I don't think the goal is to have an n-number of query syntaxes for one credential format. Ideally, it should be just one, defined aligned with the patterns the respective community has already used.

awoie commented 4 months ago

need to clarify how verifier metadata negotiation happens: using traditional oauth style mechanisms, or using the query language

seems to be the feedback that oauth style mechanism might not be the best fit for issuer-holder-verifier model and using query language has worked

I don't understand why verifier metadata negotiation is part of this discussion. There is a mechanism in place that works well with OID4VP. We even added a feature, client identifier scheme, to facilitate this negotiation. The method by which PE populates some of the supported curves is redundant and unnecessary. We even have a paragraph that states, "The Wallet MUST ignore any format property inside a presentation_definition object if that format was not included in the vp_formats property of the metadata." Therefore, format in PE is yet another potential source of developer confusion. Negotiation by taking into account wallet capabilities is another topic we try to address in #59.

David-Chadwick commented 4 months ago

@Sakurann You appear to missed out the option of having the query language as an extension point, which brings a lot of advantages, a main one being future proofing, another one being innovation. Some in the WG like this approach whilst others do not. An argument made against it was that it kills interop. But this argument is fallacious for several reasons. a) we already have a number of extension points e.g. the crypto suite. Is anyone arguing for replacing this with fixed a crypto suite? b) having a fixed query language can kill interop when more than one standard query language exists. We originally mandated DIF PEv1, now DIF PEv2 and some are thinking of moving to DIF PEv3. There is also the W3C presentation request draft that I referred to above, which many have implemented. Why should OID4VP try to pick the winner? Let implementations and the market decide and let the protocol cater for all.

Sakurann commented 4 months ago

@awoie verifier metadata negotiation is part of the discussion because there is no agreement on the statement The method by which PE populates some of the supported curves is redundant and unnecessary.

@David-Chadwick thank you! updated my original comment to add "having the query language as an extension point" as an option :)

awoie commented 4 months ago

@awoie verifier metadata negotiation is part of the discussion because there is no agreement on the statement The method by which PE populates some of the supported curves is redundant and unnecessary.

My point was that there is implicit agreement codified by normative language in the OID4VP spec: "The Wallet MUST ignore any format property inside a presentation_definition object if that format was not included in the vp_formats property of the metadata."

Sakurann commented 4 months ago

that language might be residual from before we negotiated with PE authors to add a top level format parameter in the PE. also that sentence only talks about the formats negotiation and not the rest of the metadata. so I think the large question is still in tact.

awoie commented 4 months ago

that language might be residual from before we negotiated with PE authors to add a top level format parameter in the PE. also that sentence only talks about the formats negotiation and not the rest of the metadata. so I think the large question is still in tact.

I don't understand what this means, according to github, format has been in PE for a long time (v2.0.0, > 2 year), even before the normative statement I was talking about was added (02/2023).

IMO, it is not a requirement of any query syntax to communicate any of these metadata. We have client metadata for this and changing this is would be a fundamental shift for OID4VP.

We have an issue to do real capability negotiation and using PE for this was never brought up nor included in any proposal. I believe nobody argued for putting other client metadata into the query syntax.

bc-pi commented 4 months ago

I'll go on record here as being not a fan of PE.

On the 22 Feb 24 call @Sakurann pointed out that this kind of "voting" alone on PE wasn't particularly helpful, which is fair. My concern with PE is somewhat more high-level. It promises to provide an awful lot of useful functionality but is extremely complex. This kind of standard often sounds/looks good on paper and even in narrowly constrained testing at first but the real cost of the complexity and sometimes inability to actually provide purported functionality doesn't show up until later and manifests in serious interoperability and security issues or lack of adoption/use. That's maybe not much more helpful that a "vote" but tried to convey some of my rationale.

nemqe commented 4 months ago

Our experience is that PE supports a lot of functionality that on paper sounds useful, but that in the end leads to a lot of complexity when trying to build systems that support many different flavors of credentials for reasons many mentioned above (encoding matters, storage method matters, structure matters, typing matters, features matter...) Main reasons that lead to this complexity were:

JSON Path richness of expression
Regex filters
and submission_requirements mixing and matching

We ignored Features for now

When only using absolute paths with non-regex filters we came to a more or less understandable implementation - but one that still involved having Intermediate Representation of credentials, and translating PE to another query language, and one that was still very complicated.

Chair hat on, would suggest to proceed as following:
1. define a (mandatory to implement?) minimum subset of PEv2.0

Reducing the set of PE would make sense at least to me, but I am not sure how that would be enforced and communicated. The feature set would need to be very clearly defined, and referenced in the base spec and not in the interop-profile.

2. work with DIF on PEv3.0

This could potentially give us an opportunity to define the base/core of the spec that we think is aligned with our requirements, so I think I prefer it more compared to suggestion #1.

3. define a credential format agnostic query syntax in OIDF. maybe something like the one proposed in [Define claims display description and claims path query OpenID4VCI#276](https://github.com/openid/OpenID4VCI/pull/276)
4. define a credential format specific query syntax in OIDF

Could make sense, but seems like a lot of work, and given that many have already implemented PE it might be easier to try to fix that one before designing yet another mini-language.

5. having the query language as an extension point

I think the spec should allow for multiple query syntax(or their versions), but that it should mandate one across all implementations. Example: VP 18 mandates PEv2, but it could allow you to use a specific one that might fit your system better. Mandated one(s) would ensure interoperability, while the optional ones would allow for experimentation, competition, and local optimizations. This could possibly also allow for easier switching from one version/flavor of query syntax to another. This would probably require some registry of known query syntaxes.

decentralgabe commented 4 months ago

I'd like to add support for @Sakurann's suggestions, namely

define a (mandatory to implement?) minimum subset of PEv2.0 work with DIF on PEv3.0

I'd take this a step further and suggest PE v3 is adopted as a work item of this group.

To paraphrase @selfissued I do not believe we should allow for multiple query languages if one will do. Standards make choices. Optionality is the enemy of successful interoperability.

Because I believe PE v3 will take a while, my immediate suggestion is to profile a subset of PE v2 and require it for all implementers of OID4VP.

Separately, I'd like to make a comment on the complexity of PE. As one of the authors/editors of PE v1 and a contributor to v2, we tried to represent the complexity present in the credentials space, which is itself highly complex. I could succeed in making an argument that OAuth, OIDC, and OID4VC are all really complex specs. This is not me suggesting that we increase that complexity, rather, it's an acknowledgment that our subject area is inherently complex and the reasoning "too complex let's exclude it" is misleading at best.

Many engineering teams have implemented PE successfully. I have personally done so multiple times. It's not an insurmountable task, albeit annoying at times.

This represents an opportunity for us as editors to find opportunities for simplicity that works for our use cases.

TomCJones commented 4 months ago

does anyone have an example of how to turn a PE request into a meaningful user consent form that can fine on a smart phone screen?

jogu commented 4 months ago

does anyone have an example of how to turn a PE request into a meaningful user consent form that can fine on a smart phone screen?

@TomCJones My understanding of current models is that the user consents to what information can be released that satisfies the PE request, not what information was requested by PE, so there is never a need to turn PE into a user consent screen.

So for example if the PE requests a proof of name from either a driving license or a passport, it does not seem important for the user to know that this was the request - only for the user to be asked (in the case where they only have a driving license on their phone and not a passport) whether they would like to release the name in their driving license credential to the verifier.

Can you explain a situation where it would instead be important for the user to understand the request the verifier made? At least in my example, it would seem considerably less user friendly to try to explain the request to the user.

TomCJones commented 4 months ago

I think you are telling me that the verifier gets to create a sql injection attack against the user, the user gets a list of all of the credentials and all of the data fields in those credentials as a consent request from what? one aggregator wallet, each wallet in turn (perhaps three of them) will they are standing in a line to get on an airplane or into a concert and makes an informed consent decision? All of the ideas about informed consent that i have been involved with the question in their mind is what the verifier is going to do with the data. I don't see where informed consent is addressed in any of this! It sounds to me like a direct violation of nearly every data protection law in existence.

jogu commented 4 months ago

Hi Tom

To complete my example, if the user has a driving license and a passport stored on their phone, and for the purposes of this example they are stored in different wallets, then my understanding of the flow might work (as is being defined in the WICG group):

Verifier tells the browser that it wants a verified name from a passport or driving license
The browser asks the installed wallets what credentials they have that meet that requirement (this matcher is sandboxed, so no information can be leaked to the wallet providers in this step)
Assuming more than one credential matches, the user is asked which credential they want to provide (or if they don't want to provide one)
If the user does want to share a credential, the wallet that holds that credential is then invoked to provide the credential and asks for any necessary user consent before returning the credential to the verifier.

This is fully informed consent. No one here has any interest in designing flows that fail to comply with data protection laws, that would be a fruitless endeavour.

TomCJones commented 4 months ago

i guess i don't see an example of the screens the user sees. Who's job is it to display the name of verifier and what they want to do with the data? For what purpose do they want the data? Is the verifier trustworthy? the device or each wallet. I can see some simple use cases going to six screens depending on the design.

jogu commented 4 months ago

Hi Tom. I think your original question about PE has been answered and we've now veered significantly away from the subject of this ticket. I think some of the WICG folks have demos of the flow, it might be worth asking there.

tplooker commented 4 months ago

To paraphrase @selfissued I do not believe we should allow for multiple query languages if one will do. Standards make choices. Optionality is the enemy of successful interoperability.

+1 here I dont believe we should allow multiple query languages, that would be a failure of the standard to foster interoperability.

Separately, I'd like to make a comment on the complexity of PE. As one of the authors/editors of PE v1 and a contributor to v2, we tried to represent the complexity present in the credentials space, which is itself highly complex. I could succeed in making an argument that OAuth, OIDC, and OID4VC are all really complex specs. This is not me suggesting that we increase that complexity, rather, it's an acknowledgment that our subject area is inherently complex and the reasoning "too complex let's exclude it" is misleading at best.

If you are referring to my last point, I don't believe that is the argument I'm making. I'm merely pointing out that there is a trade off, every feature a protocol or standard chooses to define has a cost associated to and that there needs to be a counter balance saying "do we really need this feature, is it worth the cost"? Having also participated somewhat in P.E I don't believe it got that right, the inclination was to say "yes" to every use case, rather then challenging which features were actually needed. I think its abundantly clear that especially in the context of OpenID4VP that P.E defines way more features then required for its application, evidence of which can been seen via the heavy profiling of P.E done in HAIP, ISO 18013-7 and OpenID4VP itself. That redundancy has a significant cost associated to it.

nklomp commented 4 months ago

Valid points are raised, but I also want to make a few points from our side:

Fully agreed on the complexity. It took us significant amount of engineering to support almost everything from the PE spec for v1 and v2. Having said that, PEv2 already introduced the concept of features. These are all optional, and thus if consensus is that a lot of these features are not used/needed for most use cases, creating a profile of PE makes sense; just leave out a lot of the features
PEX v2.1 will move to JSONPointer and make JSONPath optional, where V3 will remove JSONPath altogether, alleviating the security concerns (which are described in both PE and OID4VP, HAIP)
Unless I overlooked, nobody mentioned yet that PE is not only used in OID4VP, but also in other SSI contexts, like for instance Aries Present Proof v3. PE is agnostic to the transport protocol. Any party implementing support for other protocols next to OID4VP, thus would have to support another specification with very similar goals.
A lot of parties are supporting PE in their solutions. The OID4VC specs (no judgement) have already caused a lot of other new related specs to emerge for things that have been around in a slightly different form for quite some time. We are okay with that, but it does mean that vendors do need to add yet another way of doing similar things. The OID4VC specs are seeing constant change, which is totally understandable, at the same time the EU is moving fast and a lot of vendors already implemented PE in their solutions. IMO changing to a new credential query/request specification and then implementation will cost significant time and money on the spec level as well as on the implementation/adoption level. Almost everyone I talk to is already feeling the pain of the constant changes and amount of optionality in the OID4VC specs. Adding this big of change certainly won't help in that department.

I agree and am in favor of others pointing out:

it makes sense to profile to a subset of PE; the optional features would allow for that, as the PE core implementation would be considerably simpler.
Work together on PE v3. The whole point of PE is credential query/requests and is used in different contexts, so focus attention there

TomCJones commented 4 months ago

is there any evidence that normal human users or verifiers will accept VPs? Especially in view of the likelihood of multiple wallets responding to one request? I guess it's a foregone conclusion that it will be the browser or device that will function as the metawallet?

tplooker commented 3 months ago

In an effort to make some of my concerns more concrete, I've compiled 3 examples of P.E requests, I think they highlight very clearly why profiling P.E would be entirely insufficient to fix the issues here

Example 1

How should one process the following query?

{
    "id":"mDL-sample-req",
    "input_descriptors":[
        {
            "id":"org.iso.18013.5.1.mDL",
            "format":{
                "mso_mdoc":{
                    "alg":[
                        "EdDSA",
                        "ES256"
                    ]
                },
            },
            "constraints":{
                "limit_disclosure":"required",
                "fields":[
                    {
                        "path":[
                            "$['org.iso.18013.5.1']['driving_privileges']['codes']"
                        ],
                        "intent_to_retain":false
                    }
                ]
            }
        }
    ]
}

The ambiguity around this query is that selective disclosure in an mDoc can only be done to the "$['org.iso.18013.5.1']['driving_privileges']" level. So what should a wallet respond with here, the whole driving_privileges structure OR nothing?

A similar issue also exists when applied to an SD-JWT that encodes a nested object as a single disclosure rather as a series of nested disclosures all the way down to the leafs.

This in my opinion quite clearly highlights the issue with using either JSON Path OR JSON Pointer as the basis for a query syntax, it can't express the nuanced constraints of a credential format like SD-JWT or mDoc reliably.

Example 2

Similar question how should one process this query?

{
    "id":"mDL-sample-req",
    "input_descriptors":[
        {
            "id":"org.iso.18013.5.1.mDL",
            "format":{
                "mso_mdoc":{
                    "alg":[
                        "EdDSA",
                        "ES256"
                    ]
                },
            },
            "constraints":{
                // Note no limit disclosure
                "fields":[
                    {
                        "path":[
                            "$['org.iso.18013.5.1']['driving_privileges']"
                        ],
                        "intent_to_retain":false
                    }
                ]
            }
        }
    ]
}

Without limit_disclosure in the query how should a wallet receiving this interpret the query? Should they return all the claims or just the "fields" in the request? If they return only the fields in the request does this undermine the point of limit_disclosure in the first place? The point I'm making with this example is P.E has bunch of features that make sense in isolation but not when composed together, whether specific claims can be requested is a property of the credential format and shouldn't be independently communicated/negotiated in the request.

Example 3

As a verifier if I were trying to determine whether the holder has a US drivers license, e.g by determining whether it contains a real ID claim without them knowing, I could use the following query to disguise my request

{
    "id":"mDL-sample-req",
    "input_descriptors":[
        {
            "id":"org.iso.18013.5.1.mDL",
            "format":{
                "mso_mdoc":{
                    "alg":[
                        "EdDSA",
                        "ES256"
                    ]
                },
            },
            "constraints":{
                // Note no limit disclosure
                "fields":[
                    {
                        "path":[
                            "$['org.iso.18013.aamva']['real_id']",
                            "$['org.iso.18013.5.1']['family_name']"
                        ],
                        "name": "Family name",
                        "purpose": "Requesting family name",
                        "intent_to_retain":false
                    }
                ]
            }
        }
    ]
}

The problem here is two fold

1) path allowing multiple expressions is meant to be a feature to create equivalent expressions across formats, but there is nothing stopping one from using it to query different attributes within the same credential to disguise the intent of a request. 2) name creates a second source of truth for what is being requested meaning the verifier can use it to further disguise the intent of a request to a wallet.

Based on the logical processing of P.E for the above if the real_id attribute was present on the matched credential it would be returned and the holder would likely have given consent based on the "name" thinking they were releasing their "Family name". Secondly in the case the wallet actually didn't have the real_id attribute on the drivers license, the fact that the family name is returned here, proves the real_id attribute doesn't exist, leaking this to the relying party.

Both of these cases above are concerning from a data privacy perspective and can't simply be addressed by profiling P.E they are fundamentally designed into it.

jogu commented 3 months ago

@tplooker can you explain how these situations can be resolved please - e.g. are you suggesting defining a query language that simply doesn't allow the queries in your examples to be made?

tplooker commented 3 months ago

@tplooker can you explain how these situations can be resolved please - e.g. are you suggesting defining a query language that simply doesn't allow the queries in your examples to be made?

In short yes I believe a proposal like what is given in https://docs.google.com/document/d/10JT--pXWsfwC4QVu3XJpXGwcO08M6tnpkZ4PKdtCEWo/edit#heading=h.7igj7m3na8ru avoids these issues by

Not using a generalised DSL syntax like JSON path/pointer which has no awareness of the data structure it is referring to
Assumes the queries are in the context of a format, so features that cut across formats in awkward ways like limit_disclosure are no longer an issue, because either the credential format supports selective disclosure or it does not.
Doesn't allow multiple path expressions to request the same attribute because the query itself is already localised to the credential format and type.

nklomp commented 3 months ago

If you ask me the example given have pretty clear outcomes.

Example 1: Since in your example limit_disclosure is required to begin with, it always has to be restrained to the actual fields requested, not the entire surrounding object all of a sudden. Secondly the example apparently is plain wrong in the scope of MDLs, so IMO that is on the party creating the definition. Just like I can create a definition that does limit disclosure, but then take a root path somewhere, inadvertently exposing everything. That is just the nature of any query format. Lastly I would also expect a PE implementation that handles MDL/Mdocs know about any additional constraint a data format/spec has on top of the generic query format. So it could simply raise errors for these types of examples as well.

Example 2: You are just expressing that the field should be present, given that is the default when no optional property is given. Since you are not using any constraints, and no limit disclosures, it means that all fields should be returned, provided that the only field you mention is present. If not nothing should be returned.

Example 3: Imo these types of problems are more generic in nature and are likely to also exist with other credential query formats. Whenever a RP wants to do harm they can. Simply having human readable descriptions, intents or purposes to make it easy for any human to understand what data is requested, is always an option that can be used to manipulate. Similar like you can also simply request too much information as a RP in a lot of cases. Someone wants something from the RP, so they are likely to agree with what is being shared anyway. IMO a wallet should show what data is being disclosed, and for the most sensitive data it makes sense that the UX is in such a way that the user at least somehow can have a notion of what is going on.

tplooker commented 3 months ago

Since in your example limit_disclosure is required to begin with, it always has to be restrained to the actual fields requested, not an entire object all of a sudden. Secondly the example apparently is plain wrong in the scope of MDLs, so IMO that is on the party creating the definition. Just like I can create a definition that does limit disclosure, but then take a root path somewhere, inadvertently exposing everything. That is just the nature of any query format.

Im not sure I understand the point you are making or that I agree that all credential query syntaxes suffer this issue, the problem is with JSON path/pointer being able to refer to any part of a credentials structure, even parts we dont want them to, vs a credential query syntax like the one I linked above which would make it totally illegal to request at a level or granularity which is going to lead to data privacy issues.

You are just expressing that the field should be present, given that is the default when no optional property is given. Since you are not using any constraints, and no limit disclosures, it means that all fields should be returned, provided that the only field you mention is present. If not nothing should be returned.

That is precisely the behaviour I was concerned about, as a relying party I have to remember to include limit disclosures in my request to only get the stuff I have listed back. Even though I have crafted a request for a specific set of attributes from a credential format that supports selective disclosure, that is a recipe for confusion.

Imo these types of problems are more generic in nature and are likely to also exist with other credential query formats. Whenever a RP wants to do harm they can. Simply having human readable descriptions, intents or purposes to make it easy for any human to understand what data is requested, is always an option that can be used to manipulate. Similar like you can also simply request too much information as a RP in a lot of cases. Someone wants something from the RP, so they are likely to agree with what is being shared anyway. IMO a wallet should show what data is being disclosed, and for the most sensitive data it makes sense that the UX is in such a way that the user at least somehow can have a notion of what is going on.

I strongly disagree, this isn't about a malicious RP, of course they are going to exist, this is about the fact that an RP can exploit features in the query syntax to fool a wallet and user into revealing more information then that was intended without it being detected. In short it's a security issue with the query syntax we must fix.

nklomp commented 3 months ago

Im not sure I understand the point you are making or that I agree that all credential query syntaxes suffer this issue, the problem is with JSON path/pointer being able to refer to any part of a credentials structure, even parts we dont want them to, vs a credential query syntax like the one I linked above which would make it totally illegal to request at a level or granularity which is going to lead to data privacy issues.

As mentioned, I would expect any implementation that handles Mdocs to protect against that, no matter the query language being used. That is simply the distinction between having a generic query language applicable to more than one format, versus having query languages per format if you ask me. It has to be solved somewhere, but in all cases any implementation should protect against invalid queries to begin with.

That is precisely the behaviour I was concerned about, as a relying party I have to remember to include limit disclosures in my request to only get the stuff I have listed back. Even though I have crafted a request for a specific set of attributes from a credential format that supports selective disclosure, that is a recipe for confusion.

Why, if you ask me the spec is pretty clear what should happen in a generic fashion for both example 1 and 2. The fact you are only listing one field is not the same as selective disclosure. Also the example is pretty contrived, because normally you would at least include something of a constraint.

I strongly disagree, this isn't about a malicious RP, of course they are going to exist, this is about the fact that an RP can exploit features in the query syntax to fool a wallet and user into revealing more information that was intended without it being detected.

In my world that is the definition of a malicious RP

nklomp commented 3 months ago

IMO your remark about having the option of multiple paths is a pretty valid one. In most cases having multiple paths is not really needed, given a RP typically knows exactly what credential types it is expecting. It should be an option/feature if you want to be more generic and retrieve similar information from multiple locations. Especially since you can also do grouping anyway, and thus cater to multiple types each having a single path with current PE

Similarly I can understand that you want to have a more concise query for doing selective disclosure, because you feel that would be more natural.

None of the above however are things that could not be considered in a next version of PE.

tplooker commented 3 months ago

As mentioned, I would expect any implementation that handles Mdocs to protect against that, no matter the query language being used. That is simply the distinction between having a generic query language applicable to more than one format, versus having query languages per format if you ask me. It has to be solved somewhere, but in all cases any implementation should protect against invalid queries to begin with.

In that case we have to add a set of constraints on top of P.E, to limit the usages of P.E that makes no sense and hope wallets and RP's actually perform these checks reliably. When you take a step back, why add a format specific constraints layer on top of a generic query syntax, that defeats the entire purpose IMO. Why not just have a format specific query syntax to begin with?

Why, if you ask me the spec is pretty clear what should happen in a generic fashion for both example 1 and 2. The fact you are only listing one field is not the same as selective disclosure. Also the example is pretty contrived, because normally you would at least include something of a constraint.

Thats the disconnect though, I think many developers will expect if they say "I want your first name from your drivers license" that they dont also have to say separately "... and nothing else please", its inherent in the first part of my request, that is a better way to manage over information disclosure risk.

IMO your remark about having the option of multiple paths is a pretty valid one.

Right but the problem is, if you remove the multi-path feature from P.E then you can no longer use it as credential format agnostic query syntax layer because you can't express requesting the same attribute across credential formats, without seperate queries. Thats why it's a fundamental design issue.

kimdhamilton commented 3 months ago

PE will have a 2.1 release coming in a few weeks where we addressed many concerns discussed at IIW. While we can’t make breaking changes to get it to the streamlined 3.0 we want, we've addressed as many issues as we can. We expect to fast-follow with a 3.0 and would love to work with you on it.

We’ll make sure the broader community has an adequate window to provide feedback on 2.1. Also, feel free to chime in on the Presentation Exchange issues. The 2.1 milestone tracks our remaining planned work.

nklomp commented 3 months ago

In that case we have to add a set of constraints on top of P.E, to limit the usages of P.E that makes no sense and hope wallets and RP's actually perform these checks reliably. When you take a step back, why add a format specific constraints layer on top of a generic query syntax, that defeats the entire purpose IMO. Why not just have a format specific query syntax to begin with?

Even if you have credential specific query languages you need to have similar logic in place anyway. The benefit is that you can get it from one spec for that format, instead of having to look at 2 specs.

On the why not, several people already have expressed why not. A lot of parties are using PE with success. Parties have invested money in supporting it. PE is not only used in the scope of OID4VP as it is protocol agnostic.

Thats the disconnect though, I think many developers will expect if they say "I want your first name from your drivers license" that they dont also have to say separately "... and nothing else please", its inherent in the first part of my request, that is a better way to manage over disclosure risk.

But that is an assumption, because you are looking at it from a selective disclosure by default perspective it seems. I don't disagree with that, but PE has clearly been created the other way around. Of course that could be made more clear. But IMO upon reading the specs carefully, the spec does answer the outcome of both example 1 and 2. IMO anyone implementing definitions at least should know what they are writing. Yes a format specific query language can make that more easy for a specific format; but it is likely you would also need to learn them for another format. And people will always be making mistakes ;)

Right but the problem is, if you remove the multi-path feature from P.E then you can no longer use it as credential format agnostic query syntax layer because you can't express requesting the same attribute across credential formats, without seperate queries. Thats why it a fundamental design issue.

Yes, hence why I think it should not have been a default. I guess time will tell how much need the market has for such a feature. In the end it could be solved with multiple queries indeed. I am also interested how this problem would be solved with your proposal. Because if you feel that being able to request the same attribute across formats is desirable, somehow that needs to be provided.

kimdhamilton commented 3 months ago

I caught up on the comments. Great input, everyone. I'm strongly supportive of @Sakurann's and @decentralgabe's suggestions, and we can make that happen.

Note that our 2.1 is strongly informed by IIW feedback and other complexity concerns raised over the past year. Many of us working on PE agree with those sentiments, and therefore we were very eager to streamline it. Also, we're looking forward to further streamlining with a 3.0 release.

tplooker commented 3 months ago

Even if you have credential specific query languages you need to have similar logic in place anyway. The benefit is that you can get it from one spec for that format, instead of having to look at 2 specs.

I think the benefit is beyond that, credential format specific syntaxes enable you to make many of the expressions that are problematic with JSON path/pointer impossible to construct at a more general syntax level, so you don't need special rules you have to apply above the syntax. For example with my first example I provided above, its impossible to request the "code" within the driving privileges structure of an mDL in the mDoc specific query syntax proposed in this doc because it only allows you to request claims at the namespace level, no deeper. Similar optimisations I believe can be made for other formats to make more things illegal at the syntax level which would be a huge simplifications for implementations.

danielfett commented 3 months ago

Reading the comments on this issue, I get the feeling that we're talking too much about solutions and not enough about what the actual requirements are. It is quite easy to come up with a query language, but it is hard to design a good one. The key to that, of course, is to understand what features we need and don't need.

In light of the recent issue in VCI, I think we should

expand the discussion a bit and find out where exactly the query language would be used and for what purposes (e.g., describing a credential in the issuer metadata is different from asking for a credential with specific properties, but VCI uses the same syntax for both at the moment),
formalize the set of requirements (using Mike's list and/or the first two pages of Tobias' document as a starting point) and agree on it,
and only then figure out which of the options listed above best serves the purpose.

danielfett commented 3 months ago

And jumping into solution space again here for a moment, I haven't seen GraphQL mentioned. It has some pretty nice concepts as well.

awoie commented 3 months ago

And jumping into solution space again here for a moment, I haven't seen GraphQL mentioned. It has some pretty nice concepts as well.

GraphQL is almost as bad if not worse than JSONPath and JSONPointer for our purpose (security, complexity etc.).

danielfett commented 3 months ago

And jumping into solution space again here for a moment, I haven't seen GraphQL mentioned. It has some pretty nice concepts as well.

GraphQL is almost as bad if not worse than JSONPath and JSONPointer for our purpose (security, complexity etc.).

Any specific concerns with the security of GraphQL? I thought it was designed to work with potentially adversarial queries.

awoie commented 3 months ago

Regarding the PEv2 profile approach, while it's feasible to maintain the same optical syntax across various credential formats, the semantics of parameters like path differ significantly, along with the associated processing rules.

A major drawback of adhering to the PEv2 profile approach is that it restricts us to a specific query vocabulary not designed to cater to the needs of more than 20 credential formats. If a community defining a credential format has unique requirements, they might struggle to adapt the existing PE vocabulary, potentially resulting in awkward implementations and profile specifications of PEv2.

It's worth noting that ISO WG10, the group responsible for the ISO mDL specification, expressed their intention to actively seek input to incorporate their specific query syntax requirements. The recent proposed amendments of ISO 18013-5 to the CBOR version of the mdoc request do not align well with the current PE vocabulary.

Persisting with the PEv2 profile approach could unnecessarily constrain us and other communities, forcing them to retrofit their needs into the PE framework. This approach is not sustainable for future developments and is likely to cause implementation challenges.

Furthermore, this could lead to potentially dangerous situations where the same PE vocabulary yields different semantic outcomes across credential formats, potentially causing confusion among developers. In my opinion, this variability undermines clarity, effectiveness and robustness.

decentralgabe commented 3 months ago

A middle ground approach may be to offer PEv2 as a "default" query language and allow overriding that default with format specific query languages.

My strong preference is to just use a single query language, but if it truly is not fit for purpose, and causes more problems than its solving, then we should embrace a multi-format multi-syntax world.

alenhorvat commented 3 months ago

Will OIDF carry the cost of modifying all the existing implementations?

alenhorvat commented 3 months ago

And jumping into solution space again here for a moment, I haven't seen GraphQL mentioned. It has some pretty nice concepts as well.

GraphQL is almost as bad if not worse than JSONPath and JSONPointer for our purpose (security, complexity etc.).

Any specific concerns with the security of GraphQL? I thought it was designed to work with potentially adversarial queries.

Devs at EBSI evaluated graphql 3 years ago and it turned out even more complicated to use. They advised against it (against replacing PE with GraphQL).

kimdhamilton commented 3 months ago

Because I believe PE v3 will take a while, my immediate suggestion is to profile a subset of PE v2 and require it for all implementers of OID4VP.

@decentralgabe one thing I missed in your response, which I should clarify. The proposed profile corresponds to the base set (no "Features") of PE v2 / v2.1.

Many people seem to have missed this change, which has been true as of PEv2. See details

That said, your point remains; this spec should not block on PEv3.

tplooker commented 3 months ago

@alenhorvat could you review https://docs.google.com/document/d/10JT--pXWsfwC4QVu3XJpXGwcO08M6tnpkZ4PKdtCEWo/edit#heading=h.7igj7m3na8ru as a concrete alternative proposal to P.E?

awoie commented 3 months ago

@decentralgabe @csuwildcat We updated the credential-format specific query doc to include a value parameter for constraints, to show how it would be possible with the proposed syntax to ask for credentials with claims that have a specific value. Document is here.

openid / OpenID4VP

Opportunities to improve the credential query/request syntax #112