Closed nissimsan closed 1 year ago
credentialSubject array prevents mapping to JWT and introduces errors in many VC libraries.
That seems to be more of an issue with the limitations of VC JWT than the VC specification itself.
I've warned of these limitations in the past, as have others... there are a number of things that VC JWT does not allow one to do easily (semantic compression, nested proofs, multiple proofs, weird array semantics like the one this issue is raising). More issues are documented here:
https://w3c.github.io/vc-imp-guide/#benefits-of-json-ld-and-ld-proofs
Even though the data model does/should allow it, here's a pragmatic suggestion on circumventing the issue -- https://github.com/w3c-ccg/traceability-vocab/pull/397
The solution is a result of the proof mechanism driving the data model; it's the tail wagging the dog. The approach is backwards and points to something deeply wrong, from an architectural standpoint, with the proof format. If you have to change your core data model because of limitations in the proof format, that is a good indicator that your proof format is badly designed. VC JWT proofs take a "center of the world" philosophy: developers are expected to design their data formats and protocols to fit within the limitations of VC JWTs. The Data Integrity specification takes the "annotation" philosophy: the developer determines the data model they want to use, and the Data Integrity specification then annotates that data model with a proof.
All that to say, what is the ask in this issue? Is this a request to change the VC Data Model specification? Or is this a request to document one way of getting around the VC JWT limitation when issuing a VC that covers multiple credential subjects?
I am in favor of eliminating "multiple credential subjects" as an option from the spec... I have not observed any real world use cases for this, and it seems that it is harmful complexity.
TLDR: credentialSubject MUST be an object or string.
The real-world example we discussed while building the VC Data Model was a marriage certificate, with a credentialSubject array of two (or more) spouses.
Searching within the history of our work is an important step when considering changing something we (the VC Data Model WG) finalized and pushed through to TR.
I have not observed any real world use cases for this, and it seems that it is harmful complexity.
Digital Bazaar has multiple real world use cases for this (packaging multiple related credentials together), has shipped the solution to customers, and would object to such a change.
@TallTed @msporny please provide actual examples, not hypotheticals, happy to discuss on a call.
For example, here are the credential formats we recently updated to avoid interoperability failures with JWT: https://github.com/w3c-ccg/traceability-vocab/pull/397
I consider anything that affects the "mapping" to VCs critical: complexity at this layer will lead to implementation bugs and verifier overhead. Limiting the structure of what a "credential" object is makes mapping safer.
Just because it's easy to imagine JSON does not mean it's a good idea to support arbitrary unbounded structures.
Digital Bazaar has multiple real world use cases for this (packaging multiple related credentials together), has shipped the solution to customers, and would object to such a change.
I don't think packaging multiple related credentials together would require the usage of an array for the credentialSubject. In fact, I'd think that would be the suggested alternative method so that only an object or string is needed for that property.
With regard to whether or not it should be allowed, the question in my mind is less about whether there are valid use cases for it and more about whether those use cases could be modeled with multiple credentials instead. In my opinion that is often a simpler and more discrete solution to the problem, and it is a much more widely used pattern. Allowing arrays, objects, and strings opens up a lot more code paths in application logic, and these are often security-critical code paths that need to be checked during verification of the credentials and of the verifiable presentations derived from them. I'd be a +1 to forbidding the code paths that rely upon the credentialSubject being an array and keeping it to only object or string (I'd like to see string option removed as well personally).
Supporting multiple subjects seems like a failure to "Keep simple things simple". We should consider removing this unnecessary complexity from the V2 spec.
being an array and keeping it to only object or string (I'd like to see string option removed as well personally).
+1 to making credentialSubject a non-empty object, and not allowing any of the following:
credentialSubject: null,
credentialSubject: [],
credentialSubject: {},
credentialSubject: '',
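As a rough illustration of the constraint proposed here, a minimal Python sketch follows. The function name is hypothetical and does not come from the VC Data Model or from any VC library; it only checks the shape of the value, not its contents.

```python
from typing import Any


def is_acceptable_credential_subject(value: Any) -> bool:
    """Return True only for a non-empty JSON object (a Python dict).

    Hypothetical helper illustrating the proposal in this comment; it is
    not part of the VC Data Model or of any VC library.
    """
    return isinstance(value, dict) and len(value) > 0


# The four shapes listed above (null, [], {}, '') are all rejected.
for rejected in (None, [], {}, ""):
    assert not is_acceptable_credential_subject(rejected)

# A non-empty object is accepted.
assert is_acceptable_credential_subject({"id": "did:example:123"})
```

Under this rule a verifier has a single shape to handle, which is the reduction in code paths argued for earlier in the thread.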
IMO it's ridiculous to limit expressive functionality of a VC because one of the integrity formats isn't flexible enough. The business/functional case is what we should solve for, not shoving the world into the JWT (or any) specification.
Credential Subjects as arrays make sense, for the marriage certificate scenario, parent/child, and a handful of other real-world credentialing use cases.
Instead, I'd ask - what can we do to make this work while using JWTs? Here are some top-of-mind workarounds:
- a subs property that is a JSON array
- a sub property with a non-normative usage (e.g. "sub": "did:abcde:1, did:abcde:2" or similar)
To summarize:
Answering my own question above:
The following non-normative example exists:
https://www.w3.org/TR/vc-data-model/#example-specifying-multiple-subjects-in-a-verifiable-credential
Here is that example slightly modified to show a blank node for one of the subjects:
<did:example:ebfeb1f712ebc6f1c276e12ec21> <http://schema.org/name> "Jayden Doe"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML> .
<did:example:ebfeb1f712ebc6f1c276e12ec21> <http://schema.org/spouse> "did:example:c276e12ec21ebfeb1f712ebc6f1" .
<http://example.edu/credentials/3732> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://example.org/examples#RelationshipCredential> .
<http://example.edu/credentials/3732> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://www.w3.org/2018/credentials#VerifiableCredential> .
<http://example.edu/credentials/3732> <https://www.w3.org/2018/credentials#credentialSubject> <did:example:ebfeb1f712ebc6f1c276e12ec21> .
<http://example.edu/credentials/3732> <https://www.w3.org/2018/credentials#credentialSubject> _:c14n0 .
<http://example.edu/credentials/3732> <https://www.w3.org/2018/credentials#issuanceDate> "2010-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<http://example.edu/credentials/3732> <https://www.w3.org/2018/credentials#issuer> <https://example.com/issuer/123> .
_:c14n0 <http://schema.org/name> "Morgan Doe"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML> .
_:c14n0 <http://schema.org/spouse> "did:example:ebfeb1f712ebc6f1c276e12ec21" .
This is for a credential ... and we must be able to agree that this is a "spec legal / normatively valid" credential before contemplating adding proofs.
Clearly, I can sign this with a Data Integrity Proof, or with a normal vanilla JWS... But I cannot map this to a "VC-JWT"... because of the way this section of the spec was written:
https://www.w3.org/TR/vc-data-model/#jwt-encoding
AFAIK, this means that this credential can ONLY be normatively represented as a Data Integrity proof format Verifiable Credential.
As @decentralgabe noted, we could adjust the JWT production rules to support multiple subjects.... we could say a VCDM 2.0 JWS-VC is just a JWS, and does not require sub or iss or typ JWT.
We could also just delete the sentence that says:
If sub is present, the value MUST be used to set the value of the id property of credentialSubject of the new credential JSON object.
since IIRC, sub is optional in JWTs.
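To make the mismatch concrete, here is a minimal Python sketch of the sentence quoted above. `apply_sub_claim` is a hypothetical name, and raising an error for arrays is an assumption made for illustration: the sentence as quoted does not say what a decoder should do when credentialSubject is an array.

```python
from typing import Any, Dict


def apply_sub_claim(credential: Dict[str, Any], sub: str) -> Dict[str, Any]:
    """Copy a JWT `sub` value into credentialSubject.id.

    Simplified illustration of the quoted rule only; a real VC-JWT decoder
    handles many more claims and checks.
    """
    subject = credential.get("credentialSubject")
    if isinstance(subject, list):
        # Assumption: a single `sub` string cannot identify several subjects,
        # so this sketch refuses instead of guessing which one is meant.
        raise ValueError("cannot map a single `sub` onto an array of subjects")
    base = subject if isinstance(subject, dict) else {}
    updated = dict(credential)  # shallow copy, the input is left untouched
    updated["credentialSubject"] = {**base, "id": sub}
    return updated


# Single subject: the rule applies cleanly.
single = apply_sub_claim(
    {"credentialSubject": {"name": "Jayden Doe"}},
    "did:example:ebfeb1f712ebc6f1c276e12ec21",
)
assert single["credentialSubject"]["id"] == "did:example:ebfeb1f712ebc6f1c276e12ec21"
```

Passing a credential whose credentialSubject is a two-element array, as in the example earlier in this comment, raises `ValueError` in this sketch, which is the gap being discussed.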
being an array and keeping it to only object or string (I'd like to see string option removed as well personally).
+1 to making credentialSubject a non empty object, and not allowing any of the following:
I'd be +1 to this as well. I think the added complexity of handling three formats is hard to justify without a compelling use case. If we want to express a relationship between multiple subjects, an array leaves it ambiguous what the relationship between its items is.
Digital Bazaar has multiple real world use cases for this (packaging multiple related credentials together), has shipped the solution to customers, and would object to such a change.
@msporny I would be interested to hear more on why you chose to bundle credentials together rather than have separate credentials that can be presented together if you are able to share that.
I'd allow an object or string, but if it ends up just allowing objects I'm not crying tears.
I'm also a +1 to simplifying the value types supported by credentialSubject.
The cited use case of the marriage certificate hasn't swayed me to believe that we should support an array of credential subjects. For instance if you did issue a marriage certificate with multiple subjects how does a presentation work? Do all subjects have to prove possession of the associated cryptographic material? Or just one? Where is this behaviour defined? It's complexity like this that we now have to deal with, if the credential subject is allowed to be an array. This also makes things like selective disclosure super difficult to do in a simple way. There are also other ways to model a credential that describes multiple people without having everyone reflected formally as a "subject" of the credential.
We need to carefully trade off the complexity any feature introduces. IMO this feature's complexity outweighs its value.
For instance if you did issue a marriage certificate with multiple subjects how does a presentation work?
This is a separate, though important problem. Even if there are no good solutions today, I don't believe that means we can't come up with a reasonable multi-party presentation scheme.
Do all subjects have to prove possession of the associated cryptographic material? Or just one? Where is this behaviour defined?
Presentation Exchange can already handle this complexity. It should be up to the verifier to determine what combination(s) they are willing to accept.
This also makes things like selective disclosure super difficult to do in a simple way.
May be true, still up to the verifier. Maybe they only care about non-selective disclosure cases?
There are also other ways to model a credential that describes multiple people without having everyone reflected formally as a "subject" of the credential.
Yes, agreed; though there are cases where having multiple parties on a credential makes sense. Marriage is one example, business co-founders is another. I imagine there are half a dozen other use cases as well.
I believe we should offer flexibility. I also believe this discussion should decouple integrity mechanisms from functional requirements. If we believe multi-party credentials are legitimate that's a completely separate issue than making them work with JWTs.
Several realizations of credentials (e.g. leveraging anonymizing techniques like link secrets) will often only have a way for a single subject to prove knowledge/possession.
The non-normative example of a "bearer" credential is effectively a credential without any means for cryptographically verifiable presentation by the subject.
It really isn't as simple as spaces-vs-tabs distinction in credential formats.
The example of a marriage certificate is a set of claims about two subjects, and the use of DIDs is a specific choice to allow distinct cryptographic proofs from each of the mentioned parties. There are trade-offs around privacy and expressiveness to such a multi-subject credential that may be fine for some scenarios and unacceptable for others.
My concern is how we would walk someone through such trade-offs, and how to educate the reader when the n-dimensional matrix of technology choices is incompatible with what the VCDM says they should be able to express as a single credential.
The Verifiable Credential (abstract) Data Model 1.0 was explicitly built to enable Verification that an Issuer made the Assertions/Claims found in a given Credential/Presentation, which Assertions/Claims may be about one or more (Credential/Presentation) Subjects.
The Verifiable Credential (abstract) Data Model 1.0 was NOT explicitly built to enable Verification that a Holder/Presenter is the Subject of a given Credential/Presentation, though this is generally achievable with related technologies.
We spent many months specifying lossless translations between the abstract Data Model and various concrete expressions of that Data Model, including JWTs, though the WG lacked significant expertise in JWTs and (after coming very close to dropping JWT translation entirely!) relied on a few group members and subsequent community review and implementation experience for this area. It is now clear that we do not have lossless translation to and from JWTs when the abstract CredentialSubject is an array. I fear there may be other issues like this that won't surface until/unless someone tries to exercise a specific VCDM feature.
I do not currently believe that revising these translations should require changes to the abstract Data Model. I do believe that JWT experts should now present alternative(s) translation(s) for an upcoming VCDM (1.1 or 2.0) revision.
@decentralgabe said:
come up with a reasonable multi-party presentation scheme
This line of reasoning probably walks us way too close to a normative definition of a protocol, since the many parties involved would need an interoperable way to generate the presentation request. We would require "a set of rules that control the way data is sent between computers" in order to do this, which is one of the four definitions of protocol given by Oxford. I don't think we can do that, given the scope of the charter.
@kdenhartog I didn't consider the charter, so thanks for bringing this up. I did not mean to imply that the protocol should be defined in this group. The data model can exist independent of such a protocol, as I understand the charter is primarily around the data model.
Yeah I agree we could define that elsewhere. My line of thinking was that if we can't normatively define the protocol here then it's best not to leave this as an undefined behavior up to others to define properly because in my experience that usually means it becomes an extension point where many people define it in many different ways. That's also part of the reason I see it being worth eliminating this code path. By doing so we also eliminate implementers having to account for the fact that others may define many different ways to coordinate or validate this behavior and that's going to make it more complicated to write implementations around this feature. If we could define protocols here I'd think there would be a stronger argument for allowing this feature, but in this case the complexity outweighs the benefits in my eyes when considering the fact that use cases can re-model the data to use multiple credentials so that the user is left unaffected by this consideration.
I don't agree, and that isn't precedent. We list credentials without a way to request or receive them. We describe presentations without a way to get requests for presentations, or how to transmit them.
There is precedent for protocols outside this WG defining this functionality. Presentation Exchange is already flexible enough to handle the use case.
The reality is that there are several decisions (including those as part of privacy considerations) which will prevent a full fidelity expression of the data model being exchanged between parties.
Likewise, Presentation Exchange shows that the presentation data model may need extension for usage within an abstractly defined exchange protocol.
If the working group can agree to these two points, we might even be able to stop arguing about the superiority of their favorite data expressions, and rather admit that (as always) there are trade-offs.
Or at least, argue a little less.
Here's a real-life example: a Universal Business Language 2.2 (UBL) schema for an Invoice: https://github.com/mwherman2000/VCTPSPrototypes/blob/main/VCTPSPrototype6/VCTPS.UBL22.Invoice/UBL22CommonAggregateComponents.tsl#L330-L335
A UBL Invoice has between 2 and 6 parties ...what some people would think of as credentialsubjects.
My position is that there is only one credential subject ...and that's the invoice itself ...identified by its credentialsubject.id (aka invoice number/identifier).
The 2-6 parties that are included and/or referenced by an Invoice are simply claims ...components of the data that make up an Invoice.
Any other concrete examples?
@mwherman2000 extending your invoice example to the marriage certificate, then the subjectID would be that of the marriage VC and the two spouses would be a set of claims on the certificate. Furthermore, the marriage VC is not the actual marriage certificate, but only a temporary, time-limited virtual copy of it, with an expiry date. So different marriage certificate VCs could be given to the different spouses, where the subjectID represents the public key of the specific spouse that is holding it. And this is perfectly easy to map to JWT representations of it.
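A minimal Python sketch of the first modeling option described above, where the marriage itself is the single subject and the spouses are claims. Every identifier here is an assumption for illustration: the `MarriageCredential` type, the `urn:example:marriage:1234` id, the `spouses` claim name, and the `to_jwt_claims` helper do not come from any vocabulary or library. The helper only shows the `iss`/`sub`/`vc` part of a VC-JWT payload.

```python
from typing import Any, Dict

# Hypothetical credential: one subject (the marriage), spouses as claims.
marriage_credential: Dict[str, Any] = {
    "type": ["VerifiableCredential", "MarriageCredential"],  # hypothetical type
    "issuer": "https://example.com/issuer/123",
    "credentialSubject": {
        "id": "urn:example:marriage:1234",  # hypothetical identifier
        "spouses": [  # hypothetical claim name
            {"id": "did:example:ebfeb1f712ebc6f1c276e12ec21", "name": "Jayden Doe"},
            {"id": "did:example:c276e12ec21ebfeb1f712ebc6f1", "name": "Morgan Doe"},
        ],
    },
}


def to_jwt_claims(credential: Dict[str, Any]) -> Dict[str, Any]:
    """Build a partial JWT payload: issuer -> iss, credentialSubject.id -> sub."""
    subject = credential["credentialSubject"]
    if not isinstance(subject, dict):
        raise ValueError("this sketch only handles a single subject object")
    return {"iss": credential["issuer"], "sub": subject["id"], "vc": credential}


claims = to_jwt_claims(marriage_credential)
assert claims["sub"] == "urn:example:marriage:1234"
```

Because credentialSubject is a single object, the `sub` mapping is unambiguous, and both spouses still appear in the credential as ordinary claims.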
I think the complexity of multiple credential subjects might be somewhat mitigated if we successfully address credential holders as a separate entity.
The issue was discussed in a meeting on 2022-08-03
blocked by #947
Suggest closing, this has been addressed in both securing specs.
The issue was discussed in a meeting on 2023-06-06
No objections raised since marked pending close, closing.