w3c / vc-data-model

W3C Verifiable Credentials v2.0 Specification
https://w3c.github.io/vc-data-model/

Representing Multi Subject Credentials in the VCDM #931

Closed decentralgabe closed 1 year ago

decentralgabe commented 1 year ago

As per the discussion at TPAC on September 16:

Overview

There is already an example in the spec that calls this out here: https://www.w3.org/TR/vc-data-model/#example-specifying-multiple-subjects-in-a-verifiable-credential

This issue mostly lays out information. I'm interested in what the community thinks, and moving towards a concrete proposal (with suggestions from you).

Related To

Prior Art

Use Cases

What Would It Look Like?

Questions

TallTed commented 1 year ago

Questions

  • Isn’t subject vs holder already confusion enough?

There should be no confusion. I continue to be baffled by the assertion that the subject is (or is not) inherently and implicitly the holder. The subject(s) is(are) the entity/ies about which the VC contains assertions. The holder(s) is(are) the entity/ies who hold the bytes that comprise the VC, including those statements, the relevant crypto to confirm that they have not been changed since the VC was issued, and the relevant crypto to confirm that the issuer is as identified (because the issuer MUST indeed be identified) within the VC.

  • Should credentials with multiple subjects even be allowed?

Why not? Remember, part of the basic tenets of VCs are based on the basic tenets of the WWW -- i.e., any entity can say anything about any entity. Just as a book may contain statements about multiple subjects, a VC may contain statements about multiple subjects.

  • What alternatives are there?

Well, there are collective nouns (a/k/a entities), whose names (a/k/a URIs) might be minted at the time of VC construction and defined at that time to be made up of some number of individual nouns (a/k/a entities). I anticipate there would be some statements about the individuals (and not about the collective!) which would have to be put in a second (or further) VC, which would have to be distributed alongside the "collective noun description VC", and all these VCs would need to cross-reference each other. I am not in favor of forcing this option.

There are undoubtedly other possibilities.

I cannot speak well to this specific, but I believe that VCDMv1 did this job reasonably well. If there are specific flaws, they will be much more addressable if they can first be well described!

decentralgabe commented 1 year ago

Thanks, @TallTed I agree with 99% of what you wrote. I added those questions to appear neutral (though I'm really not 😁)!

The subject/holder semantic confusion is something we must clear up rather soon. Your conception was mine, until learning of others' perspectives. Such room for other interpretations to creep in is an indication that the spec text is not as precise and unambiguous as it should be.

One point that @smithsamuelm raised at TPAC (though relating to multiple issuers, not multiple subjects) was that a single DID can be used to represent multiple parties, if those parties agree to a representation scheme under a single DID (e.g. a multisig, multiple controllers, or something else). I believe this is a possible method of cleverly allowing multiple subjects (or issuers!) in a credential; however I am worried that this approach has two clear downsides:

  1. It is not immediately clear in reading a credential who the subject(s) are. Many DIDs may have related keys/DIDs in them controlled by a single entity. It would be exceedingly difficult to determine cases of multiple subjects, or subjects who just have a bunch of identities.
  2. DIDs with multiple keys/related IDs can change. This can harm the verifiability/integrity of a credential. For example, suppose I issued a credential to a single DID (did:example:alices-multisig) comprising two parties (let's say did:example:alice and did:example:bob), and six months post-issuance Alice reforms her multisig to include new parties and remove Bob (let's say did:example:alices-multisig now contains did:example:alice, did:example:charlie, and did:example:dave). Is the credential still to be seen as valid?
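
The second downside can be made concrete with a minimal sketch. It assumes a hypothetical in-memory lookup from a group DID to the DIDs currently behind it, and an issuer that keeps a snapshot of that membership at issuance time. `GROUP_MEMBERS`, `IssuedCredential`, `issue`, and `membership_changed` are illustrative names only; they are not part of any DID method or of the VCDM.

```python
from dataclasses import dataclass

# Hypothetical resolver state: group DID -> DIDs currently behind it.
GROUP_MEMBERS = {
    "did:example:alices-multisig": frozenset(
        {"did:example:alice", "did:example:bob"}
    ),
}


@dataclass(frozen=True)
class IssuedCredential:
    subject: str                    # the single DID named as the subject
    members_at_issuance: frozenset  # snapshot kept by the issuer (an assumption)


def issue(subject_did: str) -> IssuedCredential:
    """Record who stood behind the subject DID when the credential was issued."""
    return IssuedCredential(subject_did, GROUP_MEMBERS[subject_did])


def membership_changed(cred: IssuedCredential) -> bool:
    """True if the parties behind the subject DID differ from those at issuance."""
    return GROUP_MEMBERS[cred.subject] != cred.members_at_issuance


cred = issue("did:example:alices-multisig")
changed_at_issuance = membership_changed(cred)

# Six months later Alice reforms the multisig: Bob out, Charlie and Dave in.
GROUP_MEMBERS["did:example:alices-multisig"] = frozenset(
    {"did:example:alice", "did:example:charlie", "did:example:dave"}
)
changed_later = membership_changed(cred)

print(changed_at_issuance, changed_later)
```

The credential still names only did:example:alices-multisig, so a verifier that has no such snapshot would have nothing in the credential itself to tell it that the set of parties changed.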

Moving past that example, I'd like to gain a sense of folks' appetite for firming up the existing language around multiple subjects and introducing normative statements for the representation of multiple subjects in a single credential, expanding on the existing spec text and example here.

SmithSamuelM commented 1 year ago

@decentralgabe One of the warts you are exposing has often been referred to as the cheap pseudonymity problem. If a DID as cryptonym (cryptographic pseudonym) may be cheaply created, and the value of the reputation of that DID (its brand value, its connections and relationships, its liabilities and accruals, indeed the value of maintaining control of that DID) to the Natural Person or Legal Entity that the DID is a pseudonym for is so low that they do not have enough incentive to retain control of it, then nothing we do with regard to the holder/subject debate and claims against DIDs has any merit. The only DIDs that have any stability with regard to the linkage between the controller of the DID (i.e., the entity that controls the private keys) and the claims against that DID expressed by a VC are those DIDs whose reputational value is great enough that the incentive to keep control justifies a verifier in trusting that the DID controller as subject is stable. Absent that reputational value there is no stability, and a VC becomes a cheap reputation (set of claims) against a cheap pseudonym: cheap reputation because verifiers can't trust it, or must expend resources to reverify all the claims, which obviates any value imbued in the VC by the issuer; cheap pseudonym because DID controllers will too easily exchange or transfer their DID to some other controller.

With regard to multi-sig: in KERI we manage a group multi-sig identifier using a set of single-sig identifiers. The single-sig identifiers may have independent existence outside their role as group members, which means they can individually have VCs issued against them. Then, with ACDC's chaining feature, a group VC that chains back to the individual VCs can have the effect of multiple subjects. We call them Issuees to avoid confusion with the holder ambiguity. The graph of VCs is evaluatable as a group-issued VC. It's one verifiable data structure. We don't need to expand a VC to have multiple issuees; we just have a VC whose semantics mean that its edges point back to a group of other DIDs. This way the group DID control is a multi-sig, but the individual controllers are identified separately and can have reputations. Frankly, chaining as a normative function of VCs simplifies many of these problems, which stem from the fact, IMHO, that a document model is a mismatch for many of these real-world problems.

IMHO the relationships expressed by most VC use cases look like property graphs, not bags of triples. The key to this insight is to answer what is the most appropriate level of granularity of a node in the knowledge graph used to model complex integrated entities that are natural persons. Is each property or claim a node in the graph, or is the entire VC a node in the graph? When we make the entire VC a node, it becomes simple to remove ambiguity about what a VC represents and what its properties represent: which ones are metadata about the node and which ones are properties of the node.

Given that level of granularity we can state that a node may or may not have an Issuee but always has an Issuer. This makes the control relationships absolutely clear. What the Issuee represents can be VC-type dependent. So no more bike-shedding about what is or is not a subject, which results from using a VC as a bag of triples wherein each triple must have a subject and must have an object, whereby we take something semantically simple and explode it into something semantically difficult.

Likewise, what does an edge represent? Is it an edge between two VCs or a predicate between a subject and an object? When an edge just connects two VCs as nodes, we can imbue each edge with different meaning. Its meaning is the defined relationship between the two VCs it connects, and we can give those edges types so we can more simply model all types of relationships. Thus each VC with its edges can have its own type and represent whatever that type wants it to represent. So a VC is one node with zero or more edges. Whereas a bag of triples is a triple graph in its own right, with a multitude of edges where each edge is simply an is-a or has-a predicate, and there is no concept of edges between VCs. So it becomes a triple soup, which makes it very difficult to describe relationships like "this VC defines a set of Issuees (i.e., a group subject) by virtue of its edges". Triples are appropriate for representing highly static, highly certain abstract relationships in a static ontology like one might use in cladistics: a robin is-a bird, a bird has-a beak. But if I want to describe dynamic relationships between complex entities, with claims that are revocable or uncertain or unstable, and the properties that describe those complex entities are integrated, interrelated, or cross-dependent, then any attempt at breaking those properties up into a bag of triples loses the semantic clarity of recognizing them as belonging to a single instance of a complex integrated entity. We get stuck at square one because the granularity is wrong (at least in my opinion).
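
As a rough illustration of the "VC as one node with typed edges" idea described above, the sketch below models each VC as a node that always has an issuer, may have an issuee, and carries typed edges to other VCs. This is not the ACDC data format or any KERI API; `VCNode`, `Edge`, `group_issuees`, and the `"member"` edge type are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Edge:
    type: str    # meaning of the relationship between the two VCs
    target: str  # id of the VC node this edge points to


@dataclass
class VCNode:
    id: str
    issuer: str                    # a node always has an issuer
    issuee: Optional[str] = None   # a node may or may not have an issuee
    edges: list = field(default_factory=list)


def group_issuees(node_id: str, graph: dict, edge_type: str = "member") -> list:
    """Collect the issuees of the VCs that a group VC points back to."""
    found = []
    for edge in graph[node_id].edges:
        target = graph[edge.target]
        if edge.type == edge_type and target.issuee is not None:
            found.append(target.issuee)
    return found


graph = {
    "vc:alice": VCNode("vc:alice", "did:example:issuer", "did:example:alice"),
    "vc:bob": VCNode("vc:bob", "did:example:issuer", "did:example:bob"),
    "vc:group": VCNode(
        "vc:group",
        "did:example:issuer",
        edges=[Edge("member", "vc:alice"), Edge("member", "vc:bob")],
    ),
}

issuees = group_issuees("vc:group", graph)
print(issuees)
```

Here the group VC itself has no issuee; its set of issuees is derived by following its edges, and each individual remains separately identified by their own VC.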

OR13 commented 1 year ago

I implemented a version of this in my latest pass on vc-jwt, that simply converts a single credential to several verifiable credentials.
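
The comment does not show how the conversion works, so the following is only a guess at the general shape of such a step, not the vc-jwt implementation. It assumes an unsecured credential held as a Python dict, copies every property except `credentialSubject` into one credential per subject, and derives a new `id` with a made-up `#subject-N` suffix. Each resulting credential would still need to be secured separately, which is not shown.

```python
import copy


def split_by_subject(credential: dict) -> list:
    """Return one single-subject credential per entry in credentialSubject."""
    subjects = credential.get("credentialSubject")
    if not isinstance(subjects, list):
        # Already a single subject (or none): nothing to split.
        return [copy.deepcopy(credential)]

    result = []
    for index, subject in enumerate(subjects):
        single = {
            key: copy.deepcopy(value)
            for key, value in credential.items()
            if key != "credentialSubject"
        }
        single["credentialSubject"] = copy.deepcopy(subject)
        if "id" in credential:
            # Hypothetical id scheme so the derived credentials stay distinct.
            single["id"] = f"{credential['id']}#subject-{index}"
        result.append(single)
    return result


multi = {
    "id": "theIdOfTheVC",
    "credentialSubject": [
        {"id": "someUrl", "propertyName1": "value1"},
        {"id": "someOtherUrl", "propertyName1": "otherValue1"},
    ],
}

parts = split_by_subject(multi)
for part in parts:
    print(part["id"], "->", part["credentialSubject"]["id"])
```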

dmitrizagidulin commented 1 year ago

@TallTed

  • Isn’t subject vs holder already confusion enough?

There should be no confusion. I continue to be baffled by the assertion that the subject is (or is not) inherently and implicitly the holder. The subject(s) is(are) the entity/ies about which the VC contains assertions. The holder(s) is(are) the entity/ies who hold the bytes that comprise the VC, including those statements, the relevant crypto to confirm that they have not been changed since the VC was issued, and the relevant crypto to confirm that the issuer is as identified (because the issuer MUST indeed be identified) within the VC.

~~Wait, now I'm confused :) Why would the subject be inherently the holder? The holder is whoever currently has the JSON file or whatever. Maybe the subject first had it, but then they passed it on to other institutions, etc etc. Nothing in the spec suggests or recommends that the subject and holder are similar. Where is this certainty coming from?~~

Waiiit, nope, I just misread your comment. :) I 100% agree with you. The difference seems very clear cut.

iherman commented 1 year ago

The issue was discussed in a meeting on 2022-12-07

3.1. Representing Multi Subject Credentials in the VCDM (issue vc-data-model#931)

See github issue vc-data-model#931 (https://github.com/w3c/vc-data-model/issues/931).

Brent Zundel: first issue #931, representing multi subjects in the data model.

Gabe Cohen: may be better suited to have a discussion than an issue.

Ted Thibodeau Jr.: no please. discussion is arguably worse. could be more confusing than just reading a thread of an issue.

decentralgabe commented 1 year ago

Would folks be open to adding text that clarifies that if multiple subjects are specified, all claims in the VC apply to both subjects?

dlongley commented 1 year ago

@decentralgabe,

Would folks be open to adding text that clarifies that if multiple subjects are specified, all claims in the VC apply to both subjects?

Yes, I would be opposed because that's not how the data model works.

The claims are subject-property-value relationships, and each property-value is therefore bound to a particular subject. For example, when VCDM claims are expressed in JSON, they look like this:

{
   "id": "someUrl",
   "propertyName1": "value1",
   "propertyName2": "value2"
}

This creates two claims:

someUrl (subject) - propertyName1 (property) - value1 (value)
someUrl (subject) - propertyName2 (property) - value2 (value)
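
The mapping from one JSON object to its claims can be sketched in a few lines. This is a simplification that assumes a flat object with an `id` and plain string values; it ignores JSON-LD context processing entirely, and `claims_for` is a name invented for this example.

```python
def claims_for(obj: dict) -> list:
    """Turn one flat JSON object into (subject, property, value) claims."""
    subject = obj["id"]  # the "id" names the subject; it is not itself a claim
    return [(subject, prop, value) for prop, value in obj.items() if prop != "id"]


example = {
    "id": "someUrl",
    "propertyName1": "value1",
    "propertyName2": "value2",
}

claims = claims_for(example)
for subject, prop, value in claims:
    print(f"{subject} (subject) - {prop} (property) - {value} (value)")
```

Run as written, this prints the same two claim lines listed above.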

If you have two subjects in a set together, it looks like this:

[{
   "id": "someUrl",
   "propertyName1": "value1",
   "propertyName2": "value2"
}, {
   "id": "someOtherUrl",
   "propertyName1": "otherValue1",
   "propertyName2": "otherValue2"
}]

This creates a total of four claims, but with only two about each subject:

someUrl (subject) - propertyName1 (property) - value1 (value)
someUrl (subject) - propertyName2 (property) - value2 (value)

someOtherUrl (subject) - propertyName1 (property) - otherValue1 (value)
someOtherUrl (subject) - propertyName2 (property) - otherValue2 (value)

This would be the case I believe you're referencing, where there are two "top-level" credentialSubjects in a set together as the value of credentialSubject, i.e.:

{
   "id": "theIdOfTheVC",
   "credentialSubject": [{
      "id": "someUrl",
      "propertyName1": "value1",
      "propertyName2": "value2"
   }, {
      "id": "someOtherUrl",
      "propertyName1": "otherValue1",
      "propertyName2": "otherValue2"
   }]
}

As a side note, when credentialSubject is used, it's a property of the VC itself, which means additional metadata claims are included in the graph of information:

theIdOfTheVC (subject) - credentialSubject (property) - someUrl (value)
theIdOfTheVC (subject) - credentialSubject (property) - someOtherUrl (value)

Which shows both credentialSubjects as "top-level" subjects.
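
A small sketch of how those metadata claims arise, under the assumption that every credentialSubject entry carries an `id` (the data model does not require one) and that the VC is held as a plain Python dict. `subject_links` is a hypothetical helper name.

```python
def subject_links(vc: dict) -> list:
    """Emit the (VC id, "credentialSubject", subject id) metadata claims."""
    subjects = vc["credentialSubject"]
    if isinstance(subjects, dict):
        # A single subject is treated as a set of one.
        subjects = [subjects]
    return [(vc["id"], "credentialSubject", subject["id"]) for subject in subjects]


vc = {
    "id": "theIdOfTheVC",
    "credentialSubject": [
        {"id": "someUrl", "propertyName1": "value1", "propertyName2": "value2"},
        {
            "id": "someOtherUrl",
            "propertyName1": "otherValue1",
            "propertyName2": "otherValue2",
        },
    ],
}

links = subject_links(vc)
for link in links:
    print(" - ".join(link))
```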

Alternatively, things could be nested if there's a relationship between subjects within the VC (i.e., there's just one top-level credentialSubject that has other things it is related to). In this case there is just one "top-level" credentialSubject, but there are still other subjects in the graph of information that is described by claims in the VC. So, when nesting things, like this:

{
   "id": "someUrl",
   "propertyName1": "value1",
   "propertyName2": "value2",
   "propertyName3": {
     "id": "someOtherNestedUrl",
     "propertyName1": "otherValue1",
     "propertyName2": "otherValue2"
   }
}

This produces these claims:

someUrl (subject) - propertyName1 (property) - value1 (value)
someUrl (subject) - propertyName2 (property) - value2 (value)
someUrl (subject) - propertyName3 (property) - someOtherNestedUrl (value)

someOtherNestedUrl (subject) - propertyName1 (property) - otherValue1 (value)
someOtherNestedUrl (subject) - propertyName2 (property) - otherValue2 (value)

In short, (generally speaking) when an opening curly brace is used to start a new JSON object, that new object represents a nested subject with properties and values of its own (i.e., claims for that subject, not some other one). The nested example, with more concrete data:

{
  "id": "urn:id:1",
  "type": "Human",
  "name": "Gabe",
  "knows": {
    "id": "urn:id:2",
    "type": "Robot",
    "name": "ChatGPT"
  }
}

This would generate these claims:

urn:id:1 (subject) - type (property) - Human (value)
urn:id:1 (subject) - name (property) - Gabe (value)
urn:id:1 (subject) - knows (property) - urn:id:2 (value)

urn:id:2 (subject) - type (property) - Robot (value)
urn:id:2 (subject) - name (property) - ChatGPT (value)

We could model this without a direct relationship (no nesting) between Gabe and ChatGPT by putting them in the same set like this (where we don't claim that Gabe knows ChatGPT):

[{
  "id": "urn:id:1",
  "type": "Human",
  "name": "Gabe"
}, {
  "id": "urn:id:2",
  "type": "Robot",
  "name": "ChatGPT"
}]

This would generate these claims:

urn:id:1 (subject) - type (property) - Human (value)
urn:id:1 (subject) - name (property) - Gabe (value)

urn:id:2 (subject) - type (property) - Robot (value)
urn:id:2 (subject) - name (property) - ChatGPT (value)

In either of these cases, you can see that the claims made about urn:id:1 are NOT the same as those made about urn:id:2, i.e., Gabe is NOT a Robot, ChatGPT is NOT a Human.
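
To check this mechanically, here is a minimal recursive sketch that walks either shape (a nested object or a set of objects) and collects claims. It assumes every object has an `id` and that values are either strings or nested objects; real JSON-LD processing does considerably more. `extract_claims` is an illustrative name, not a library function.

```python
def extract_claims(node) -> set:
    """Collect (subject, property, value) claims from an object or a set of objects."""
    claims = set()
    if isinstance(node, list):
        for item in node:
            claims |= extract_claims(item)
        return claims

    subject = node["id"]
    for prop, value in node.items():
        if prop == "id":
            continue
        if isinstance(value, dict):
            # A nested object is its own subject; the parent only links to its id.
            claims.add((subject, prop, value["id"]))
            claims |= extract_claims(value)
        else:
            claims.add((subject, prop, value))
    return claims


nested = {
    "id": "urn:id:1",
    "type": "Human",
    "name": "Gabe",
    "knows": {"id": "urn:id:2", "type": "Robot", "name": "ChatGPT"},
}

as_set = [
    {"id": "urn:id:1", "type": "Human", "name": "Gabe"},
    {"id": "urn:id:2", "type": "Robot", "name": "ChatGPT"},
]

nested_claims = extract_claims(nested)
set_claims = extract_claims(as_set)
only_in_nested = nested_claims - set_claims
print(only_in_nested)
```

The two shapes differ only by the `knows` claim, and neither produces a claim that urn:id:1 is a Robot or that urn:id:2 is a Human.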

decentralgabe commented 1 year ago

Thanks @dlongley that helps clear up my misunderstanding, I'll retract my previous statement. I feel that the spec text on multiple subjects in this section is insufficient in capturing the complexity of multiple subjects. Perhaps a more robust example would help. I'm also open to closing this and opening an issue in the impl guide.

decentralgabe commented 1 year ago

Closing in favor of https://github.com/w3c/vc-use-cases/issues/126