w3c / vc-data-model

W3C Verifiable Credentials v2.0 Specification
https://w3c.github.io/vc-data-model/

How is an id property of a credentialSubject supposed to be differentiated (by executing code or a person processing a VC)? #793

Closed: mwherman2000 closed this issue 3 years ago

mwherman2000 commented 3 years ago

Closely related to https://github.com/w3c/vc-data-model/issues/792 but different...

"Part b" of issue https://github.com/w3c/vc-data-model/issues/792 is...

How is an id property of a credentialSubject supposed to be differentiated (by executing code or a person processing a VC)? When is an id claim supposed to be interpreted as either:

a) a plain old claim named id (like the other claims that might appear under the credentialSubject), vs.
b) the identifier of a corresponding credential subject (https://github.com/w3c/vc-data-model/issues/792)?

When is the id property to be interpreted as a) or b)? ...can it be both, i.e. a) and b)?

Or do we need a completely different approach for marking the identifier of a corresponding credential subject? For example, does credentialSubject id need a completely different, more unique name, such as credentialSubject associatedSubjectId?

Keep in mind that a lot of originating systems are simply going to copy a collection of claims from a system of record into a VC, and it is very likely that the originating system already has a property named id: for example, an invoicing, purchase order, health pass, SharePoint list item, or driver's license system. This is something we shouldn't be burdening developers and system maintainers with - literally for the rest of their lives. Also keep in mind that a lot of this processing of system-of-record data into VCs will (in the fullness of time) be automated by plain old ETL platforms (https://en.wikipedia.org/wiki/Extract,_transform,_load).

kdenhartog commented 3 years ago

...and it is very likely that the originating system already has a property named id

JSON-LD makes it possible for this not to cause a conflict. In the JSON document we could just as easily rename the property to foo. id has been chosen specifically to stay as close as possible to @id, which is a reserved keyword.

mwherman2000 commented 3 years ago

...and it is very likely that the originating system already has a property named id ... In the JSON document we could just as easily rename the property to foo. id has been chosen specifically to stay as close as possible to @id, which is a reserved keyword.

Which id property, @kdenhartog? Can you be more specific? In the scope of what we're talking about, there are 3 instances of id:

  1. The credential id
  2. The credentialSubject id (if present)
  3. The id used by the source system of record

@kdenhartog Which instance of id are you proposing to rename?

If we use this Example 2c, what does the VC look like after the renaming you propose?

{
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://www.w3.org/2018/credentials/v1/credentialSubject_id_subject",
        "https://www.w3.org/2018/credentials/examples/v1"
    ],
    "id": "did:colors:primarycolorpalette",
    "type": ["VerifiableCredential", "ColorPalette"],
    "credentialSubject": {
        "comment": "The first id is, say, the Subject, and the second id is, say, from the system of record",
        "id": "did:colorpalettes:primarycolors",
        "id": "fancycolors1234",
        "colors": [
            "red",
            "green",
            "blue"
        ]
    },
    "proof": "..."
}

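Independent of JSON-LD, Example 2c cannot survive plain JSON parsing: RFC 8259 says object member names SHOULD be unique and notes that many implementations report only the last name/value pair. A minimal Python sketch (standard library only) of what a consumer actually sees, plus a hypothetical strict hook that rejects duplicates instead of silently dropping one:

```python
import json

# Example 2c's credentialSubject, reduced to the two conflicting "id" members.
EXAMPLE_2C_SUBJECT = """
{
  "id": "did:colorpalettes:primarycolors",
  "id": "fancycolors1234",
  "colors": ["red", "green", "blue"]
}
"""


def parse_default(text):
    """Python's default behaviour: the last duplicate key silently wins."""
    return json.loads(text)


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of dropping a duplicate key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def parse_strict(text):
    return json.loads(text, object_pairs_hook=reject_duplicates)


if __name__ == "__main__":
    print(parse_default(EXAMPLE_2C_SUBJECT)["id"])  # fancycolors1234: the DID is gone
    try:
        parse_strict(EXAMPLE_2C_SUBJECT)
    except ValueError as err:
        print(err)  # duplicate key: 'id'
```

Either way, the subject identifier and the system-of-record id cannot share one JSON object under the same name.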
dlongley commented 3 years ago

@mwherman2000,

The property credentialSubject.id is always used to express the identifier for the credential subject. But you can define all your own properties from there, including changing the meaning of id, as long as it is nested underneath one of your own properties. You need to define all of the properties you want to use in the credential subject via a separate JSON-LD context. This ensures that there are strong semantics all the way from the root of the credential to the properties you create. You can also define properties so that they hold arbitrary JSON, if you really want to. So, if you just want to dump some kind of JSON data record into a VC, you can define a property such as data with a term definition that includes "@type": "@json" in your context (per the JSON-LD @context rules) and then model your VC like this:

{
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://my.example/data-record-context"
    ],
    "id": "urn:uuid:5ab96bc8-0128-11ec-9899-10bf48838a41",
    "type": ["VerifiableCredential", "DataRecordCredential"],
    "credentialSubject": {
        "type": "DataRecord",
        "data": {
            "id": "Here 'id' can mean whatever you defined in your context, go to town.",
            "do": "whatever",
            "you": "want"
        }
    },
    "proof": "..."
}

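For the ETL scenario raised at the top of the issue, the data pattern can be sketched in Python as below. This is a minimal sketch under stated assumptions, not an official tool: the helper names (wrap_record, id_roles) are hypothetical, the context URL and types are taken from the example, and it assumes that context defines data with "@type": "@json" and that no scoped context redefines id. id_roles labels each id by the role a JSON-LD processor would give it under those assumptions.

```python
import copy
import uuid

VC_V1_CONTEXT = "https://www.w3.org/2018/credentials/v1"
# Assumed to define "data" with "@type": "@json", as in the example above.
DATA_RECORD_CONTEXT = "https://my.example/data-record-context"


def wrap_record(record, subject_id=None):
    """Hypothetical ETL helper: embed a system-of-record row as one opaque JSON value.

    The row is deep-copied so later edits to the source do not leak into the VC.
    Only subject_id (if given) is ever written to credentialSubject.id.
    """
    subject = {"type": "DataRecord", "data": copy.deepcopy(record)}
    if subject_id is not None:
        subject["id"] = subject_id
    return {
        "@context": [VC_V1_CONTEXT, DATA_RECORD_CONTEXT],
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": ["VerifiableCredential", "DataRecordCredential"],
        "credentialSubject": subject,
        # A "proof" would be added by a separate signing step (not shown).
    }


def id_roles(vc, json_terms=("data",)):
    """Label each "id" key in a VC by the role a JSON-LD processor gives it.

    Assumes the VC v1 context aliases "id" to "@id", that every term in
    json_terms is defined with "@type": "@json", and that no scoped context
    redefines "id" (a real processor would have to check that).
    """
    roles = {}

    def walk(node, path, opaque):
        if isinstance(node, dict):
            for key, value in node.items():
                child = f"{path}.{key}" if path else key
                if key == "id":
                    roles[child] = "opaque JSON data" if opaque else "node identifier (@id)"
                walk(value, child, opaque or key in json_terms)
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}[{index}]", opaque)

    walk(vc, "", False)
    return roles


if __name__ == "__main__":
    row = {"id": "INV-0042", "customer": "ACME", "total": 125.0}
    vc = wrap_record(row, subject_id="did:example:acme")
    for path, role in id_roles(vc).items():
        print(f"{path}: {role}")
```

With this shape, code never has to guess: an id directly on a node object is an identifier, and an id inside the @json value is just data.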
mwherman2000 commented 3 years ago

@dlongley I'm not sure this is what you intended but I really like it ;-) :-) I can definitely work with the following for both Unbound Credentials as well as Bound Credentials...

{
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://www.w3.org/2018/credentials/examples/v1"
    ],
    "id": "did:colors:primarycolorpallette",
    "type": [
        "VerifiableCredential",
        "ColorPalette"
    ],
    "credentialSubject": {
        "comment": "The following id associated with a Subject is optional according to the specification (Unbound Credential use case)",
        "id": "did:colorpalettes:primarycolors",
        "claims": [
                "id": "some system of record id",
                "colors": [
                    "red",
                    "green",
                    "blue"
                ]
        ]
    },
    "proof": "..."
}

Any dissenters? .. any objections?

mwherman2000 commented 3 years ago

@kdenhartog @dlongley @David-Chadwick ,

Separate question: is the above pattern acceptable enough to be documented in the VC data model specification and/or the VC use cases specification?

The important (critical) contribution is that this pattern aligns and unifies the VC physical data model with the VC conceptual model.

kdenhartog commented 3 years ago

@dlongley I'm not sure this is what you intended but I really like it ;-) :-) I can definitely work with the following for both Unbound Credentials as well as Bound Credentials...

Any dissenters?

Yeah, that's not a valid JSON-LD (or even JSON) document. I'm not sure what the intention is here, but here's what I'm extrapolating.
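A quick way to confirm that the example is not well-formed JSON is to run its claims member through a strict parser. A minimal Python sketch using only the standard library; the object form is an assumption about what the example intended:

```python
import json

# The "claims" member as written: a JSON array cannot hold name/value pairs.
CLAIMS_AS_ARRAY = '{"claims": ["id": "some system of record id", "colors": ["red", "green", "blue"]]}'
# The same data as an object, which appears to be the intent.
CLAIMS_AS_OBJECT = '{"claims": {"id": "some system of record id", "colors": ["red", "green", "blue"]}}'


def is_well_formed(text):
    """Return True if text parses as JSON; report the parser's complaint otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        print(f"not JSON: {err.msg} at line {err.lineno}, column {err.colno}")
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(CLAIMS_AS_ARRAY))   # False
    print(is_well_formed(CLAIMS_AS_OBJECT))  # True
```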

Starting with this JSON document (modified from yours to make it valid):

{
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://www.w3.org/2018/credentials/examples/v1",
        {
            "@context": {
                "claims": {
                    "@id": "https://example.com/claims",
                    "@type": "@json"
                }
            }
        }
    ],
    "id": "did:colors:primarycolorpallette",
    "type": [
        "VerifiableCredential",
        "ColorPalette"
    ],
    "credentialSubject": {
        "comment": "The following id associated with a Subject is optional according to the specification (Unbound Credential use case)",
        "id": "did:colorpalettes:primarycolors",
        "claims": { 
            "id": "some system of record id",
            "colors": [
                "red",
                "green",
                "blue"
            ]
        }
    },
    "proof": {}
}

This produces the following N-Quads (the actual data that gets signed):

<did:colorpalettes:primarycolors> <https://example.com/claims> "{\"colors\":[\"red\",\"green\",\"blue\"],\"id\":\"some system of record id\"}"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON> .
<did:colors:primarycolorpallette> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://json-ld.org/playground/ColorPalette> .
<did:colors:primarycolorpallette> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://www.w3.org/2018/credentials#VerifiableCredential> .
<did:colors:primarycolorpallette> <https://w3id.org/security#proof> _:b0 .
<did:colors:primarycolorpallette> <https://www.w3.org/2018/credentials#credentialSubject> <did:colorpalettes:primarycolors> .

The problem is that the id doesn't just get ignored when converting from JSON-LD to RDF. Notice how the subject of the first N-Quad matches credentialSubject.id. That's because the id acts as the subject of the RDF statement.

If we remove that id and make this an unbound credential, as the following JSON does,

{
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://www.w3.org/2018/credentials/examples/v1",
        {
            "@context": {
                "claims": {
                    "@id": "https://example.com/claims",
                    "@type": "@json"
                }
            }
        }
    ],
    "id": "did:colors:primarycolorpallette",
    "type": [
        "VerifiableCredential",
        "ColorPalette"
    ],
    "credentialSubject": {
        "comment": "The following id associated with a Subject is optional according to the specification (Unbound Credential use case)",
        "claims": { 
            "id": "some system of record id",
            "colors": [
                "red",
                "green",
                "blue"
            ]
        }
    },
    "proof": {}
}

That produces the following N-Quads:

<did:colors:primarycolorpallette> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://json-ld.org/playground/ColorPalette> .
<did:colors:primarycolorpallette> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://www.w3.org/2018/credentials#VerifiableCredential> .
<did:colors:primarycolorpallette> <https://w3id.org/security#proof> _:b0 .
<did:colors:primarycolorpallette> <https://www.w3.org/2018/credentials#credentialSubject> _:b2 .
_:b2 <https://example.com/claims> "{\"colors\":[\"red\",\"green\",\"blue\"],\"id\":\"some system of record id\"}"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON> .

Notice how the RDF statements no longer include <did:colorpalettes:primarycolors>; instead, credentialSubject.id has been replaced with _:b2.

That's because the subject in the RDF statement has been replaced by a "blank node". In other words, in the semantic data world the subject no longer carries an identifier, but it is still a distinct node that statements are made about. This is an important distinction, and it doesn't align with the unbound model. This is the tricky part of the difference between JSON and JSON-LD: while your physical data model does work with JSON, it doesn't work with JSON-LD and therefore can't be applied to the abstract data model. So I'm kind of stuck seeing how this unbound model could be applied to the theoretical data model, because it doesn't align on the concept of subjects. All statements made by the issuer are still about an entity. That entity is the subject.
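To make the two cases concrete, here is a deliberately simplified Python model of the JSON-LD-to-RDF step for the bound and unbound credentialSubject variants above. It is a toy, not the JSON-LD algorithm: it handles one flat node object, assumes id is aliased to @id, uses a hand-written term table in place of the @context (the claims IRI is the one from the example), and serializes @json values with sorted keys and no whitespace, which matches JCS canonicalization only for simple ASCII data like this. Blank node labels are arbitrary, so _:b0 here corresponds to _:b2 in the output above.

```python
import itertools
import json

RDF_JSON = "http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON"
# Hand-written stand-in for the inline @context in the examples above.
TERMS = {"claims": ("https://example.com/claims", "@json")}


def node_to_nquads(node, blank_ids):
    """Toy JSON-LD-to-RDF step for ONE flat node object (not the real algorithm).

    blank_ids is an iterator of integers used to mint blank node labels.
    """
    # The value of "id" (aliased to @id) becomes the RDF subject;
    # without it, a blank node is minted instead.
    subject = f"<{node['id']}>" if "id" in node else f"_:b{next(blank_ids)}"
    lines = []
    for key, value in node.items():
        if key == "id" or key not in TERMS:
            continue  # "id" names the subject; undefined terms (e.g. "comment") are dropped
        iri, value_type = TERMS[key]
        if value_type == "@json":
            # Sorted keys, no whitespace: equals JCS output for simple ASCII data like this.
            literal = json.dumps(value, sort_keys=True, separators=(",", ":"))
            obj = f"{json.dumps(literal)}^^<{RDF_JSON}>"
        else:
            obj = json.dumps(value)
        lines.append(f"{subject} <{iri}> {obj} .")
    return subject, lines


if __name__ == "__main__":
    record = {"id": "some system of record id", "colors": ["red", "green", "blue"]}
    bound = {"id": "did:colorpalettes:primarycolors", "claims": record}
    unbound = {"claims": record}
    labels = itertools.count()
    for subject_node in (bound, unbound):
        subject, lines = node_to_nquads(subject_node, labels)
        print(subject)
        print("\n".join(lines))
```

Note that the nested system-of-record id never becomes a subject: it only appears inside the JSON literal.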

So what did I mean by this?

JSON-LD makes it possible for this not to cause a conflict. In the JSON document we could just as easily rename the property to foo. id has been chosen specifically to stay as close as possible to @id, which is a reserved keyword.

Because of the way that we define the id property in the VC context:

[Screenshot (2021-08-20): the VC v1 context, which contains the term definition "id": "@id"]

we intentionally made the id property at credentialSubject.id an alias of the @id keyword. Notice that we did the same for the top-level id property. We could just as easily have changed that part of the JSON from "id": "@id" to "foo": "@id".

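Here's an example that shows what that does, first in JSON and then in the resulting N-Quads. (For brevity, it uses the schema.org context in place of the VC contexts; its default vocabulary is why terms such as credentialSubject expand to http://schema.org/ IRIs below.)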

{
    "@context": [
        "https://schema.org",
        {
            "@context": {
                "foo": "@id",
                "claims": {
                    "@id": "https://example.com/claims",
                    "@type": "@json"
                }
            }
        }
    ],
    "id": "did:colors:primarycolorpallette",
    "type": [
        "VerifiableCredential",
        "ColorPalette"
    ],
    "credentialSubject": {
        "comment": "The following id associated with a Subject is optional according to the specification (Unbound Credential use case)",
        "foo": "did:example:123",
        "claims": {
            "id": "some system of record id",
            "colors": [
                "red",
                "green",
                "blue"
            ]
        }
    },
    "proof": {}
}

and the N-Quads:

<did:colors:primarycolorpallette> <http://schema.org/credentialSubject> <did:example:123> .
<did:colors:primarycolorpallette> <http://schema.org/proof> _:b0 .
<did:colors:primarycolorpallette> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ColorPalette> .
<did:colors:primarycolorpallette> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/VerifiableCredential> .
<did:example:123> <http://schema.org/comment> "The following id associated with a Subject is optional according to the specification (Unbound Credential use case)" .
<did:example:123> <https://example.com/claims> "{\"colors\":[\"red\",\"green\",\"blue\"],\"id\":\"some system of record id\"}"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON> .

The subject of the credentialSubject statements is <did:example:123> because of this entry in the context: "foo": "@id"

If we change only credentialSubject.foo to credentialSubject.bar (without changing anything in the @context), we get the following JSON and N-Quads:

{
    "@context": [
        "https://schema.org",
        {
            "@context": {
                "foo": "@id",
                "claims": {
                    "@id": "https://example.com/claims",
                    "@type": "@json"
                }
            }
        }
    ],
    "id": "did:colors:primarycolorpallette",
    "type": [
        "VerifiableCredential",
        "ColorPalette"
    ],
    "credentialSubject": {
        "comment": "The following id associated with a Subject is optional according to the specification (Unbound Credential use case)",
        "bar": "did:example:123",
        "claims": {
            "id": "some system of record id",
            "colors": [
                "red",
                "green",
                "blue"
            ]
        }
    },
    "proof": {}
}

<did:colors:primarycolorpallette> <http://schema.org/credentialSubject> _:b0 .
<did:colors:primarycolorpallette> <http://schema.org/proof> _:b1 .
<did:colors:primarycolorpallette> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ColorPalette> .
<did:colors:primarycolorpallette> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/VerifiableCredential> .
_:b0 <http://schema.org/bar> "did:example:123" .
_:b0 <http://schema.org/comment> "The following id associated with a Subject is optional according to the specification (Unbound Credential use case)" .
_:b0 <https://example.com/claims> "{\"colors\":[\"red\",\"green\",\"blue\"],\"id\":\"some system of record id\"}"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON> .

Now the credential subject has become a blank node again, and the bar claim gets treated like any other claim. This happens because @id is a reserved keyword in JSON-LD, and when we add "id": "@id" to a context, we're mapping the special behavior of the @id keyword onto the property we define. Either way, @id always supplies the subject for the statements in its JSON object, even if we alias it to a different term such as id or foo.
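The foo-versus-bar behaviour can be modeled with a small Python toy. It is not a JSON-LD processor: it assumes a flat context, string values only, and a single default vocabulary standing in for schema.org's. Terms that the context maps to @id supply the subject; any other key, including bar, is expanded as an ordinary claim.

```python
import itertools
import json


def id_aliases(context):
    """Terms that a flat (toy) context maps to the @id keyword."""
    return {term for term, definition in context.items() if definition == "@id"}


def node_triples(node, context, vocab, blank_ids):
    """Toy expansion of one flat node object with string values only.

    Terms aliased to @id supply the subject; every other key is expanded
    against vocab, the way schema.org's default vocabulary does above.
    """
    aliases = id_aliases(context) | {"@id"}
    id_keys = [key for key in node if key in aliases]
    if len(id_keys) > 1:
        # JSON-LD also rejects two @id values in one node object.
        raise ValueError(f"more than one @id alias in one node: {id_keys}")
    subject = f"<{node[id_keys[0]]}>" if id_keys else f"_:b{next(blank_ids)}"
    triples = []
    for key, value in node.items():
        if key in aliases:
            continue  # an @id alias names the node; it is not a claim
        triples.append((subject, f"<{vocab}{key}>", json.dumps(value)))
    return triples


if __name__ == "__main__":
    context = {"foo": "@id"}        # the relevant entry from the inline context above
    vocab = "http://schema.org/"    # stands in for schema.org's default vocabulary
    with_foo = {"comment": "c", "foo": "did:example:123"}
    with_bar = {"comment": "c", "bar": "did:example:123"}
    for node in (with_foo, with_bar):
        for triple in node_triples(node, context, vocab, itertools.count()):
            print(*triple, ".")
```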

So, to round things out: we can't just remove the subject, as you're proposing for unbound credentials, because it breaks the way we represent things in the JSON-LD representation of the abstract data model. Sure, we could do this with JSON, but then the JSON VC couldn't be converted to a JSON-LD representation, and furthermore it would be extremely difficult to tell the two representations apart. I can't think of a way to differentiate the two representations in code.

We can only remove the subject's identifier, not the subject itself, because JSON-LD always requires a subject in order to convert to N-Quads properly. When we don't include an id property, a (blank node) identifier is simply made up for the subject. So my question back to you is: can you produce a valid JSON-LD example that aligns your thinking around unbound credentials with the JSON-LD representation of the abstract data model of the VC spec?

mwherman2000 commented 3 years ago

@kdenhartog Your analysis is extraordinary (it will take me some time, i.e. the weekend, to digest it all). Thank you for the work and the explanation.

My first reaction is: this is an extraordinary amount of accommodation to support JSON-LD when all that developers need is a developer-friendly VC (conceptual and logical) core data model.

My second reaction is:

a) [PRINCIPLED OBJECTION] Nothing like this, from my reading, is alluded to in the current VC specification - nothing. If it is, please send links and quotations from the VC data model spec (i.e. on the VC data model's reliance on JSON-LD and JSON-LD-to-RDF mappings). No such connections are made in the current VC specification. There are no worked examples like the ones you've presented above.

b) In the Trusted Digital Web platform, we don't use or need JSON-LD (or JSON, for that matter) to support Structured Credentials. We just need a logical data model (that sits between the conceptual data model and the physical data model) that, if needed, serializes to a useful form of JSON. See https://www.youtube.com/watch?v=KXhyvKXCLus&list=PLU-rWqHm5p45dzXF2LJZjuNVJrOUR6DaD&index=1 for the details, if you haven't seen it already.

Where do we/I go from here?

Because of the above (as well as some other things happening in the background), I'm tempted to tap out of this VC WG.

What I think (and expect) should be a VC data model specification that projects a strong, platform-independent, technology-agnostic, conceptual and logical VC data model has instead become this (largely undocumented) complex JSON/JSON-LD/RDF beast, which IMO will have low/slow/no adoption in the app developer communities.

Your thoughts?

kdenhartog commented 3 years ago

To understand why the spec looks the way it does today, you have to go back through its history to see how we got here. Historically speaking, this work was built purely on JSON-LD, and with good intent. In an open world model, where the Issuer and Verifier are not supposed to be able to communicate other than by way of the Verifiable Credential itself, how is the verifier supposed to know what the issuer meant by the given_name claim? Does the issuer's given_name claim match the verifier's family_name claim, or their name claim? This is the power of the semantic web: RDF statements can convey semantics.

However, this view was obviously not shared by everyone within the working group, so an agreement was made to move to an abstract data model with multiple representations. This was done purely to find a satisfying middle ground so that something could be put in the spec. The abstract data model made it possible for developers working with different representations to all build off of the spec, but some parts of the architecture were left unchanged from the semantic web architecture that predated the agreement, because many of those ideas were still considered valuable. One of them was the idea that the "issuer makes claims about a subject of the credential".

It's due to the way the spec was made that we have what we have. It wasn't ideal for anyone, but for the most part everyone could grit their teeth and live with what they were able to agree on. That is how we got to where we are today, and it is the reason so much of the thinking remains built around the semantic, open-data world view.

After all, the only other way we in the technology world have found to get the issuer and verifier to understand the meaning of the statements is to spec them out, either in public working groups, which leads to registries, or by having verifiers depend on documentation published by every issuer.

In other words, this is a new approach that's intended to scale in the same way that Google scales the cards in its search engine, without requiring high levels of coordination between all of the parties.

kdenhartog commented 3 years ago

a) [PRINCIPLED OBJECTION] Nothing like this, from my reading, is alluded to in the current VC specification - nothing. If it is, please send links and quotations from the VC data model spec (i.e. on the VC data model's reliance on JSON-LD and JSON-LD-to-RDF mappings). No such connections are made in the current VC specification. There are no worked examples like the ones you've presented above.

The reality is that no spec is going to be perfect for everyone. After all, it's written in a "design by committee" format with many competing views. Furthermore, theoretical purity is the last priority when it comes to design principles. Practically speaking, if we produce robust, interoperable software that meets end users' goals, the spec has succeeded, no matter how many holes and imperfections can be found in it.

mwherman2000 commented 3 years ago

To understand how the spec looks the way it looks today, you have to go back through the history to understand how we got here.

Thank you for the background explanation ...and the detailed history.

But I beg to differ, @kdenhartog: I need, and should only need, to read the current text of the VC data model specification to understand it and to correctly implement software based on it. The actual text of what the VC WG is being asked to "ratify" is the story as it appears in the text of the specification. This is the story we're being asked to "approve", not the history that led to the current text.

[PRINCIPLED OBJECTION] The VC data model specification 1.0 is not implementable in the ways it appears to be intended based on the text of the current specification document (based on the above discussion). [PROPOSED RESOLUTION] In the VC data model specification, explicitly detail the bindings between the VC conceptual data model (https://www.w3.org/TR/vc-data-model/#core-data-model) and at least 3 target data platforms/formats:

CC: @brentzundel @wyc

David-Chadwick commented 3 years ago

I am one of those who believe that VCs will get much broader acceptance worldwide if developers and implementors do not need to know anything about JSON-LD or use any JSON-LD tools. The only compromise to pure JSON that we have had to make to ensure this is to mandate the insertion of @context definitions in VCs, so that short-form property names and values can be mapped to globally unique URIs. And that's it (AFAIK). If you use JWTs to turn credentials into verifiable credentials, you can continue to use JSON-only tools, and all this talk from @kdenhartog about JSON-LD, RDF mappings and JSON-LD proofs is irrelevant, because none of it applies to the use of pure JSON with VCs (as long as they have valid @context definitions in them).
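The mapping of short-form names to globally unique URIs is also what answers the given_name question raised earlier in the thread. A toy Python sketch, assuming flat contexts only (real JSON-LD expansion also handles compact IRIs, prefixes, scoped and protected contexts, and remote context loading). The issuer and verifier contexts are hypothetical; the schema.org IRIs are real vocabulary terms used for illustration.

```python
def expand_term(term, context):
    """Toy IRI expansion against a flat context: explicit definition, then @vocab."""
    definition = context.get(term)
    if isinstance(definition, str):
        return definition  # an absolute IRI (or a keyword alias such as "@id")
    if isinstance(definition, dict):
        return definition.get("@id")
    vocab = context.get("@vocab")
    return vocab + term if vocab else None  # None: a processor would drop the term


def same_meaning(term_a, context_a, term_b, context_b):
    """Two short names mean the same thing only if they expand to the same IRI."""
    iri_a = expand_term(term_a, context_a)
    return iri_a is not None and iri_a == expand_term(term_b, context_b)


# Hypothetical contexts: two issuers reuse the short name "given_name".
ISSUER_A = {"given_name": "http://schema.org/givenName"}
ISSUER_B = {"given_name": "http://schema.org/familyName"}
VERIFIER = {"first": "http://schema.org/givenName"}

if __name__ == "__main__":
    print(same_meaning("given_name", ISSUER_A, "first", VERIFIER))  # True
    print(same_meaning("given_name", ISSUER_B, "first", VERIFIER))  # False
```

A JSON-only consumer can still compare short names directly, but only if it trusts that every issuer's context maps them to the same IRIs.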

mwherman2000 commented 3 years ago

I'm going to try and close this issue. This issue has been answered here:

An actual solution for the business document use case is proposed/described here:

The principal objection noted here https://github.com/w3c/vc-data-model/issues/793#issuecomment-902715529 has been surfaced/republished as a new top-level issue here: https://github.com/w3c/vc-data-model/issues/797

Thank you