
W3C Verifiable Credentials v2.0 Specification
https://w3c.github.io/vc-data-model/

W3C VC-LD Specification and Layered VC Specification: = interoperability at the application layer of layered spec #982

Closed SmithSamuelM closed 1 year ago

SmithSamuelM commented 1 year ago

I have copied my comments from issue #947 because they got lost in the general discussion.

VC-LD and VCC

This is a concrete proposal that IMHO resolves the primary conflict in the community. It has two features:

Discussion

Those who think VCs are best implemented with a layered model, where the topmost layer may or may not be an open-world model depending on the application use case, can then go build a layered model. In those cases where the topmost layer benefits from an RDF-based open-world model, users of VCC can still have high interoperability with VC-LD. For those applications of VCC that do not benefit from an open-world model, the primary advantage of VC-LD is of little benefit, and they can instead avoid the complexity and drag of an open-world model.

Drag

Much of the drag (cognitive dissonance) that we as a community experience (IMHO) is a result of the improper layering of the security concerns associated with the V in VC, not of the non-interoperability of an open-world set of claims. The triples in the claims can in every case be conveyed by a container layer to which the claims are opaque. The claims layer may, of course, reference any metadata in the container layer, but this can be viewed as logging the container as an artifact; operationally, all the container layer needs to achieve its function is the closed-world set of container properties, not its payload. These container properties look like metadata to the claims layer, or may include identifiers that are subjects of claims. But because the claims layer is opaque to the container layer, the container layer has zero dependencies on any claim it contains. Triples in the claims layer may depend on properties in the container layer, but not vice-versa. To reiterate, this allows the container layer to be a closed-world model of limited scope, while the claims layer may still include any and all triples it desires.
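To make the dependency direction concrete, here is a minimal sketch (all field names hypothetical, not drawn from any spec): the container carries only its own closed-world properties plus a digest of the serialized claims, so it can be processed without ever parsing them, while a claim may reference the container's identifier.

```python
# Minimal sketch of the dependency direction described above.
# All field names are hypothetical; the digest stands in for whatever
# commitment scheme a real container layer would use.
import hashlib
import json

claims = {
    # Open-world claims; "containerRef" is a hypothetical back-reference
    # down into the container layer (allowed: claims may depend on container).
    "containerRef": "container-1234",
    "degree": "PhD",
}
payload_bytes = json.dumps(claims, separators=(",", ":")).encode()

container = {
    # Closed-world container properties: a small, fixed, enumerable set.
    "id": "container-1234",
    "issuer": "did:example:issuer",
    "payloadDigest": hashlib.sha256(payload_bytes).hexdigest(),
}

# The container layer can be signed, verified, and logged using only these
# properties; it never parses payload_bytes and has no dependency on any claim.
print(container["payloadDigest"])
```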

Protocol Design Flaw

The drag, IMHO, comes from mistaking the fact that the claims layer can reference properties in the container layer for a license to collapse or flatten the container layer, leaving only the claims layer. This is an unfortunate protocol design flaw and the root cause of our difficulties.

This layering now allows us to limit the interoperability discussion to only those properties necessary to achieve the functionality of the container layer. Any discussion of an open-world model is obviously out of scope and would only add drag.

Such a layered model then lets the user decide whether the payload benefits from open-world interoperability or not. Either way, the payload (including all its claims or triples) has already been authenticated. The container layer makes a cryptographic commitment to the payload, which in practice means a commitment to some serialization of the payload. Any serialization will do from the perspective of the authentication layer, but authenticity is limited to that serialization. Any later expansion that dynamically introduces semantics may weaken what can be verified against the commitment to the original serialization. What we gain is total interoperability of the authentication layer, because it is necessarily a closed-world model. To that layer the payload is universally just a serialized artifact, represented by a hash, Merkle root, or accumulator, regardless of the serialization type (be it JSON, JSON-LD, CBOR, MGPK, CESR, or whatever).
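As an illustration of the point about commitments binding a specific serialization (a sketch, not any particular proof suite): re-serializing semantically equivalent content produces different bytes and therefore no longer matches the original commitment.

```python
# Sketch: the commitment binds exact bytes, so a re-serialization of the
# "same" content (here, reordered keys standing in for any later expansion
# or canonicalization step) fails to match.
import hashlib
import json

original = b'{"name":"Alice","degree":"PhD"}'
commitment = hashlib.sha256(original).hexdigest()

reserialized = json.dumps(
    {"degree": "PhD", "name": "Alice"}, separators=(",", ":")
).encode()

assert hashlib.sha256(original).hexdigest() == commitment
assert hashlib.sha256(reserialized).hexdigest() != commitment
print("authenticity is limited to the originally committed serialization")
```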

Layered Model

Authentication Layer and Authorization Sub-layer

I propose that for a "verifiable" credential, what is primarily being verified is authorship or authenticity, where authenticity is defined to be secure attribution to a DID as issuer. A secondary verification for some credentials is an authorization, where authorization is defined to be evidence of entitlement (the dictionary definition of credential). We then have a container that provides an authentication layer whose function is to provide verifiable proof of authenticity (aka secure attribution to a DID as issuer) of its opaque contents, and we have an optional authorization sub-layer that, when employed, provides proof of entitlement to some DID (or another verifiable cryptographic construct) as the target of the entitlement.

The opaque payload could be a true, pure, unadulterated JSON-LD document (or not, depending on the application use case). All claims in that payload would be authenticated before they are ever opened. Any properties needed to provide proof of authorization (entitlement) would necessarily be provided in the authorization sub-layer, but forensic information about the entitlee could be part of the payload, because forensic information is not required to verify the entitlement; it serves other conditional purposes such as recourse or enforcement.
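A structural sketch of such a container (field names invented for illustration; the signature is a placeholder for whatever proof mechanism a given stack actually uses):

```python
# Hypothetical envelope showing the authentication layer, the optional
# authorization sub-layer, and the opaque payload kept outside both.
import hashlib

# Forensic information stays in the opaque payload; it is not needed to
# verify authenticity or entitlement.
payload_bytes = b'{"forensic":{"name":"Alice","birthDate":"1990-01-01"}}'

envelope = {
    "authentication": {
        "issuer": "did:example:issuer",          # secure attribution target
        "payloadDigest": hashlib.sha256(payload_bytes).hexdigest(),
        "proof": "<issuer signature over the authentication block>",
    },
    "authorization": {                            # optional sub-layer
        "entitlee": "did:example:holder",         # target of the entitlement
        "entitlement": "urn:example:library-access",
    },
}

# A verifier checks authenticity (and, if present, entitlement) from the
# envelope alone; the payload is only opened afterward, if ever.
```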

This layered approach IMHO removes most of the friction this community is experiencing. The security of the "verifiability" employs a closed-world model limited to the authentication and authorization layers, and the open-world interoperability is unfettered by concerns about what happens in the security layer below. Proofs of authenticity look sane because they are clearly proofs of the container of opaque claims, not some ex post facto proof mechanism over an expanded payload.

Presentation Layer

This layered model can be further extended to include the proof of presentation, which is different from authentication of the Issuer and authorization by the Issuer.

A presentation exchange protocol can then include the details of graduated or selective disclosure of the payload claims when appropriate. In many cases the forensic information in the payload need only be conditionally disclosed, especially when the entitlement properties are cleanly separated. Proof that the presenter is the target of the entitlement is then the primary function of the presentation proof.

But once again, any property of the presentation layer can be referenced in an open-world claims layer as an artifact or log of the proof, yet those references are not an operational dependency of the presentation layer. Once again we avoid the mistake of collapsing the presentation layer into the claims (payload) layer merely because a property of the presentation layer may be referenced in the claims layer. We have the discipline to recall that a reference down into a lower layer does not give us license to collapse that lower layer (here, the presentation layer).
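A sketch of how a presentation layer could wrap the issued envelope without being collapsed into it (again, purely illustrative names):

```python
# Hypothetical presentation wrapper: it references the issued envelope by
# digest and carries only presentation-specific properties, so the
# presentation proof stays independent of both issuance and claims.
import hashlib
import json

envelope_bytes = b'{"authentication":{},"authorization":{}}'  # from issuance

presentation = {
    "envelopeDigest": hashlib.sha256(envelope_bytes).hexdigest(),
    "presenter": "did:example:holder",    # must match the entitlee to prove targeting
    "audience": "did:example:verifier",   # who this presentation is intended for
    "nonce": "n-0S6_WzA2Mj",              # freshness / replay protection
    "proof": "<presenter signature over this presentation block>",
}
print(json.dumps(presentation, indent=2))
```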

David-Chadwick commented 1 year ago

But we already have what you require in the current VP-JWT spec, which is independent of the embedded VC. This contains the VP issuer (holder/entitlee) and audience (issuer in your unfederated model - which (as an aside) I think is too restrictive). This is the proof metadata of my 4-layer model. So the RP can verify that the VP is intended for it, verify the signature, then go on to inspect the VC that is embedded in the VP to find out which privileges the VP holder is requesting.

SmithSamuelM commented 1 year ago

Likewise the ACDC specification provides a similar construct but takes advantage of other features not in VP-JWT. So a big tent approach for layered VCs would try to find the minimally sufficient means for each layer that maximizes interoperable security between the various stacks (JOSE, COSE, ACDC/KERI, JSON-LD, Anoncreds, etc). This can be done by having essentially independent constructions for each layer:

* Authentication (and proofs thereof)
* Authorization (and proofs thereof)
* Presentation (and proofs and protocols thereof)
* Payload

If we do it well, we can support mashups of the tooling stacks across the layers. We have serializations and proof algorithms. If we want broad adoption there are good reasons not to use one serialization or one proof algorithm.
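One way to picture such a mashup (a hypothetical combination, not a recommendation): each layer picks its own stack, because the layers only exchange opaque artifacts such as digests and proofs.

```python
# Hypothetical mashup of tooling stacks across the four layers.
stack_choice = {
    "authentication": "JOSE (JWS over the container)",
    "authorization": "ACDC-style entitlement section",
    "presentation": "DIF Presentation Exchange",
    "payload": "JSON-LD claims graph",
}
for layer, stack in stack_choice.items():
    print(f"{layer:>14}: {stack}")
```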

Hence the thesis of this proposal. Either we as a W3C community just call this the W3C VC-LD spec, and those who have good reasons not to use only JSON-LD (like me) can build a layered VCC specification somewhere else; or this W3C community can recognize that a layered VCC spec enables a big-tent approach and take that path instead.

David-Chadwick commented 1 year ago

The serialisation and semantics of the innermost credential must be standardised; otherwise, once all the outer crypto layers have been stripped off, you will end up with a sequence of bits that you cannot understand. The mission of the VC WG must surely be to standardise the credential and its metadata, and then to allow this payload to be protected by any signature and proof metadata, preferably by already existing standardised cryptographic ones so that we do not have to re-invent the wheel.

SmithSamuelM commented 1 year ago

@David-Chadwick "The serialisation and semantics of the innermost credential must be standardised". I would say the semantics of each layer must be standardized, and if that is done properly it can support more than one serialization. Then you do not end up with a sequence of bits you cannot understand.

ACDCs with CESR do this elegantly IMHO, as a CESR stream can have interleaved versioned JSON, CBOR, MGPK, and CESR serializations. A stream parser can easily disambiguate and cold start. There is no technical reason we can't support multiple serializations. The problem is simpler than supporting multiple proof types, as there are far more practical proof types (requiring far more resources and libraries to support) than practical serializations. And innovation in proof types will explode once post-quantum crypto becomes standardized.
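As a toy illustration of why interleaved serializations are tractable (this is not the actual CESR cold-start algorithm): the first byte of a top-level message is usually enough to tell the encodings apart.

```python
# Toy sniffer, not the real CESR cold-start rules: guess the encoding of the
# next top-level message in a stream from its first byte.
def sniff(first_byte: int) -> str:
    if first_byte in (0x7B, 0x5B):        # '{' or '[' => JSON text
        return "JSON"
    if 0x80 <= first_byte <= 0x8F:        # MessagePack fixmap
        return "MGPK"
    if 0xA0 <= first_byte <= 0xBF:        # CBOR map (major type 5)
        return "CBOR"
    return "other (e.g. CESR framing)"

assert sniff(ord("{")) == "JSON"
assert sniff(0x82) == "MGPK"
assert sniff(0xA2) == "CBOR"
```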

Indeed, the technical challenge will be managing the proliferation of crypto signature types and proof types. If we have learned anything from DID method proliferation, it is that proof types will proliferate, and frankly it's a pipe dream to believe that there will be broad interoperability based merely on a standard serialization, or even on standard ways to convert the semantics of payloads between serializations.

So if you really want rigid interop, then just build VC-LD and allow only LD-Proofs and rigid JSON-LD serializations. Then you will have an interoperable island of JSON-LD RDF tooling, but not interoperable security.

But if you want interoperable security, then serialization is a non-issue, because it's easy to identify the serialization. Interoperability is only useful when you give any verifier the ability to appraise the key state and the proof type and decide if it is secure enough to accept. If not, there won't be acceptance: not because a verifier can't recognize or process the proof given the key state, but because the proof is too weak to meet that verifier's standard of security. The "interoperability" in interoperable security means a common understanding of the security posture of the other party in any interaction, including parties to interactions that are upstream and downstream of the given interaction (hence chaining).

If we don't build a standard that gives the verifier (in a totally decentralized way) the ability to appraise the key state and proof in order to decide if it meets their (the verifier's) security (authenticity, confidentiality, privacy) requirements, then we are building a house of cards. We would repeat the worst outcome of federated identity, which forces verifiers (relying parties) to trust the issuer/presenter/federator regardless of the verifier's security policy.
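A hedged sketch of what such appraisal could look like in practice (the policy values and field names are invented): the verifier can parse the proof perfectly well and still reject it as too weak for its own requirements.

```python
# Hypothetical verifier-side appraisal: recognizing a proof is not the same
# as accepting it; acceptance is governed by this verifier's own policy.
ACCEPTED_PROOF_TYPES = {"Ed25519Signature", "EcdsaSecp256k1Signature"}  # local policy
MIN_WITNESS_THRESHOLD = 3                                               # local policy

def appraise(proof_type: str, witness_threshold: int) -> bool:
    """Return True only if the presented key state and proof meet this policy."""
    return (
        proof_type in ACCEPTED_PROOF_TYPES
        and witness_threshold >= MIN_WITNESS_THRESHOLD
    )

print(appraise("Rsa1024Signature", witness_threshold=5))   # False: proof type not accepted
print(appraise("Ed25519Signature", witness_threshold=1))   # False: key state too weak
print(appraise("Ed25519Signature", witness_threshold=4))   # True: meets this policy
```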

And this is not just the common use of the term verifier: all parties are in some sense verifiers. The Issuee needs to be able to appraise the key state and proof of the Issuer. Every participant needs to be on the same level playing field. We call this end-verifiability. Forcing any participant to use a tool stack that is compromised (from that participant's perspective) is just repeating the mistakes of the past.

This will no doubt result in non-interoperability of payload data flow: not because a party to an interaction can't understand what the other party is saying, but because the security with which the other party is saying it is inadequate for accepting the payload data. This is why the payload MUST be opaque, so that the security layers can operate independently of the payload serialization. A hash is sufficient to represent the payload from the security layer's perspective, as long as the hash type is known.

This is the essence of what I mean by saying we must have interoperable security first. The ability to understand the security of the other party, appraise it, and then accept or deny is what makes the security interoperable, not forced acceptance of what the other party is communicating simply because they use the same syntax and serialization, no matter how weak or strong the security.

mwherman2000 commented 1 year ago

I've been working on the following taxonomy/roadmap metamodel for a DIDComm-based ecosystem (Web 7.0). I'm offering it up as an example of something similar that can be created for the VCDM.

UPDATE 2: At 1:16:12 into the tutorial https://www.youtube.com/watch?v=ZFb4OXov7wg&list=PLU-rWqHm5p44AsU5GLsc1bHC7P8zolAAf&index=3&t=4572s, I describe how to use a roadmap metamodel like the one depicted below for Web 7.0. The idea is to create a similar roadmap metamodel or compass for Verifiable Credentials (with different dimensions/axes). @SmithSamuelM what do you think? ...part of the message is to (collaboratively) manage VCs more like a product.

[image: Web 7.0 DIDComm-ARM taxonomy/roadmap metamodel diagram]

TallTed commented 1 year ago

@mwherman2000 —

I'm nearly completely baffled by your graphic. There's almost zero flow from one concept to another; instead everything seems to be delivered (or fed?) by DIDComm-ARM. Somehow this new "Web 7.0" springs to life in its entirety, with no obvious connection to any of its predecessors, including but not limited to the mostly undefined "Web 5.0" which itself is just a mongrelization of "Web 2.0 + Web 3.0", where "Web 3.0" is even less defined than "Web 2.0".... And what changes between "Web 7.0" and "Web 7.1"?

Even if all this were clear, there's the question of how this image fits into the conversation in this thread, to which, as far as I can tell, the answer is, "it doesn't."

mwherman2000 commented 1 year ago

@TallTed I'm sorry you weren't able to grasp the idea of taxonomy/roadmap metamodel ...and this being one example. I don't have more to add unless you have a more specific question.

mwherman2000 commented 1 year ago

@TallTed If one of your questions is: What is Web 7.0?
Checkout https://hyperonomy.com/2022/12/18/web-7-0-didcomm-agent-architecture-reference-model-didcomm-arm-0-40-december-18-2022/. It's a topic of discussion in the DIF DIDComm WG UG - and not, as yet, in W3C/CCG.

TallTed commented 1 year ago

@mwherman2000 —

I'm sorry you weren't able to grasp the idea of taxonomy/roadmap metamodel

Nice slap in the face, there. More than a little bit condescending. I refer you yet again to the Positive Work Environment at W3C: Code of Ethics and Professional Conduct which covers all interactions within all W3C groups (including but not limited to the CCG and the VCWG).

I have worked with many taxonomies, roadmaps, metamodels, and blends thereof. Your diagram does not conform to my understanding of any of these, including such blends.

Checkout [a page which leads to a PDF of 80+ pages of dense text and images]

If it takes 80+ pages of supporting text to make your diagram comprehensible, something's gone wrong somewhere. Part of what I find there is that your "Web 7.0" ignores and skips past what others are discussing as "Web 5.0" (which is, according to some, the sum of "Web 2.0" and "Web 3.0", neither of which has been rigorously defined anywhere, so far as I'm aware, though the meaning of the former is reasonably consistently understood; according to others, it's meant to label the "Emotive Web"), as well as the inexplicably yet-to-be-discussed "Web 4.0" or "Web 6.0".

[Web 7.0 is] a topic of discussion in the DIF DIDComm WG UG - and not, as yet, in W3C/CCG.

The W3C-VCWG operates this GitHub repo, with a fair bit of interaction with the W3C-CCG. Topics discussed by the "DIF DIDComm WG UG" (I was able to locate the DID Communication Working Group but no UG therein?) should not be assumed to be familiar, understood, monitored, etc., to/by W3C-connected participants here. Links to such discussions are always appropriate and helpful if not always necessary.

mwherman2000 commented 1 year ago

I want to re-energize this discussion in favor of @SmithSamuelM 's proposal for a multi-layered architecture reference model for Verifiable Credentials. The primary reason is: competition.

As I've been talking to larger and larger audiences about Web 7.0 and the DIDComm Architecture Reference Model (DIDComm-ARM), a number of people have questioned Web 7.0's focus on VCs ...and there is no reason for focussing only on VCs. A DIDComm Message Attachment can embed, by value or by reference, any type of information - in any data format.

Here's a small sample (which includes mDLs, X.509 certificates, audio recordings, temperature sensor readings, Office documents, concert tickets, etc. etc.): https://hyperonomy.com/2023/01/17/didcomm-message-attachment-types/

In the real world, VCs face real competition and I believe one effective way to combat this competition is a multi-layered Verifiable Credential architecture reference model (similar to the Web 7.0 DIDComm-ARM diagrams I've shared above). I don't have enough spare time to help out with the VC-ARM right now. I hope someone picks it up and moves the goalposts for VCs.

brentzundel commented 1 year ago

@TallTed I'm sorry you weren't able to grasp the idea of taxonomy/roadmap metamodel ...and this being one example. I don't have more to add unless you have a more specific question.

@mwherman2000 comments such as this are absolutely uncalled for.

mwherman2000 commented 1 year ago

I don't have enough spare time to help out with the VC-ARM right now. I hope someone picks it up and moves the goalposts for VCs.

I actually ended up backing into some work on a layered "credential" model for the Web 7.0 Trust Spanning Layer Framework. Check out slides 13-31 here: https://www.youtube.com/watch?v=rrFfuUHHmr4&list=PLU-rWqHm5p44AsU5GLsc1bHC7P8zolAAf&index=5&t=607s . It's a lot of slides but it's best/a requirement to watch them all.

This could serve as a starting point for adding an optional layer to support JSON-LD (and RDF) ...and their eventual removal from the core VCDM specification.

Cc: @SmithSamuelM

mwherman2000 commented 1 year ago

Here’s a pre-proposal for what I’m calling the VC Architecture Reference Model (VC-ARM) aka VC Blood Types. Any feedback? Your thoughts? Are there additional dimensions?

[image: VC Architecture Reference Model (VC-ARM) "VC Blood Types" pre-proposal diagram]

I’ll produce a video tutorial sometime over the weekend.

mwherman2000 commented 1 year ago

UPDATE: I've renamed the tutorial to:

Where should the VC-ARM poly-specification be incubated, birthed, and nurtured?

What dimensions need to be added and how can they be depicted? For example, ...

Cc: @SmithSamuelM @dhh1128

p.s. It's also time to re-read the whole thread. :-)

Sakurann commented 1 year ago

RESOLVED: The base media type for the VCDM is credential+ld+json. @context is required (MUST) in the base media type; other media types MAY choose to include @context. Serializations in other media types (defined by the VCWG) MUST be able to be transformed into the base media type. Another media type MUST identify if this transformation is one-directional or bi-directional. Bi-directional transformation MUST preserve @context. Transformation rules MUST be defined, but not necessarily by this WG.

TallTed commented 1 year ago

@Sakurann — Please revisit https://github.com/w3c/vc-data-model/issues/982#issuecomment-1433775569 and wrap each instance of @context in code fences, a la `@context`.