w3ctag / design-reviews

W3C specs and API reviews

Verifiable Credentials Data Model v2.0 #860

Closed: awoie closed this issue 4 months ago

awoie commented 1 year ago

Hello TAG!

As an editor, I'm requesting a TAG review of Verifiable Credentials Data Model v2.0.

The VCDM 2.0 specification provides a JSON-LD data model that enables the issuance, sharing, and verification of Verifiable Credentials and Verifiable Presentations in a secure and interoperable manner. These credentials provide a way for individuals, organizations, and other entities to digitally represent and share their qualifications, attributes, and/or other relevant information. Verifiable Credentials are designed to enhance trust, privacy, and control in digital interactions by allowing the owner of the credentials to control how their information is shared and verified.

Further details:

You should also know that...

We'd prefer the TAG provide feedback as:

☂️ open a single issue in our GitHub repo for the entire review

msporny commented 7 months ago

@rhiaro @hadleybeeman it is unclear whether the W3C TAG reviewed the Verifiable Credentials Data Model v2.0 specification when you reviewed the VC Data Integrity specifications.

We are getting ready to enter CR for the Verifiable Credentials Data Model specification at the end of December, and we're having to mark the TAG review as (timed out) because this issue makes it seem like a review was not done. However, we also note that two TAG members have appeared at a few of our meetings over the years, have reviewed VCDM v1.0 and v1.1, and have most recently appeared at our F2F meeting at W3C TPAC in Seville just a few months ago (so perhaps that was part of the review?).

What is the current status of this review? We have marked it as (request timeout) in our draft CR transition request. Please let us know if this is not the case.

rhiaro commented 7 months ago

Hi @msporny , really sorry about the delay in closing off this and the related reviews. It's not clear that we have TAG consensus on a response to the VC specifications yet, but we're going to do our best to resolve this and get back to you next week.

rhiaro commented 6 months ago

Sorry for the delayed response. We (@torgo, @hadleybeeman, @hober, @plinss and I) discussed this in one of our calls last week (I've done my best to summarise our feedback here, but please chime in if I got something wrong).

We noted the change between the 1.1 and 2.0 versions of the data model which restricts the data model expression to compact JSON-LD, with plain JSON compatibility being retained via the (non-normative) credential type-specific processing mechanism. In general we feel like this is a step in the right direction in terms of mitigating problems with polyglot formats, but we had some concerns about compatibility with generic JSON-LD tooling.

Specifically, we wanted to know if you can reliably use generic JSON-LD tooling and have the output remain compatible with systems that can only process VCs without full JSON-LD processing (credential type-specific processing). What behaviour have you seen in the wild with generic JSON-LD tooling and VCs?

There is language in the specification which looks like it might be intended to help with this, but it isn't normative. Could you say why you say "document authors are urged to", rather than making this a strict conformance requirement?

As the 1.1 version of the data model was serialisable as JSON and JSON-LD, we wondered if there were lessons learned in transforming between the two formats that have carried forward to inform the changes for v2.0?

As discussed in our closing comment for the TAG review of v1.0, we remain concerned about how much of this ecosystem is outside of the working group's remit. We are limiting the scope of our comments to the data model itself, and they should not be taken as applying to verifiable credentials more broadly.

awoie commented 6 months ago

@msporny We got an update from the TAG on the design review; see the feedback above.

msporny commented 6 months ago

> Sorry for the delayed response. We (@torgo, @hadleybeeman, @hober, @plinss and I) discussed this in one of our calls last week (I've done my best to summarise our feedback here, but please chime in if I got something wrong).

Thank you for the review, TAG! I am responding in my capacity as an Editor of the VCDM v2.0.

The VCWG is aware that the TAG has performed this review (it came up in our call yesterday) and will respond as a group if it deems my response insufficient (for any reason).

Polyglot Formats

> We noted the change between the 1.1 and 2.0 versions of the data model which restricts the data model expression to compact JSON-LD, with plain JSON compatibility being retained via the (non-normative) credential type-specific processing mechanism. In general we feel like this is a step in the right direction in terms of mitigating problems with polyglot formats, but we had some concerns about compatibility with generic JSON-LD tooling.

Does the TAG have an official position on polyglot formats, the problems they cause, or how those problems might be mitigated? Without such a position, it might be difficult to respond with specifics. That said, I will do my best below:

While it can be true that polyglot formats, where different processors "interpret" the format in non-interoperable ways, can cause harm in certain scenarios, it is also true that there are instances of processors that benefit from not needing to process a data format fully (or from being able to break it down into stages). For example, HTML is a polyglot format: different types of processors process portions of an HTML web page in different ways.

The above is provided as food for thought: Not all polyglot formats result in negative outcomes, and there are a fair number of positive outcomes that are the result of polyglot formats, such as HTML.

Use of Generic JSON-LD Tooling and Credential Type-Specific Processing

> Specifically, we wanted to know if you can reliably use generic JSON-LD tooling and have the output remain compatible with systems that can only process VCs without full JSON-LD processing (credential type-specific processing).

The answer is "yes": as long as you follow the guidance in the specification, you can reliably use generic JSON-LD tooling and have the output remain compatible with systems that do credential type-specific processing.

For a specific credential type-specific scenario, it is possible to use generic JSON-LD tooling to produce output that can be used by a credential type-specific processor. The working group, and VC ecosystem in general, has endeavored to ensure that remains true during the v2.0 work.

If one were to take generic JSON-LD tooling, express a credential type-specific object (using a mixture of JSON-LD compacted and non-compacted form), and run generic JSON-LD compaction on the object, the result should be an object that a credential type-specific processor can process. The purpose of the Credential Type-Specific Processing section is to convey the requirements that guarantee this characteristic.
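
To make this concrete, here is a minimal sketch of that round trip, assuming the Python pyld library and network access to fetch the published contexts; the credential values are hypothetical and not taken from the specification:

```python
# Sketch: generic JSON-LD tooling producing output that a credential
# type-specific processor can still consume.
# Assumes `pip install pyld requests` and network access to resolve the
# published @context URLs (pyld's default document loader uses requests).
from pyld import jsonld

# A hypothetical example credential in JSON-LD compact form.
vc = {
    "@context": [
        "https://www.w3.org/ns/credentials/v2",
        "https://www.w3.org/ns/credentials/examples/v2",
    ],
    "type": ["VerifiableCredential", "ExampleDegreeCredential"],
    "issuer": "https://university.example/issuers/565049",
    "validFrom": "2024-01-01T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
        "degree": {"type": "ExampleBachelorDegree",
                   "name": "Bachelor of Science and Arts"},
    },
}

# Run the credential through generic JSON-LD processing (expand), then
# compact it again against the same contexts.
expanded = jsonld.expand(vc)
recompacted = jsonld.compact(expanded, {"@context": vc["@context"]})

# A credential type-specific processor keys off the compact-form property
# names, which survive the round trip.
assert "VerifiableCredential" in recompacted["type"]
assert "credentialSubject" in recompacted
```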

That is, however, only part of why the Credential Type-Specific Processing section exists. There has been a misunderstanding since Verifiable Credentials v1.0 that "JSON-LD processing" is ALWAYS mandatory, which is not the case. Remember that the JSON-LD specification is written such that a developer need only use as much of it as is helpful to their use case (and, ideally, no more). To strain an analogy, just because a browser doesn't support the Geolocation API on an HTML web page doesn't mean that the browser is a non-compliant HTML processor; "full HTML processing" does not require that all Web Platform APIs be supported. Similarly, "full JSON-LD processing" does not require that every JSON-LD API feature be supported.

JSON-LD has two aspects: the syntax and the API. The syntax tells you how to express the data model. The API tells you how to transform that data model into various other forms of the same data model. For understandable reasons, some implementers thought they had to implement BOTH the JSON-LD syntax /and/ the JSON-LD API to be conformant to the Verifiable Credentials specifications, when in reality they only needed to conform to the JSON-LD syntax.

To provide a concrete example, implementers were concerned that if they used JSON Schema to check an incoming Verifiable Credential, they were non-compliant with the specification because they never used the JSON-LD API to ensure well-formedness. The answer was, and still is: you do not need to use the JSON-LD API, typically via the .expand(), .compact(), or .toRDF() API calls, to check that compliance. A JSON Schema is good enough and will give you a definitive answer (for a credential type-specific application).
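
A minimal sketch of that kind of check, assuming the Python jsonschema package; the schema below is a hypothetical, deliberately simplified stand-in for a real credential type-specific schema:

```python
# Sketch: checking an incoming credential with JSON Schema only, with no
# JSON-LD API calls. Assumes `pip install jsonschema`; the schema is a
# hypothetical, minimal stand-in for a credential type-specific schema.
from jsonschema import ValidationError, validate

DEGREE_CREDENTIAL_SCHEMA = {
    "type": "object",
    "required": ["@context", "type", "issuer", "credentialSubject"],
    "properties": {
        "@context": {"type": "array", "minItems": 1},
        "type": {"type": "array", "contains": {"const": "VerifiableCredential"}},
        "issuer": {"type": ["string", "object"]},
        "credentialSubject": {
            "type": "object",
            "required": ["id"],
            "properties": {"id": {"type": "string"}},
        },
    },
}

def is_well_formed(candidate: dict) -> bool:
    """Credential type-specific check; no expand()/compact()/toRDF() involved."""
    try:
        validate(instance=candidate, schema=DEGREE_CREDENTIAL_SCHEMA)
        return True
    except ValidationError:
        return False
```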

Behaviour "In The Wild" and Interoperability Testing

> What behaviour have you seen in the wild with generic JSON-LD tooling and VCs?

There is broad use of generic JSON-LD tooling in the Verifiable Credential ecosystem. Namely, there have been multiple interoperability fests[1][2][3] involving upwards of 20-30 implementers demonstrating a combination of generic JSON-LD processors and credential type-specific processors.

The interoperability challenges over the past few years have largely been around protocols and not the data format itself.

Another behaviour that we have seen is the use of "enveloping" security mechanisms such as JOSE and COSE to envelop the Verifiable Credential payload. When coupled with a JSON Schema, or a hard-coded set of credential type-specific checks, JSON-LD API processing is not strictly necessary.
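
A rough sketch of that enveloping pattern, assuming PyJWT and an ad-hoc key; the key handling and claim layout are illustrative only and are not the normative profile defined by the VC-JOSE-COSE work:

```python
# Sketch: enveloping a credential with JOSE (a compact JWS) rather than relying
# on JSON-LD API processing. Assumes `pip install pyjwt cryptography`; the key,
# header defaults, and claim layout are illustrative, not the normative profile.
import jwt  # PyJWT
from cryptography.hazmat.primitives.asymmetric import ec

# Ad-hoc issuer key for the sketch; a real issuer would use managed key material.
issuer_key = ec.generate_private_key(ec.SECP256R1())

vc = {
    "@context": ["https://www.w3.org/ns/credentials/v2"],
    "type": ["VerifiableCredential"],
    "issuer": "https://issuer.example",
    "credentialSubject": {"id": "did:example:123"},
}

# The credential itself is the JWS payload; the envelope carries the signature.
token = jwt.encode(vc, issuer_key, algorithm="ES256")

# A verifier checks the envelope, then applies credential type-specific checks
# (e.g. the JSON Schema sketched above) to the recovered payload.
payload = jwt.decode(token, issuer_key.public_key(), algorithms=["ES256"])
assert payload["credentialSubject"]["id"] == "did:example:123"
```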

Other behaviours include the use of JSON Schema, or a hard-coded set of credential type-specific checks, in an HTTP API processing pipeline, where one checks the incoming Verifiable Credential for well-formedness using these checks (that is, NOT using a JSON-LD API) before further processing is performed. Downstream processing may or may not perform JSON-LD API processing on the input (and that is ok as long as the guidance in the specification is followed).
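
A hypothetical sketch of that pipeline shape, assuming Flask; the endpoint name and the simplified well-formedness check are made up for illustration:

```python
# Sketch: an HTTP API pipeline that gates further processing on a credential
# type-specific well-formedness check, without invoking a JSON-LD API.
# Assumes `pip install flask`; endpoint and checks are illustrative only.
from flask import Flask, jsonify, request

app = Flask(__name__)

def is_well_formed(candidate: dict) -> bool:
    # Stand-in for the JSON Schema / hard-coded credential type-specific checks
    # sketched above; no JSON-LD API calls are involved.
    return isinstance(candidate, dict) and "credentialSubject" in candidate

@app.post("/credentials/verify")
def verify_credential():
    candidate = request.get_json(force=True, silent=True) or {}
    if not is_well_formed(candidate):
        return jsonify({"verified": False, "error": "malformed credential"}), 400
    # Downstream steps (proof checks, status checks, optional JSON-LD API
    # processing) run only after the cheap well-formedness gate has passed.
    return jsonify({"verified": True}), 200
```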

It is due to these usage patterns that the Verifiable Credential Working Group felt that it needed to expend the effort to make it clear that there is ONE formal syntax and data model (JSON-LD Compacted form), which provides a concrete set of rules and an API to determine strict conformance to the syntax and data model. A purely JSON-based approach would have required the group to effectively re-invent a variety of features that JSON-LD has today.

However, performing strict, generalized conformance checking is not required in a number of real-world situations, and even when checking conformance, it is okay not to use the JSON-LD API (expand/compact) to demonstrate compliance with the specification. In other words, it is always possible to determine strict compliance with the specification for any Verifiable Credential because there is one formal syntax and data model; however, it is not always necessary to do full, formal conformance checking at every stage in a processing pipeline.

> There is language in the specification which looks like it might be intended to help with this, but it isn't normative. Could you say why you say "document authors are urged to", rather than making this a strict conformance requirement?

Yes, there was at least one Working Group member who said that they would Formally Object if we made it such that they couldn't use some of the features stated in the section of the specification that you are referencing. We expect others would have piled onto the Formal Objection if we had proceeded down a path that made those normative requirements.

Lessons Learned from v1.0 and v1.1

> As the 1.1 version of the data model was serialisable as JSON and JSON-LD, we wondered if there were lessons learned in transforming between the two formats that have carried forward to inform the changes for v2.0?

There was never really any "transforming between the two formats"; if such a transformation had existed, its rules were never defined -- the expectation was always what we have now documented more clearly in the specification.

Lesson Learned 1: We did not adequately document what each "format" was

What we did notice as a Working Group was that some implementers, understandably, misinterpreted the specification to mean that they could completely ignore the JSON-LD part of the specification (which established a concrete syntax and a well-understood data model) and throw whatever they wanted into a VC and call it a day. Some implementers treated Verifiable Credentials v1.0-v1.1 as "just JSON", deviated from the data model established by the specification (things like using id and type in a consistent way), and were then frustrated when they were admonished for doing so. This was largely a documentation and wording issue, which we've endeavored to fix in v2.0.

Lesson Learned 2: We made it too difficult to get started

There is also a sub-community within the Verifiable Credentials community that does not desire well-defined semantics through the use of JSON-LD, but would rather use JSON Schema and external documentation to achieve interoperability. The argument used here is "We don't need something as heavyweight as JSON-LD, we want to use our own mechanism for decentralized semantics."

For this sub-community, the Verifiable Credentials Working Group has introduced a "default vocabulary" using the @vocab JSON-LD keyword and "Issuer Dependent" semantics (https://www.w3.org/ns/credentials/issuer-dependent#). Effectively, by default for VCDM v2.0, an issuer can choose NOT to define semantics via JSON-LD and can instead convey those semantics through other mechanisms to their market vertical. These "undefined" or "issuer-dependent" semantics will be clearly identified to anyone doing JSON-LD processing (ensuring no conflict with other JSON-LD defined semantics). This approach is further elaborated upon in the Getting Started and Extensibility sections.
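
A small sketch of how that fallback behaves, assuming pyld, network access to fetch the base context, and a made-up claim name; the @vocab mapping is supplied inline here for illustration:

```python
# Sketch: issuer-dependent semantics via @vocab. The claim name below is made
# up; with no term definition of its own, it falls back to the issuer-dependent
# vocabulary when run through a JSON-LD processor.
# Assumes `pip install pyld requests` and network access for the base context.
import json
from pyld import jsonld

vc = {
    "@context": [
        "https://www.w3.org/ns/credentials/v2",
        {"@vocab": "https://www.w3.org/ns/credentials/issuer-dependent#"},
    ],
    "type": ["VerifiableCredential"],
    "issuer": "https://issuer.example",
    "credentialSubject": {
        "favouriteColour": "blue",  # hypothetical, issuer-defined claim
    },
}

expanded = jsonld.expand(vc)

# The undefined term is clearly flagged as issuer-dependent rather than
# silently colliding with terms defined elsewhere.
assert "issuer-dependent#favouriteColour" in json.dumps(expanded)
```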

Verifiable Credential Working Group Scope and Closing Thoughts

> As discussed in our closing comment for the TAG review of v1.0, we remain concerned about how much of this ecosystem is outside of the working group's remit. We are limiting the scope of our comments to the data model itself, and they should not be taken as applying to verifiable credentials more broadly.

There are VC Working Group members who continue to be frustrated by the limited scope of the group, namely that protocols and other application-layer concerns are not in scope. That said, the WG has struggled to work through the large number of REC-track deliverables it has, given the standard 24-month timeline.

There continues to be very strong interest in this space, and growing adoption and deployment by national and state governments and by market verticals such as retail, banking/finance, and healthcare in Canada, New Zealand, the United States, and the European Union. All that to say, I doubt this will be the last TAG review of the technology or the ecosystem. :)

Thank you for your time when reviewing the specification. Let us know if you have any further comments or concerns. We plan on attempting to take the Verifiable Credential Data Model v2.0 specification into the Candidate Recommendation phase in January 2024.

rhiaro commented 4 months ago

Thanks for your detailed reply, and for your patience as we work our way through our backlog. Sorry that we missed your CR deadline.