w3c / vc-data-model

W3C Verifiable Credentials v2.0 Specification
https://w3c.github.io/vc-data-model/

Define JSON Schemas for all objects in the VCDM #934

Closed decentralgabe closed 1 year ago

decentralgabe commented 1 year ago

We should publish JSON Schemas to represent Verifiable Credentials and Verifiable Presentations to aid with static validation of the data models.

A few questions:

  • Which version of JSON schema shall we use?

  • How does this impact the concept of 'verifiability' -- i.e. is a credential valid if the signature verifies but the schema verification fails?

cc: @OR13 @iherman @msporny

iherman commented 1 year ago

We should publish JSON Schemas to represent Verifiable Credentials and Verifiable Presentations to aid with static validation of the data models.

+1

A few questions:

  • Which version of JSON schema shall we use?

Because there is no standard version, I think we should choose a version that is well supported by tooling (plugins for common editors, validator libraries or programs...). I myself usually use ajv, which can be used both on the command line and as a JS library, and which corresponds, by default, to draft-07. But I do not have a complete overview of the JSON Schema landscape; there may be better tools out there.

  • How does this impact the concept of 'verifiability' -- i.e. is a credential valid if the signature verifies but the schema verification fails?

Here is where the missing standardization/stability of JSON Schema may backfire. A definition like the one you propose is, essentially, a normative statement on validity, and a normative statement of that sort cannot depend on a non-standard specification.

I think the only thing that we could/should say is that (1) we provide a non-normative schema for the data model that reflects a JSON-LD serialization of the model, and (2) implementers may use it in their toolkits to facilitate development and usage.

(I say "a" serialization and not "the" serialization because, in theory, the same model instance may have several different JSON-LD representations.)

SmithSamuelM commented 1 year ago

Cross-posting from another issue, since it is more relevant to this one:

JSON Schema has some very nice composition operators (anyOf, oneOf) that allow us to support graduated disclosure and contractually protected disclosure in ACDCs. JSON Schema also has a fairly expressive rules engine for arbitrary composition. If one uses composed JSON Schema, then the schema can be immutable (all the allowed variants in a presentation exchange are already committed to in the composition). This locks down the intent of the Issuer and simplifies presentation exchange protocols.

Immutable schema then make it much easier to manage VCs from a security perspective, because a content-addressable identifier can be attached to the schema, and any registry or cache of schema can provide a verifiable (integrity-protected) copy of the schema so identified. Finally, one can use JSON Schema as the capture base for other semantic tools such as the ToIP OCA specification which, for those unfamiliar with it, is a layered schema approach that enables data shaping for the capture base.
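As a hypothetical sketch of that composition idea (the variants and field names are invented here, not taken from the ACDC specification), an issuer can commit up front to the full set of disclosable variants with oneOf:

```js
// Hypothetical composed schema: the issuer fixes the allowed disclosure
// variants up front with oneOf, so the schema itself can stay immutable.
const composedSchema = {
  oneOf: [
    { // full-disclosure variant
      type: "object",
      required: ["name", "dateOfBirth"],
      properties: {
        name: { type: "string" },
        dateOfBirth: { type: "string", format: "date" }
      },
      additionalProperties: false
    },
    { // graduated-disclosure variant: only an over-18 flag
      type: "object",
      required: ["over18"],
      properties: { over18: { type: "boolean" } },
      additionalProperties: false
    }
  ]
};
```

Because every allowed variant is committed to inside one schema, the schema never needs to change after issuance; the discloser simply presents whichever variant fits the disclosure context.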

Something generic like JSON Schema can be used with other data models, such as those based on Property Graphs, which can be implemented not merely in PG-specific databases like Neo4j (queried with openCypher or the emerging ISO GQL) but also in existing SQL databases repurposed to support PGs. This means that instead of one set of tooling (JavaScript JSON-LD or RDF), we have at our disposal the full suite of tooling widely used by everyone else.

OR13 commented 1 year ago

At TPAC the idea of focusing on OAS instead of JSON Schema was raised... I am in favor of using OAS3.

See https://swagger.io/specification/

See the section on data types:

Primitive data types in the OAS are based on the types supported by the JSON Schema Specification Wright Draft 00. Note that integer as a type is also supported and is defined as a JSON number without a fraction or exponent part. null is not supported as a type (see nullable for an alternative solution). Models are defined using the Schema Object, which is an extended subset of JSON Schema Specification Wright Draft 00.

SmithSamuelM commented 1 year ago

I haven't reviewed OAS3 at the same level of detail as the latest (2020-12) version of the JSON Schema specification. However, this paragraph from the Wright draft of JSON Schema, which OAS references as normative, is compatible with locking down schema as immutable, which is a MUST for ACDCs.

" The URI is not a network locator, only an identifier. A schema need not be downloadable from the address if it is a network-addressible URL, and implementations SHOULD NOT assume they should perform a network operation when they encounter a network-addressible URI. "

We use SAIDs as the value of $id, which the Wright draft narrows from the JSON Schema spec to be proper URIs only, not arbitrary identifiers. But as long as it is not required to be a network locator, only an identifier, we could embed a SAID in a proper URI and then strip it out. It's just more work.
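A tiny sketch of that workaround (the urn:said: scheme and the SAID value shown here are hypothetical, purely for illustration):

```js
// Embed a SAID in a proper URI for $id, then strip it back out when needed.
const said = "EBfdlu8R27Fbx-ehrqwImnK-8Cm79sqbAQ4MmvEAYqao"; // placeholder SAID
const $id = `urn:said:${said}`;                  // a proper URI, not a locator

const extracted = $id.replace(/^urn:said:/, ""); // recover the bare SAID
console.log(extracted === said);                 // true
```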

Finally, it's not clear to what extent OAS3 supports the composition operators from JSON Schema 2020-12. These are a must-have for ACDC.

OR13 commented 1 year ago

Just below the quote I provided:

The following properties are taken from the JSON Schema definition but their definitions were adjusted to the OpenAPI Specification.

  • type - Value MUST be a string. Multiple types via an array are not supported.

  • allOf - Inline or referenced schema MUST be of a Schema Object and not a standard JSON Schema.

  • oneOf - Inline or referenced schema MUST be of a Schema Object and not a standard JSON Schema.

  • anyOf - Inline or referenced schema MUST be of a Schema Object and not a standard JSON Schema.

  • not - Inline or referenced schema MUST be of a Schema Object and not a standard JSON Schema.

  • items - Value MUST be an object and not an array. Inline or referenced schema MUST be of a Schema Object and not a standard JSON Schema. items MUST be present if the type is array.

  • properties - Property definitions MUST be a Schema Object and not a standard JSON Schema (inline or referenced).

  • additionalProperties - Value can be boolean or object. Inline or referenced schema MUST be of a Schema Object and not a standard JSON Schema. Consistent with JSON Schema, additionalProperties defaults to true.

  • description - CommonMark syntax MAY be used for rich text representation.

  • format - See Data Type Formats for further details. While relying on JSON Schema's defined formats, the OAS offers a few additional predefined formats.

  • default - The default value represents what would be assumed by the consumer of the input as the value of the schema if one is not provided. Unlike JSON Schema, the value MUST conform to the defined type for the Schema Object defined at the same level. For example, if type is string, then default can be "foo" but cannot be 1.

OR13 commented 1 year ago

I've found that this is one of the areas where JSON Schema's lack of standardization starts to break down... different implementations support different sets of "composability" APIs... You can't consistently rely on them across languages, in my experience.

SmithSamuelM commented 1 year ago

That's not been our experience so far. We use the oneOf and anyOf composition operators, which appear to be very well defined and implemented. There are other composition operators besides those two, like the regex-based ones, that are not so well locked down. But implementations that do not follow the spec are the problem, not the spec. The spec is clear about what oneOf and anyOf are supposed to do. So implementation failures are not a reason to avoid relying on a spec and on implementations that correctly implement it.

dezell commented 1 year ago

In our (Conexxus) use of OAS 3 and JSON Schema, we found composability was not consistent across various tools. We solved that problem by using "bundling" to build OAS files with no external references (re-performing bundling on any check-in), and we have yet to discover any tool (display, code generation, or other) that has any problem with the bundled version.

Also, as I mentioned in Vancouver, all of our external JSON Schema definitions are ensconced in OAS 3 files, so there are no versioning issues, either. There is the added benefit of being able to browse those ensconced JSON Schemas with tools at the component level.
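For illustration, such a bundling step can be scripted; this sketch assumes the @apidevtools/json-schema-ref-parser package (one common choice, not necessarily the tool Conexxus uses) and a hypothetical openapi.yaml entry point.

```js
// Hypothetical bundling step: inline all external $refs so the published
// OAS file is self-contained.
const $RefParser = require("@apidevtools/json-schema-ref-parser");

async function bundle(entryPoint) {
  // bundle() rewrites external $refs as internal ones; the result has
  // no references that leave the document.
  const doc = await $RefParser.bundle(entryPoint);
  return doc;
}

bundle("openapi.yaml").then(doc => {
  console.log(JSON.stringify(doc, null, 2)); // publishable, reference-free file
});
```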

SmithSamuelM commented 1 year ago

I think we may be talking about two different types of composition. There is composition via bundling (bundlers), and there are the specific composition operators oneOf and anyOf. One can use oneOf and anyOf on unbundled schema and subschema or on bundled subschema. Either way, the syntax and semantics of those two operators are well defined.

JSON Schema bundlers are all over the map and not very mature yet. That is not the type of composition I am talking about.

dezell commented 1 year ago

You're right, there is "oneOf/allOf/anyOf", which may or may not use JSON Pointer to reference external documents. It's the external references that were problematic.

And in our case, we chose the bundler, and the users of the finished specification don't have to use bundlers at all unless they choose to do so - we made the decision for them before we published.

SmithSamuelM commented 1 year ago

@decentralgabe

How does this impact the concept of 'verifiability' -- i.e. is a credential valid if the signature verifies but the schema verification fails?

In ACDCs, schema is type. So we strictly require schema validation for ACDC validation, not merely signature validation. This rigidity supports ACDC's strict security properties and guarantees. Schema validation does not need to be a MUST requirement for VCv2, but it must be allowed for those applications that choose to adopt a stricter security stance. As a result, schema as type in ACDCs are immutable. This is why we NEED the composition operators oneOf and anyOf.

With oneOf, the issuer can constrain the allowed schema for the allowed disclosable variants. This nicely balances the issuer's need to constrain compliant disclosures of the ACDC with the discloser's need to disclose the variant required by the context of the disclosure. We use this to support graduated disclosure for contractually protected disclosure.

We use the anyOf operator to support selective disclosure. A set of selectively disclosable attributes is expressed as an array of schemas composed with the anyOf operator. Thereby any mixture of the disclosure of one or more of the elements of the array will validate against the schema. Thus the issuer can precisely define the types of the selectively disclosable elements while enabling selective disclosure by the eventual discloser.

Because the schema are immutable, a signature on the ACDC, which includes the SAID of its schema (a SAID is a content-addressable identifier, i.e. derived from a cryptographic digest), makes a commitment to the schema. This eliminates malleability-type attacks on the type of the data so disclosed. The type may be as precisely constrained as JSON Schema is capable of constraining, including schema that define the types of other ACDCs pointed to by an edge of an ACDC. This makes a graph of ACDCs an immutable, strongly typed, hash-chained, signed verifiable data structure that is end-verifiable. No expansion necessary.
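A hypothetical sketch of the anyOf pattern just described (the attribute names are invented for illustration): each selectively disclosable attribute gets its own subschema, and any subset of them validates.

```js
// Hypothetical selective-disclosure schema: the attributes section is an
// array whose items may each match any of the issuer-defined subschemas,
// so disclosing any subset of the attributes still validates.
const selectiveSchema = {
  type: "array",
  items: {
    anyOf: [
      {
        type: "object",
        required: ["name"],
        properties: { name: { type: "string" } },
        additionalProperties: false
      },
      {
        type: "object",
        required: ["dateOfBirth"],
        properties: { dateOfBirth: { type: "string", format: "date" } },
        additionalProperties: false
      }
    ]
  }
};

// Any mixture of the disclosable elements validates:
//   [{ name: "Alice" }]                                -> valid
//   [{ dateOfBirth: "1990-01-01" }]                    -> valid
//   [{ name: "Alice" }, { dateOfBirth: "1990-01-01" }] -> valid
```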

SmithSamuelM commented 1 year ago

Moreover, because ACDC issuances are anchored via their SAIDs in the Key Event Log (KEL) of the Issuer (another verifiable data structure), the proof of issuance of an ACDC does not require an attached signature, merely a reference to the anchoring event, which is verifiable against the SAID of the issued ACDC. The anchoring event is itself signed, but one event may anchor multiple issuances. Because the AID of each Issuer has its own KEL, there is no co-mingling of key state. A given AID can be forgotten by deleting its KEL. KERI/ACDC's approach provides an elegant solution to the GDPR problems inherent in public verifiable data registries that co-mingle key state and/or revocation state between issuers. Co-mingled-state VDRs can't be deleted.

Finally, the SAIDs of ACDCs can be blinded so that the KEL anchors are not linkable without specific disclosure, after contractually protected disclosure has been agreed upon by the verifier. Bulk-issued ACDCs allow one-time-use ACDCs so that third parties (non-parties to the disclosure) can't correlate without the collusion of second parties in violation of their non-disclosure agreements.

SmithSamuelM commented 1 year ago

Immutable schema identified by SAIDs (cryptographically agile self-addressing identifiers), which are derived from the cryptographic digest of the schema, mean that ACDCs need no shared-governance type registries for interoperability. Each schema's SAID is universally unique, so types of ACDCs can be identified in a universal namespace without collisions (unless the schemas are bitwise identical, which means they are the same schema, so the types are also identical).

Using chained ACDCs enables an append-to-extend policy for custom fields. One does not extend an ACDC by modifying a given type of ACDC in place through the addition of custom fields. One extends an ACDC by chaining it to another ACDC with a custom type that includes the custom fields. So the result is always immutable; it's an append-only data structure. This is how persistent data structures in Clojure and other functional languages enable distributed applications without semaphore locking or coordination: an existing data structure is never modified, it is just extended by appending. This gives persistent data structures very strong data integrity guarantees that cannot be had with non-persistent data structures. One can incrementally build a graph database while being able to detect and recover from any partial deletion attacks on that database.
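As a rough illustration of the content-addressing idea only (the real SAID derivation uses CESR-encoded digests and a placeholder-filling procedure defined by KERI, not this simplified recipe):

```js
// Simplified sketch of content-addressing a schema: hash the serialization
// with the $id field blanked, then embed the digest as $id.
// NOTE: this is NOT the real SAID algorithm; it only illustrates why
// bitwise-identical schemas collide to the same identifier.
const crypto = require("crypto");

function contentAddress(schema) {
  const blanked = { ...schema, $id: "" };     // placeholder for the id field
  const serialized = JSON.stringify(blanked); // assumes stable key order
  const digest = crypto
    .createHash("sha256")
    .update(serialized)
    .digest("base64url");                     // Node 16+ supports base64url
  return { ...schema, $id: `urn:example:${digest}` };
}

const addressed = contentAddress({ $id: "", type: "object" });
console.log(addressed.$id); // identical schemas always yield identical ids
```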

SmithSamuelM commented 1 year ago

One of the problems with dynamically reconstructing serializations from a non-persistent data structure is that it is difficult, indeed pathologically so, to detect and recover from partial deletion attacks on the non-persistent data structure. One cannot build truly zero-trust architectures unless one has signed-at-rest persistent data structures, or at least keeps around the artifacts needed to prove that the data was not subject to a partial deletion attack.

Data integrity of a given node in a database provides evidence of tampering with that node only. To ensure that the node is not merely tamper-evident but also securely attributed requires a signature, which includes a hash, so one gets tamper evidence for free. But to ensure that a graph data structure is fully securely attributed and has not had a partial deletion attack requires that the edges also be protected. ACDCs solve this problem by communicating graphs as signed, hash-chained graph fragments that include the edges for each node. Because DAGs can be ordered, one can hash a DAG and store that hash redundantly to protect against a partial deletion attack. Storing the securely attributable graph fragments redundantly allows one to repair (i.e. recover from) a partial deletion attack. AFAIK, ACDCs are the first and only open-standard, fully decentralized approach that solves the zero-trust signed-at-rest problem for verifiable graph data structures.
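A rough sketch of the DAG-digest idea (node and edge shapes invented for illustration; this is not the ACDC wire format): hash each node together with its outgoing edges in a deterministic order (lexicographic here for simplicity), then fold the node hashes into a single root digest that redundant copies can be checked against.

```js
// Illustrative DAG digest: any deleted node or edge changes the root digest,
// so a redundantly stored root detects partial deletion.
const crypto = require("crypto");

function sha256(s) {
  return crypto.createHash("sha256").update(s).digest("hex");
}

function dagDigest(nodes) {
  // nodes: { id: { data, edges: [targetIds] } } -- hypothetical shape
  const ids = Object.keys(nodes).sort(); // deterministic order for the sketch
  const nodeHashes = ids.map(id => {
    const { data, edges } = nodes[id];
    return sha256(id + JSON.stringify(data) + [...edges].sort().join(","));
  });
  return sha256(nodeHashes.join(""));
}

const root = dagDigest({
  a: { data: { v: 1 }, edges: ["b"] },
  b: { data: { v: 2 }, edges: [] }
});
// Store `root` redundantly; recompute on read to detect partial deletion.
```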

jan-forster-cognizone commented 1 year ago

Dear All,

I tried to reach out regarding the schema definition in another issue (#941), and it was forwarded here.

From reading this issue and #76, I understand that no formal schema for the VC model exists. Is there any non-standard schema available for sharing?

The current documentation, which relies on examples and textual descriptions of constraints, is not always sufficiently clear to implement against.

Many thanks for any advice.

Best regards, Jan Förster

aljones15 commented 1 year ago

These are a bit early, but we have existing JSON Schemas for Verifiable Credentials and Verifiable Presentations here:

https://github.com/digitalbazaar/bedrock-validation/blob/main/schemas/verifiableCredential.js

and here:

https://github.com/digitalbazaar/bedrock-validation/blob/main/schemas/verifiablePresentation.js

We're also using ajv ^6.0.0, which does not conform to the latest version of the JSON Schema spec (I don't think it supports prefixItems yet).

I'm not volunteering to write these for the group, but just want to +1 the creation of communal JSON Schemas we can all use, as the interoperability tests are highly dependent on them.

jan-forster-cognizone commented 1 year ago

Dear @aljones15, many thanks for the files. At first glance, they seem quite small, but we will assess them further. Thank you again.

Best regards, Jan Förster

sbutterfield commented 1 year ago

+1 to this. Happy to help if needed @decentralgabe

decentralgabe commented 1 year ago

thanks @sbutterfield I will put out a draft PR and add you as a reviewer with @David-Chadwick

iherman commented 1 year ago

The issue was discussed in a meeting on 2022-11-09

View the transcript

#### 2.1. Define JSON Schemas for all objects in the VCDM (issue vc-data-model#934)

_See github issue [vc-data-model#934](https://github.com/w3c/vc-data-model/issues/934)._

**Gabe Cohen:** 934 says it must define it for all terms. There has been back-and-forth discussion on interest and on how to do this. Seems like there is general consensus, but reading the `@context` thread it seems there is no normative way to refer to it. ... Happy to start working on it, but want to make sure there is consensus.

**Manu Sporny:** +1 to specifying JSON Schemas for the core data model; unfortunately there is no standard we can normatively point to, but that is being worked out. This doesn't stop us from defining useful stuff for developers. ... We can say "here is OpenAPI 3.0 or 3.1", point to it, and make it non-normative, so developers have the option to use it, making it helpful for them. We just cannot normatively say you must use OAS 3.1 to work with the property. We also need to be careful that we don't close the door to other schema formats. ... This issue is about JSON Schema for the core data model, which will always be limited, since it doesn't cover extension points. In general +1.

**Kristina Yasuda:** We've had this discussion at TPAC and at other times. We can't normatively mandate, but direction-wise we're good to start, assuming that limitation.

**Gabe Cohen:** Happy to take this on as a work item; will need some assistance on language but ready to work on a first draft.

decentralgabe commented 1 year ago

Draft here https://github.com/w3c/vc-data-model/pull/977

msporny commented 1 year ago

PR #977 has been merged, addressing this issue, closing.