w3c / vc-data-model

W3C Verifiable Credentials v2.0 Specification
https://w3c.github.io/vc-data-model/

Tracking Issue from TPAC: What does JSON-LD compatible JSON mean? #929

Closed: brentzundel closed this issue 2 years ago

brentzundel commented 2 years ago

Tracking Issue from TPAC: What does JSON-LD compatible JSON mean?

Please use comments to make concrete proposals.

dlongley commented 2 years ago

First attempt at getting things started...

The core data model should be RDF, but serialized using a profile of JSON-LD that is idiomatic JSON. This should be a contextualized, compact form that could be easily checked against a JSON schema and then consumed by JSON developers that are otherwise unfamiliar with JSON-LD.
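The "idiomatic JSON, checkable against a JSON schema" idea might look like the following sketch, using only the standard library. The document and the structural check are illustrative, not normative:

```python
import json

# A hypothetical VC in contextualized, compact JSON-LD form -- plain,
# idiomatic JSON from a JSON developer's point of view.
vc = json.loads("""
{
  "@context": ["https://www.w3.org/2018/credentials/v1"],
  "id": "http://example.edu/credentials/1872",
  "type": ["VerifiableCredential"],
  "issuer": "did:example:123",
  "issuanceDate": "2010-01-01T19:23:24Z",
  "credentialSubject": {"id": "did:example:456"}
}
""")

def looks_like_vc(doc: dict) -> bool:
    """Minimal structural check of the kind a JSON Schema validator
    would encode: required members present, with expected JSON types.
    No JSON-LD processing is involved."""
    return (
        isinstance(doc.get("@context"), list)
        and "VerifiableCredential" in doc.get("type", [])
        and isinstance(doc.get("credentialSubject"), dict)
    )

print(looks_like_vc(vc))  # -> True
```

A consumer unfamiliar with JSON-LD can treat the document as ordinary JSON; the `@context` member is just another array-valued key until (and unless) JSON-LD processing is wanted.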

nadalin commented 2 years ago

First attempt at getting things started...

The core data model should be RDF, but serialized using a profile of JSON-LD that is idiomatic JSON. This should be a contextualized, compact form that could be easily checked against a JSON schema and then consumed by JSON developers that are otherwise unfamiliar with JSON-LD.

What would be the requirement for RDF from a data model perspective? I don't see anything in the data model that would require RDF.

dlongley commented 2 years ago

The data model today is essentially subject-property-value statements -- and containers/wrappers around those statements (aka "graphs"). This is essentially RDF, so we should just reuse it; it's a standard (and reusing it saves us from having to write some text).
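The "subject-property-value" claim above can be sketched by flattening a nested JSON object into such statements. This is only an illustration of the shape of the mapping (blank nodes, lists, and datatypes are ignored); the subject IRI and claim names are made up:

```python
def to_statements(subject, obj):
    """Flatten a nested JSON object into (subject, property, value)
    statements -- the shape RDF models natively. A nested object's
    "id" becomes the subject of its own statements."""
    statements = []
    for prop, value in obj.items():
        if isinstance(value, dict):
            child = value.get("id", "_:blank")
            statements.append((subject, prop, child))
            rest = {k: v for k, v in value.items() if k != "id"}
            statements.extend(to_statements(child, rest))
        else:
            statements.append((subject, prop, value))
    return statements

claims = {"credentialSubject": {"id": "did:example:456", "degree": "BSc"}}
for stmt in to_statements("http://example.edu/credentials/1872", claims):
    print(stmt)
```

The output is two statements: the credential links to its subject, and the subject carries the "degree" value -- exactly the graph structure the comment describes.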

msporny commented 2 years ago

A cut at some language that might provide some clarity around what "JSON-LD compatible JSON" means. The Verifiable Credentials Data Model is "JSON-LD compatible JSON", which means the following:

In order to make development easier for developers coming from a JSON background, we might consider:

The above makes the data model crystal clear, is compatible with all known JSON-LD processors, helps developers coming from a purely JSON background to get started quickly, and retains JSON-only processing modes that are compatible with JSON-LD. Some concrete proposals that we could put in front of the group are:

PROPOSAL: Verifiable Credentials MUST utilize the @context parameter where the values SHOULD be URLs.

PROPOSAL: Verifiable Credentials SHOULD NOT utilize inline JSON-LD Contexts (objects as values) for the @context.

PROPOSAL: Verifiable Credentials MUST be expressed in JSON-LD Compact form.

PROPOSAL: The underlying data model for Verifiable Credentials is JSON-LD.

And the proposals to help make development easier for developers coming from a JSON background:

PROPOSAL: As an initial iteration on the idea, the Verifiable Credentials specification will define an "experimental" JSON-LD Context (https://www.w3.org/2018/credentials/experimental/v1, with an @vocab value set to https://www.w3.org/2018/credentials/undefined#) such that developers need not define a JSON-LD Context or Vocabulary semantics as an initial step in the development process.

PROPOSAL: A conforming processor SHOULD raise an error if a VC utilizes the https://www.w3.org/ns/credentials/experimental/v1 JSON-LD Context in a production environment. The definition of "production environment" is left as an exercise to the implementer.

/cc @OR13 @tplooker @mprorock @peacekeeper @philarcher @mkhraisha @dlongley @brentzundel @Sakurann

OR13 commented 2 years ago

@msporny Thank you for taking the time to write this up!

That is, in-line JSON-LD Contexts are strongly discouraged (we might even want to go as far as forbidding them).

-1 to this guidance... it also contradicts conversations with schema.org / google regarding usage of JSON-LD on web pages.

Suggest the working group NOT provide guidance of this form in a W3C TR.

JSON-LD Compact Form is the only allowed form of JSON-LD. JSON-LD expanded form is disallowed to eliminate the requirement to always perform JSON-LD processing in processing pipelines where it's not needed.

+1 to this.

The underlying data model is JSON-LD, which is a superset of (and round-trippable to/from) RDF.

+1 to this.

The Verifiable Credentials specification will provide an "experimental" JSON-LD Context (https://www.w3.org/ns/credentials/experimental/v1, with an @vocab value set to https://www.w3.org/2018/credentials/undefined# such that developers need not define a JSON-LD Context or Vocabulary semantics as an initial step in the development process. Implementations SHOULD reject verification of any VC that utilizes the https://www.w3.org/ns/credentials/experimental/v1 JSON-LD Context in a production environment.

-1 to this, counter offer:

{
  "@context": [
    // "https://www.w3.org/2018/credentials/v1",
    "https://www.w3.org/ns/credentials/v2",
    // "https://www.w3.org/2018/credentials/examples/v1"
    { "@vocab": "https://www.w3.org/ns/credentials#" } 
  ],
  "id": "http://example.edu/credentials/1872",
  "type": ["VerifiableCredential", "NewCredentialType"],
  "issuer": {
    "id": "did:example:123",
    "type": ["Organization", "OrganizationType"]
  },
  "issuanceDate": "2010-01-01T19:23:24Z",
  "credentialSubject": {
    "id": "did:example:456", 
    "type": ["Person", "JobType"],
    "claimName": "claimValue"
  }
}

https://github.com/w3c/vc-data-model/issues/935
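The `@vocab` entry in the counter-offer above works as a catch-all: any JSON key not otherwise defined gets expanded against the vocab IRI. A toy sketch of that expansion rule (the vocab IRI mirrors the snippet above and is an example value, not a registered namespace):

```python
def expand_term(term, context_vocab):
    """Expand a JSON key the way a catch-all @vocab would:
    keywords and absolute IRIs pass through untouched; bare terms
    are appended to the vocab IRI."""
    if term.startswith("@") or ":" in term:
        return term
    return context_vocab + term

vocab = "https://www.w3.org/ns/credentials#"
print(expand_term("claimName", vocab))
# -> https://www.w3.org/ns/credentials#claimName
print(expand_term("@type", vocab))
# -> @type
```

This is why developers "need not define a JSON-LD Context as an initial step": undefined terms still expand to unambiguous (if placeholder) IRIs instead of being dropped or causing errors.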

PROPOSAL: Verifiable Credentials MUST utilize the @context parameter where the values SHOULD be URLs.

+1

PROPOSAL: Verifiable Credentials SHOULD NOT utilize inline JSON-LD Contexts (objects as values) for the @context.

-1 (with serious concerns about how this will cripple adoption in certain use cases).

Suggest the working group NOT provide guidance of this form in a W3C TR.

PROPOSAL: Verifiable Credentials MUST be expressed in JSON-LD Compact form.

+1

PROPOSAL: The underlying data model for Verifiable Credentials is JSON-LD.

+1 (noting that we don't need to propose this, @context is required in v1.1 and the data model is JSON).

PROPOSAL: As an initial iteration on the idea, the Verifiable Credentials specification will define an "experimental" JSON-LD Context (https://www.w3.org/2018/credentials/experimental/v1, with an @vocab value set to https://www.w3.org/2018/credentials/undefined# such that developers need not define a JSON-LD Context or Vocabulary semantics as an initial step in the development process.

-1 to "experimental"... but this proposal could probably be restructured in a way that I would accept... see my counter offer.

Suggest the working group NOT provide guidance of this form in a W3C TR.

PROPOSAL: A conforming processor SHOULD raise an error if a VC utilizes the https://www.w3.org/ns/credentials/experimental/v1 JSON-LD Context in a production environment. The definition of "production environment" is left as an exercise to the implementer.

-1 to this.

Suggest the working group NOT provide guidance of this form in a W3C TR.

msporny commented 2 years ago

@OR13 (and anyone else that weighs in) could you please re-edit your comment above and 1) explain your -1s in more depth, and ideally, 2) provide counter-proposals for all -1s. It will help us figure out areas where it might be possible to reach consensus. Like the @vocab / "experimental" thing seems close... but the schema.org thing feels like it needs a lot more discussion (there's history there that much of the WG is probably missing).

OR13 commented 2 years ago

edited, mostly my counter proposals are "don't say this in a W3C TR."... leave the power in the hands of the developers / system builders and users.

dlongley commented 2 years ago

+1 to all the proposals in https://github.com/w3c/vc-data-model/issues/929#issuecomment-1267030853.

selfissued commented 2 years ago

It’s time to let JSON be JSON

The Verifiable Credentials spec currently has conflicting guidance about the use of @context when the VC is not JSON-LD. On one hand, Section 6.1 (JSON) describes the pure JSON representation without making any requirements to use @context. On the other hand, Section 4.1 (Contexts) says “Verifiable credentials and verifiable presentations MUST include a @context property.” Later in the same section it says “Though this specification requires that a @context property be present, it is not required that the value of the @context property be processed using JSON-LD. This is to support processing using plain JSON libraries”.

Yes, it’s clear what @context means when the VC is JSON-LD. It’s also very unclear what @context means when the VC is pure JSON.

Now that we have experience with deployments of Verifiable Credentials, it’s clear that many developers don’t know how to use @context. As a result, they’ve deployed non-interoperable VCs. As Joe Andrieu said during TPAC in Vancouver, “If we didn’t have Dave Longley as a resource to help us, we wouldn’t have known how to get @context right.” Joe’s far from alone.

Proposal

Modify Section 4.1 (Contexts) to say that @context MUST be present when the VC is JSON-LD and that @context MUST NOT be present when the VC is JSON but not JSON-LD. This has multiple benefits for developers:

  1. When @context is present, it’s an unmistakable indication that the VC is JSON-LD and all JSON-LD processing rules apply.
  2. When @context is not present, it’s an unmistakable indication that the VC is not JSON-LD and no JSON-LD processing rules apply.
  3. When @context is not present, developers do not bear the complexity burden of JSON-LD.

Answering the Issue Question

The tracking issue asked the question “What does JSON-LD compatible JSON mean?”. Given the proposal above, the answer is crystal clear: JSON-LD compatible JSON means JSON-LD; JSON that is not compliant with JSON-LD is not JSON-LD compatible JSON.

TallTed commented 2 years ago

@selfissued -- It's time to let @context be @context. Please edit your latest comment, and wrap all 13 occurrences of @context in code fences. That GitHub user is not part of any relevant groups, and does not need to be pinged every time a comment is made on this thread.

Also, please note that, in fact, JSON that uses URIs for all terms, thus requiring no mapping from "simple" term literals to URIs via @context, is also "JSON-LD compatible JSON". I think this "JSON-LD compatible JSON" would impose no "complexity burden of JSON-LD" (by which I think you actually mean a "complexity burden of @context") because there is no @context and no term mapping; each term URI should be interpreted and maintained exactly as written.
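A sketch of that context-free flavor of "JSON-LD compatible JSON": every key is already an absolute IRI (or a JSON-LD keyword), so no term mapping is needed. The property IRIs and the shallow check below are illustrative only:

```python
# A context-free document whose keys are already absolute IRIs.
doc = {
    "https://www.w3.org/2018/credentials#credentialSubject": {
        "@id": "did:example:456",
        "https://schema.org/name": "Pat Doe",
    }
}

def needs_context(d):
    """True if any top-level key is a bare term -- i.e. neither a
    JSON-LD keyword nor an absolute IRI -- and so would require an
    @context mapping to be interpreted unambiguously."""
    return any(not (k.startswith("@") or ":" in k) for k in d)

print(needs_context(doc))           # -> False
print(needs_context({"name": "x"})) # -> True
```

Each term IRI is "interpreted and maintained exactly as written," at the cost of verbosity that `@context` exists to remove.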

nadalin commented 2 years ago

There is absolutely no reason to have or process an `@context` when dealing with JWTs; it requires extra work for parsers that ONLY process JSON. It's time to remove this requirement and let JSON be JSON, and not force these processing rules on deployments that don't want to use JSON-LD. Let's make things simple.


nadalin commented 2 years ago

-1 to this approach


msporny commented 2 years ago

@selfissued wrote:

@context MUST NOT be present when the VC is JSON

@nadalin wrote:

There is absolutely no reason to have or process a @context

Could either of you please explain what the extensibility model is for your proposal? How are global semantics achieved?

In the past, the answer has been some variation of:

Is something new being offered this time around?

David-Chadwick commented 2 years ago

Yes, we can do this if JSON implementations conform to the current W3C DM recommendation and use URIs everywhere that URIs are required. But if you want to use simple alias strings instead of URIs, then you have to have an @context that specifies the mappings. What you appear to want is simple aliases with no mappings to URIs, so that there is ambiguity over what the alias property is -- the same ambiguity we introduced 30 years ago when we moved from X.500 to LDAP. One would have thought we could have learned the lesson from that bad move and stood on the shoulders of giants.

David-Chadwick commented 2 years ago

This sounds like a broken proposal to me. Since we have issuers, holders/wallets, and verifiers, each of which may implement either JSON or JSON-LD processing rules (with no requirement to implement both), your proposal is a guarantee of non-interworking. The current spec has several acknowledged ambiguities (e.g. whether to duplicate or replace vc claims in JWT-proofed VCs). The solution is to clarify the ambiguities in order to ensure interworking when passing a VC between JSON and JSON-LD implementations, not to guarantee broken implementations when doing this.

nadalin commented 2 years ago

The lesson learned is to keep simple things simple. Don't complicate things: if I have a JWT, I should be able to process it as it is, including SD-JWTs, without `@context` being there. Keeping `@context` optional gives you the extensibility and does not break any existing implementations.


David-Chadwick commented 2 years ago

"@context" cannot be optional unless you replace all the existing properties (type, id, issuanceDate etc) with their full URIs. Otherwise you have a non-conformant VC.

nadalin commented 2 years ago

There would be no claim transformations required if there is no `@context`. Something else to keep simple.


msporny commented 2 years ago

@David-Chadwick wrote:

This sounds like a broken proposal to me.

Yes, agreed. Removing a feature that provides global interoperability and then replacing it with nothing will lead to non-interoperability; thus, it seems a non-solution is being proposed.

"let JSON be JSON" and "keep things simple" are sound bites. Being generous, they could be construed as design guidelines; they are not a workable technical architecture.

Please answer the question being asked instead of stating that we do not need a feature that has achieved consensus (multiple times) over many years. To re-state the question:

Could either of you please explain what the extensibility model is for your proposal? How are global semantics (and thus interop) achieved?

dlongley commented 2 years ago

I agree with "let's keep simple things simple". They should be as simple as they can be -- but not simpler. Here's my perspective on that:


If all you want to do is share a couple of very common fields of information between a closed or mostly closed system of well-known data providers, use a standard that doesn't place constraints on data modeling.

If you don't care about data providers tracking your users' behavior (or you want that property so you can monetize it), or if you generally don't care about the "phone home problem", use a standard that doesn't have the extra features designed to work against this.

If you don't mind using a centralized registry for achieving interop with how you modeled your data or your data is simple enough so that this solution meets your scaling needs, don't use a standard that introduces a decentralized mechanism.

If you don't need to be able to atomize your data into simple statements, no matter how they are nested, so that they can be merged and linked with other data to build powerful knowledge graphs, or selectively disclosed, don't use a standard that requires you to apply constraints to how you model your data.

If your use cases fit into the above -- use a JWT. JWTs aren't opinionated on the data model, just the data format: JSON. They don't impose any extra constraints to achieve additional use cases -- because they aren't designed for those and you, in particular, don't need them.


Now, what about everyone else?


If you have more than a few common fields of information (perhaps you even have rich and well-connected data) and you want it to be easily shareable and usable across an open ecosystem where you do not even know who might consume it, use a standard that places a simple, minimum set of constraints on data modeling to enable this to happen.

If you care about privacy issues with data providers tracking your users' behavior and want to stop the "phone home problem", use a standard designed to help achieve this.

If you don't want to use a centralized registry for defining your data or if that solution doesn't scale to meet your needs, use a standard that defines a decentralized mechanism.

If you want your data to be mergeable and linkable with other data to build powerful knowledge graphs via interoperable, common tooling, or selectively disclosed, use a standard that requires all data to be modeled in a common way, using the simplest constraints to achieve this goal.

If your use cases fit into this section here, use a VC.


VCs specify a data model, not just a data format. The data model, when expressed in JSON, says that each object is a "thing" with properties. The properties are expressed as JSON keys and link to other values or other things -- where the same rules then repeat. It is true that you cannot just "do whatever you want" when modeling your data, because then there is no common structure for interoperable tools to work with; instead, all the data is bespoke and looks different, which fails to meet the requirements. This data model is the simplest set of constraints to understand and apply when modeling your data -- to achieve the above requirements. And that's why we should do it: it keeps things as simple as they can be, but not simpler.

VCs provide a decentralized registry mechanism called @context, borrowed from another standard, JSON-LD. This mechanism is the simplest, standard way to map simple JSON keys onto globally unambiguous URLs.
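The "decentralized registry" idea above can be sketched as nothing more than a published mapping from short JSON keys to globally unambiguous URLs, with compaction as its inverse. Term names and IRIs here are illustrative:

```python
# A context, conceptually: short keys -> globally unambiguous URLs.
context = {
    "issuer": "https://www.w3.org/2018/credentials#issuer",
    "issuanceDate": "https://www.w3.org/2018/credentials#issuanceDate",
}

def expand(doc, ctx):
    """Replace short keys with their globally-scoped IRIs."""
    return {ctx.get(k, k): v for k, v in doc.items()}

def compact(doc, ctx):
    """Inverse mapping: IRIs back to short, idiomatic JSON keys."""
    inverse = {v: k for k, v in ctx.items()}
    return {inverse.get(k, k): v for k, v in doc.items()}

doc = {"issuer": "did:example:123"}
print(expand(doc, context))
# -> {'https://www.w3.org/2018/credentials#issuer': 'did:example:123'}
assert compact(expand(doc, context), context) == doc  # round-trips
```

Because anyone can publish such a mapping at a URL, no central registry has to approve new terms -- which is the scaling argument being made above.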

So, yes, let's keep simple things simple, but not simpler. That includes making sure we understand that we have a common data model and a decentralized registry mechanism in order to meet requirements that these use cases have. For those use cases that don't have any of these requirements, you don't need to use VCs. Use something like a JWT -- that's a technology that's been around for over a decade. If your use case can be solved with it -- do it. If not, and you've been waiting for over a decade for a standard that provides the additional features you need to solve your use case, VCs may be for you. But it is not simpler to try to change VCs to look like another standard that already exists but is too simple to meet the requirements. That only creates two standards that address the same use cases -- and that both fail to specify what people need to achieve interoperability on other ones.

nadalin commented 2 years ago

I have answered your question, so please don't be hostile.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Manu Sporny @.> Sent: Wednesday, October 5, 2022 7:32:24 AM To: w3c/vc-data-model @.> Cc: Anthony Nadalin @.>; Mention @.> Subject: Re: [w3c/vc-data-model] Tracking Issue from TPAC: What does JSON-LD compatible JSON mean? (Issue #929)

@David-Chadwickhttps://github.com/David-Chadwick wrote:

This sounds like a broken proposal to me.

Yes, agreed. Removing a feature that provides global interoperability and then replacing it with nothing will lead to non-interoperability; thus, it seems a non-solution is being proposed.

"let JSON be JSON" and "keep things simple" are sound bites. Being generous, they could be construed as design guidelines; it is not a workable technical architecture.

Please answer the question being asked instead of stating that we do not need a feature that has achieved consensus (multiple times) over many years. To re-state the question:

Could either of you please explain what the extensibility model is for your proposal? How are global semantics (and thus, interop) achieved?


TallTed commented 2 years ago

@David-Chadwick, @nadalin — Please revisit your comments, above, and

  1. remove all unnecessarily quoted content, which makes it harder to digest your comments
  2. codefence all remaining occurrences of @context and any other @ entities, as it's well beyond unfriendly to constantly ping GitHub users who are not otherwise participating in our conversations!
selfissued commented 2 years ago

@msporny wrote:

Could either of you please explain what the extensibility model is for your proposal? How are global semantics achieved?

The extensibility model is the normal JSON one: Add fields as you need them. If you want them to be globally interoperable use collision-resistant names or register the names in the appropriate claims registry. This model is described at https://datatracker.ietf.org/doc/html/rfc7519#section-4 and implemented in the registry https://www.iana.org/assignments/jwt/jwt.xhtml#claims. We can and likely will have a similar registry for interoperable VC claims. This is all normal JSON interop stuff.
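A sketch of that model (the registered names are a subset of RFC 7519's claim registry; the extension claim URI and the crude collision-resistance test are illustrative only): short names come from the registry, while extension claims use collision-resistant names such as URIs or reverse-DNS strings:

```python
# RFC 7519-style extensibility, sketched. REGISTERED is a subset of the
# IANA JWT claims registry; the heuristic below is illustrative, not a
# normative rule from any spec.
REGISTERED = {"iss", "sub", "aud", "exp", "nbf", "iat", "jti"}

def classify_claim(name: str) -> str:
    if name in REGISTERED:
        return "registered"
    if "://" in name or "." in name:  # crude collision-resistance check
        return "collision-resistant"
    return "private (collision-prone)"

claims = {
    "iss": "https://issuer.example",
    "https://example.com/claims/degree": "BSc",  # hypothetical claim URI
    "degree": "BSc",                             # bare private name
}
```

Interop then depends on every party consulting the same registry (or recognizing the same collision-resistant names), which is the crux of the disagreement in this thread.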

OR13 commented 2 years ago

See also:

[Screenshot: Screen Shot 2022-10-05 at 10 33 52 AM]

I'm firmly against any changes that cause interoperability issues for regulators verifying credentials in VC-JWT or Data Integrity form... some of the proposals in this thread are heading in that direction, and I don't believe the W3C is the right place for that work.

At IETF, we have the ability to sign arbitrary data as JSON or CBOR... We don't need another way to do this at W3C.

iherman commented 2 years ago

The issue was discussed in a meeting on 2022-10-05

View the transcript

### 3. Concrete Proposals for Core Data Model.

_See github issue [vc-data-model#929](https://github.com/w3c/vc-data-model/issues/929)._

**Manu Sporny:** attempt at a number of proposals - some discussion over those proposals with a couple of +1's, some back-and-forth with Orie; wondering if the best approach is to put proposals forward to see if there's agreement.

> *Manu Sporny:* See [proposal](https://github.com/w3c/vc-data-model/issues/929#issuecomment-1267030853).
> *Michael Jones:* +1 to make VCs easier to use and develop.

**Manu Sporny:** there are a number of things which make JSON-LD processing mandatory; one idea here would be to not allow JSON-LD expanded form - the only valid JSON-LD is one in compacted form.

> *Michael Prorock:* +1 to easier to use and develop.
> *Ivan Herman:* See [JSON-LD compact form definition](https://www.w3.org/TR/json-ld11/#compacted-document-form).

**Manu Sporny:** another: have `@context` use URLs and recommend against inline contexts.

> *Michael Prorock:* -1 to guidance against inline context - see comments in issue re twitter and other discussions on schema.org.

**Manu Sporny:** finally, a discussion about `@vocab` to make it easier for developers to pick up and use without the first thing to do being to define a JSON-LD document. … a suggestion by Orie that it's RDF; my proposal is it is JSON-LD, which is a superset of RDF; others that it is JSON-only.

> *Michael Jones:* My proposal titled "It's time to let JSON be JSON" is at [https://github.com/w3c/vc-data-model/issues/929#issuecomment-1267697526](https://github.com/w3c/vc-data-model/issues/929#issuecomment-1267697526).

**Michael Jones:** after discussions with a lot of people including at TPAC, there is ample evidence that there are developers who get it wrong if they use `@context`, and who would be happy with a more typical JSON model. The current text is halfway between, requiring an `@context` without requiring JSON-LD. The simplest way to resolve this is to define two kinds of credentials - ones which include `@context` and are JSON-LD, and ones which don't and are JSON.

> *Gabe Cohen:* +1 Mike.

**Michael Jones:** it is a little unfortunate if we have two representations, but that is what we are seeing. We should restrict the usage of `@context` if the data is not JSON-LD, and require it if it is.

> *Jeremie Miller:* +1 to two clearly different kinds, w/ `@context` and without.

**Orie Steele:** appreciate the comment about developers; various skillsets mean that some struggle with certain technologies while others find it easier. We should strive to make it easier to implement for unskilled developers.

> *Michael Prorock:* +1 orie - ietf vs w3c and a place for all things.
> *Joe Andrieu:* +1 to point out that restricting context would be a violation of JWT's extensibility framework.
> *Shawn Butterfield:* +100 Orie.
> *Manu Sporny:* -1 (splitting into two different formats); that will guarantee a non-interoperable VC ecosystem.
> *Dave Longley:* +1 to Orie.
> *Michael Prorock:* -1 to splitting into two different formats.
> *Dave Longley:* -1 to splitting into two different formats; if you don't want data model constraints and open world decentralized semantics, use a JWT -- that already exists.
> *Michael Prorock:* +1 semantics are important to this work.
> *Manu Sporny:* +1 on semantics being important; they are a key differentiator here.

**Orie Steele:** COSE/JOSE work at IETF; they have their place in signing unstructured data. To the original point on implementation complexity: it should be trivial to implement, but there should be value in implementing. Combining things together means that they lose the value of their specificity. My value in Verifiable Credentials is that they provide semantic data.

> *Michael Prorock:* [https://lists.w3.org/Archives/Public/public-credentials/2022Sep/0253.html](https://lists.w3.org/Archives/Public/public-credentials/2022Sep/0253.html).

**Orie Steele:** example of a mill test report signed by a steel company in Mexico - want them to choose between using VC-JWT or Data Integrity - but regulators consuming the document should have the same semantic data at the end. … the mission we are on is to create an open world model for structured semantic data; treating this work as an extension of COSE/JOSE with a few new terms doesn't solve these objectives or help issuers and verifiers.

> *Manu Sporny:* +1 to what Orie is saying.
> *Dave Longley:* +1 to Orie.
> *Michael Prorock:* +1 Orie.
> *David Chadwick:* +1 to Orie (or plus infinity).

**Joe Andrieu:** comment on selfissued's comment - if the extensibility model is to just add whatever terms to the JSON to extend it, why not allow `@context`.

> *Orie Steele:* Example of awesome work at IETF, on signing arbitrary data... [https://datatracker.ietf.org/doc/html/draft-ietf-cose-countersign](https://datatracker.ietf.org/doc/html/draft-ietf-cose-countersign).
> *Michael Jones:* My comment on the JSON-only extensibility model is at [https://github.com/w3c/vc-data-model/issues/929#issuecomment-1268585146](https://github.com/w3c/vc-data-model/issues/929#issuecomment-1268585146).
> *Orie Steele:* +1 to adding `@vocab` to the core data model's v2 context.

**Michael Prorock:** applause to Orie on comments; a reasonable proposal is that `@context` is required given the semantic nature of the work - however, it is important to recognize that there is a large body of use cases existing in the wild that also utilize some of the properties of JSON-LD like `@vocab`.

> *Manu Sporny:* -1 to adding `@vocab` in the core context, but +1 to adding it in a "poc/developer" context.
> *Dave Longley:* +1 to adding `@vocab` in some way that makes it easy for less skilled developers to use, not necessarily in the core context, but perhaps in another context that can be used and will signal its usage to simple processors (that just read the `@context` strings).
> *Michael Prorock:* I should also note that `@vocab` prevents developer errors in terms of what is getting signed or not.
> *Manu Sporny:* It creates errors as well -- :).
> *Manu Sporny:* However, there is a way to address this concern and we shouldn't conflate that discussion w/ the core data model discussion.

**Michael Jones:** not trying to change the JSON extensibility model; `@context` is a claim with specific meaning that could conflict. We should register it as a claim in the IANA JWT claim registry. You need to use claims in the way they are registered. If you use `@context`, use it as it is defined.

> *Orie Steele:* +1 to registering JWT claims, -1 to thinking that IANA registries are the only way to understand claims... we are literally here to break that cycle.
> *Michael Prorock:* +1 orie.
> *Manu Sporny:* +1 orie.
> *Joe Andrieu:* +1 to letting `@context` be used as intended, and allowed anywhere in the JSON serialization.
> *Dave Longley:* +1 to Orie.
> *Orie Steele:* luckily we don't need a new standard to "just sign JSON or CBOR" :).
> *Manu Sporny:* nor do we need a new JWT spec; it's there, if people don't want semantics -- use that. :)
> *Orie Steele:* don't forget about signing with "sd-jwt" :).
> *Manu Sporny:* or jwp! :).
> *Orie Steele:* or acdcs.
> *Manu Sporny:* or AnonCreds.

**Michael Jones:** Orie made a point that signing things should be a distinct activity from the type of data which is signed - we are defining what is signed, whether with JOSE, COSE, or Data Integrity. The value we are adding is in defining the additional claims which are in a typical VC, and what they mean.

> *Dave Longley:* my comments here: [https://github.com/w3c/vc-data-model/issues/929#issuecomment-1268527033](https://github.com/w3c/vc-data-model/issues/929#issuecomment-1268527033) <-- use the right tool for your use case ... if that's a JWT, use one; if it's a VC, use a VC ... but these aren't the same things and shouldn't be made to be the same.
> *Orie Steele:* this is why we shouldn't let the core data model be "wagged" by a security format.
> *Kristina Yasuda:* The JWT spec cannot be used as-is to do a JSON-encoded VC.
> *Orie Steele:* ? we use it in 1.1... not sure what you mean kristina.
> *Michael Prorock:* there is a 3rd proposal on the table: `@context` + `@vocab` in the core data model.
> *Manu Sporny:* [https://w3c-ccg.github.io/traceability-vocab/#credentials](https://w3c-ccg.github.io/traceability-vocab/#credentials).

**Manu Sporny:** the problem here is that we are discussing splitting the ecosystem into two communities with different extensibility models. JSON-LD uses identifiers which do not require a central registry, while JSON would define claims in a centralized registry.

> *Orie Steele:* Don't look at us... look at schema.org, GS1, UN CEFACT, CHEBI, QUDT, FIBO, etc....

**Manu Sporny:** if you just look at the traceability work, the number of claims necessary would be massive. The argument is to go register claims in a centralized registry at IANA, use reverse domain names, etc.

> *Joe Andrieu:* +1 to domain-specific terms, managed by each domain, as they wish. No need to centralize everything into a single registry. That's an anti-pattern we're trying to fix here.

**Manu Sporny:** that approach has been discussed time and time again and it just does not scale.

> *Orie Steele:* Look at how people are already using the open world capabilities of JSON-LD in industry today... look at knowledge graphs... look at OntoText and Neo4j.

**Manu Sporny:** the ramifications of splitting the data model into two things with different extensibility will split the ecosystem, and that is one of the greatest things we could do to damage the ecosystem today. Today, some people are doing it wrong, but things like `@vocab` could be used to help.

> *Kristina Yasuda:* @orie: the JWT spec defines the claims, but there is a need for a profile like vc-data-model or an ID Token section in oidc to make those claims meaningful - iss/iat/etc are all optional in the JWT spec itself.
> *Dave Longley:* i don't understand how a "vanilla JSON 'VC' that doesn't have data model constraints and uses a centralized claims registry" would be different from a JWT -- what would we be doing here?
> *Shawn Butterfield:* If I am forced to include `@context`, but I do nothing to actually use it and none of the relying parties for my use-case rely upon it, then what purpose does it serve? *Requiring* it isn't something I can fully support, but I can absolutely see the value in it for some use-cases, so I'm more than happy to optionally use it.
> *Orie Steele:* kristina: ahh yes, we have the "securing specs" to handle those profiles / mappings.
> *Kristina Yasuda:* +1 shawnb...
> *Manu Sporny:* shawnb if you don't use `@context`, what's your extensibility story?
> *Dave Longley:* shawnb, when you read a spec that says what the context is (what the mappings are) and you hard-code your software to look for its URL identifier and its mappings, you don't have to programmatically process it.
> *Orie Steele:* Guys... you can sign JSON today... with JOSE... why are you here if you just want to process JSON and JWTs / JWS?
> *Kristina Yasuda:* Orie, umm securing is how to secure/sign; the JWT body of what is signed is separate - why JWT and JWS are separate...
> *Manu Sporny:* +1 to Orie.
> *Dave Longley:* +1 to Orie.
> *Joe Andrieu:* +1 to Orie.
> *Jeremie Miller:* +1 shawnb.

**Antony Nadalin:** not proposing to get rid of `@context`; it should be optional whether you use it or not. You have troubles today because people find they are not needing it - but you are forcing the parser and logic to understand it. Mandating `@context` has made the world more complex - you don't need it while processing just JSON and JWTs. As far as interoperability is concerned - you hurt interoperability by forcing people to go down this route.

> *Kristina Yasuda:* Orie, it's not how to sign, but the body of what's being signed...
> *Orie Steele:* -1 to "hurting interop"... it's like saying OIDC hurts interop... profiling does not hurt interop, it enables it.
> *Michael Prorock:* +1 orie (to his -1).
> *Dave Longley:* +1 to Orie.

**Kevin Dean:** if we have an envelope model, where we use `@context` as a wrapper for the verifiable credential model, then inside the envelope the issuer can do what they please.

> *Shawn Butterfield:* @manu I don't need semantic meaning to have extensibility in the data model.
> *Shawn Butterfield:* dlongley - if I do that then what purpose does `@context` serve for my software in processing the payload?
> *Orie Steele:* shawnb, not sure what your use case is, but maybe JOSE / COSE is a better fit for it?

**David Waite:** One of the issues I have with `@context` in environments where people are not ready to handle it: `@vocab` is not ignorable, especially with Data Integrity and canonicalization of RDF. With and without it, you wind up having two different data models for the same piece of data, and that matters in a security context.

> *Michael Prorock:* semantic meaning on what a VC itself means is important.
> *Shawn Butterfield:* Orie - yes, generally speaking it is.
> *Dave Longley:* shawnb: it's like adding a type definitions file to make JS into TypeScript.
> *Orie Steele:* You should use JOSE / COSE... if they are a better fit for your use case... You should not try and make everyone use them, if you don't understand their use cases.
> *Dave Longley:* shawnb: the `@context` URL says "these are the types used in here" -- and if your software knows that context, it doesn't have to do any transforms; it only accepts JSON marked with that `@context` value.

**David Waite:** If you have multiple ways of expressing things and people understand them in different ways, someone might think an object property means something specific vs. someone processing `@context` thinks there's an extra value there; downloading things dynamically changes semantics as they are processing it, and that's a serious security issue where you can craft messages that are meant to be secure but can be interpreted in different ways by different people.

> *Manu Sporny:* where we're not encouraging processing as data, you haven't committed to a valid semantic model for extensions where JSON developers are using static things - explosion of complexity. We are not giving people the flexibility to use both sets of tools; we are requiring them to understand the security ramifications of looking at data in different ways.
> *Shawn Butterfield:* Orie: Agree, I'm not trying to make everyone use them.
> *Orie Steele:* Imagine telling everyone that category theory and type safe languages are bad, because you can use python and javascript.

**Michael Jones:** We already have a split ecosystem; there are two camps, and we should support both well rather than be halfway in between, which serves no one. … responding to manu's comment - the community is already divided. We have those who speak JSON-LD correctly and those who do not. We are better off recognizing that vs leaving things halfway between.

> *Orie Steele:* -1 to "there are 2 camps"... there are people who use JOSE / COSE and there are people who use them and the VC Data Model.

---
dwaite commented 2 years ago

That is, in-line JSON-LD Contexts are strongly discouraged (we might even want to go as far as forbidding them).

-1 to this guidance... it also contradicts conversations with schema.org / google regarding usage of JSON-LD on web pages.

Suggest the working group NOT provide guidance of this form in a W3C TR.

My problem with including inline contexts is that from a predictability perspective, JSON tools would need to evaluate @context to make sure that it is as expected for a particular type of credential - else the same JSON properties could have been redefined to have different semantic meaning and structure for JSON consumers and RDF consumers.

We can define things to be stricter, in that an implementation could compare a list of URI strings for an exact ordered match. To compare against an effective JSON-LD context is not something which I know of a current algorithm for - and for which I doubt there is a simple algorithm to accomplish.

This is also why some (including me) have advocated against such data isomorphism at the proof layer - once I know that I'm evaluating unmodified and integrity-protected data from the issuer it becomes a lot easier for optional consumption of JSON-LD data as RDF or JSON - because you know that the context and data haven't been modified by another party. If there is any manipulation going on for how different verifiers would interpret a credential, you as a verifier know (and can show via non-repudiation) that manipulation was by the issuer themselves.

dlongley commented 2 years ago

@dwaite,

We can define things to be stricter, in that an implementation could compare a list of URI strings for an exact ordered match.

Note that this is all that is needed if the approved contexts use `@protected` definitions, in order to ensure that both JSON and RDF consumers use the same semantic meaning for known (to the application), protected terms.
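A minimal sketch of the "exact ordered match" check being described (the approved context list is illustrative; real software would key this off the credential type): accept a document only if its `@context` is exactly a known list of URL strings, with no inline context objects:

```python
# Exact-ordered-match context check, sketched. APPROVED_CONTEXTS is an
# illustrative allow-list; a verifier would maintain one per credential
# type it supports.
APPROVED_CONTEXTS = [
    "https://www.w3.org/2018/credentials/v1",
    "https://w3id.org/traceability/v1",
]

def context_is_approved(doc: dict) -> bool:
    ctx = doc.get("@context")
    return (
        isinstance(ctx, list)
        and all(isinstance(c, str) for c in ctx)  # reject inline objects
        and ctx == APPROVED_CONTEXTS              # exact ordered match
    )
```

This requires no JSON-LD processing at all; it is a plain string comparison a JSON-only consumer can perform.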

David-Chadwick commented 2 years ago

@dwaite "once I know that I'm evaluating unmodified and integrity-protected data from the issuer" Doesn't the VC proof give you this (either LD or JWT proofs). Whether the data stored at the @context URL has been modified or not is not dependent on JSON or JSON-LD processing is it? This requires some integrity protection of the URL, or trust in the web service doesn't it?

TallTed commented 2 years ago

@David-Chadwick — @context and other @ entities need to be wrapped in code fences, i.e., single backticks (`), as in `@context`. Single-quotation-marks (') are not code fences. Please edit your latest https://github.com/w3c/vc-data-model/issues/929#issuecomment-1268841485 appropriately.

dwaite commented 2 years ago

Doesn't the VC proof give you this (either LD or JWT proofs).

JWT, yes, because the model for signature is over binary data.

For something like URDNA2015, the signature is over a set of canonicalized quads, which means it will not capture manipulation of the document that produces equivalently canonicalized RDF.

Simple example: canonicalization would hide an intermediary adding or altering the context. This could be done to add an additional mapping such that the JSON representation can contain a predicate under an unexpected name. This would effectively mean JSON tools would act as if the property was not set, while RDF tools would continue to see it.

Second example: adding a null mapping to the context exclude data in the JSON document from being integrity protected or seen in the RDF model, while JSON clients would act on this freely changeable value.
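A simplified model of that second example (this is not real JSON-LD processing; the term map and property names are illustrative): a term mapped to `null` disappears from the RDF-side view, while a plain JSON consumer still reads the freely changeable value:

```python
# Simplified model of the "null mapping" divergence. A term mapped to
# None is dropped from the RDF-side view; a JSON consumer reading raw
# object properties still sees it. Illustrative only -- not a
# conformant JSON-LD expansion.
def rdf_view(doc: dict, term_map: dict) -> dict:
    out = {}
    for k, v in doc.items():
        if k in term_map and term_map[k] is None:
            continue  # null mapping: term vanishes from the RDF model
        out[term_map.get(k) or k] = v
    return out

doc = {"location": "Santa's Workshop", "name": "Ratke - Bergstrom"}
term_map = {"location": None, "name": "https://schema.org/name"}

json_consumer_sees = doc["location"]         # still "Santa's Workshop"
rdf_consumer_sees = rdf_view(doc, term_map)  # has no "location" at all
```

Because the canonicalized quads never contained `location`, a signature over them cannot detect a change to that value.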

decentralgabe commented 2 years ago

I'd like to echo something I heard @selfissued say today, which I will roughly paraphrase as: "there are already two data ~models~ formats: JSON and JSON-LD. JSON implementations have to add an @context property, which is confusing, because it is not an LD document."

I agree with this logic. I myself have implemented, and encountered multiple implementations that include @context properties out of sheer necessity. It is much more confusing to debug an LD parser error as opposed to knowing a document is using normatively defined fields in the VCDM using a pure JSON representation.

Separating the two formats has already happened, we need to make this abundantly clear to implementers.

OR13 commented 2 years ago

My problem with including inline contexts is that from a predictability perspective, JSON tools would need to evaluate @context to make sure that it is as expected for a particular type of credential - else the same JSON properties could have been redefined to have different semantic meaning and structure for JSON consumers and RDF consumers.

That's not how JSON-LD processing works... inline contexts just save you a cache hit... or a cache miss, if you don't know how to cache properly... also, this same concern applies to included URIs instead of inline objects... so it's also wrong when you consider this:

We can define things to be stricter, in that an implementation could compare a list of URI strings for an exact ordered match. To compare against an effective JSON-LD context is not something which I know of a current algorithm for - and for which I doubt there is a simple algorithm to accomplish.

You don't need to do any JSON-LD processing... in either inline or remote case, to verify a credential.

This is also why some (including me) have advocated against such data isomorphism at the proof layer - once I know that I'm evaluating unmodified and integrity-protected data from the issuer it becomes a lot easier for optional consumption of JSON-LD data as RDF or JSON - because you know that the context and data haven't been modified by another party.

^ this is an argument against data integrity proofs, not against JSON-LD as a data model.

I can see some of the merits of the argument; it's a major reason why I like VC-JWT... when applied to well-formed JSON-LD.

I suggest we not mix "VC-JWT" and "Data Integrity Proof" objections into an already overloaded issue.

This thread is about the core data model, which is defined in JSON today, and requires @context.

I am in favor of keeping that requirement, but I am open to eliminating the requirement on the shape of @context to allow full inlining, per the thread and advice from Dan / schema.org.

Without the ability to inline, in page / browser use of verifiable credentials will require context processing (via URIs) to get to RDF... which in my opinion SHOULD NOT be required... You can obtain the exact same n-quads with inline contexts and URI based contexts.
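A toy illustration of that equivalence (not a real JSON-LD processor; the context URL and the cache are hypothetical): an inline context object and a context URI that resolves to the same definitions yield the same term mappings, so the resulting n-quads are identical:

```python
# Inline vs. URI contexts, sketched. CONTEXT_CACHE stands in for a
# document loader with a local cache; the URL is hypothetical.
CONTEXT_CACHE = {
    "https://example.org/contexts/v1": {"name": "https://schema.org/name"},
}

def load_context(ctx):
    """Resolve a context reference: a URI hits the cache, an inline
    object is used as-is."""
    return CONTEXT_CACHE[ctx] if isinstance(ctx, str) else ctx

inline = {"@context": {"name": "https://schema.org/name"}, "name": "Pat"}
remote = {"@context": "https://example.org/contexts/v1", "name": "Pat"}

# Both documents resolve to the same term definitions, so downstream
# expansion / canonicalization produces the same quads.
same = load_context(inline["@context"]) == load_context(remote["@context"])
```

The only operational difference is whether a network fetch (or cache lookup) is needed to obtain the definitions.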

Imagine being able to verify a credential (any proof format) and do graph joins on its terms, without needing to process context URIs?

[Screenshot: Screen Shot 2022-10-05 at 5 01 28 PM]

etc... All without processing a context IRI... all without doing ANY JSON-LD Processing... Already in a format understood by search engines...

The argument against it is: "sometimes developers make mistakes writing valid HTML / JSON / JSON-LD"... I don't think it's a good argument.

David-Chadwick commented 2 years ago

@dwaite "the signature is over a set of canonicalized quads, which means that it will not capture manipulation of the document " which is one of the reasons why I am against canonicalisation when producing dig sigs, and why like @OR13 I prefer JWS/JWT. Ten years after inventing distinguished encoding rules for X.509 signatures we realised that we never actually needed them.

msporny commented 2 years ago

@decentralgabe wrote:

I'd like to echo something I heard @selfissued say today, which I will roughly paraphrase as: "there are already two data models: JSON and JSON-LD.

Per the JSON spec, this is objectively wrong :) -- JSON is a data format (a serialization), it is not a data model (as evidenced multiple times in RFC 8259). Infra defines a data model, RDF defines a data model, and JSON-LD defines a data model... JSON does not. While the point might come off as pedantic, it matters because people keep conflating "data format" with "data model". It especially matters here because we're defining global standards and (ideally) the work we're doing here needs to be precise. So, please, be precise (otherwise we're going to talk in circles).

@decentralgabe wrote:

I myself have implemented, and encountered multiple implementations that include @context properties out of sheer necessity. It is much more confusing to debug an LD parser error as opposed to knowing a document is using normatively defined fields in the VCDM using a pure JSON representation.

I've seen people misuse credentialSubject and issuer, per the logic above, we should remove those features as well, right?

As @OR13 said above, what's really going on is that some developers are making mistakes with @context and some are suggesting that we remove the property as a result and migrate to a centralized registry to support extensibility and interoperability. That proposal is deeply flawed because it is not scalable, favors centralization, and harms interoperability. There is a more reasonable path forward, which is to make it easier for developers to not make those mistakes with @context by at least 1) providing better tooling and test suites, 2) providing mechanisms that provide a smoother onramp (like appropriate use of @vocab), and 3) enforcing verification failures in production when contexts are malformed.

msporny commented 2 years ago

@dwaite wrote:

Simple example: canonicalization would hide an intermediary adding or altering the context. This could be done to add an additional mapping such that the JSON representation can contain a predicate under an unexpected name. This would effectively mean JSON tools would act as if the property was not set, while RDF tools would continue to see it.

Your attack is not precise enough to evaluate, please add more detail.

Second example: adding a null mapping to the context exclude data in the JSON document from being integrity protected or seen in the RDF model, while JSON clients would act on this freely changeable value.

Same as the above... your attack is not precise enough to evaluate, please add more detail.

decentralgabe commented 2 years ago

@msporny my mistake. I have been referring to JSON and JSON-LD as data formats (models was a slip) - since as you noted JSON is a data format, which is subsetted by JSON-LD.

I've seen people misuse credentialSubject and issuer, per the logic above, we should remove those features as well, right?

I know you're trying to make a point here, but this is not an apt comparison. The VCDM normatively defines the terms credentialSubject and issuer—and yes, @context too, but there's a lot more that comes with it! Equating the weight of LD processing to using an issuer property is misleading at best.

As @OR13 said above, what's really going on is that some developers are making mistakes with @context and some are suggesting that we remove the property as a result and migrate to a centralized registry to support extensibility and interoperability. That proposal is deeply flawed because it is not scalable, favors centralization, and harms interoperability.

Emphasis mine. I am not certain about this assumption. I can see solutions that result in lack of scale, centralization, and harmed interoperability... but I can also see the same for LD. Regardless, your point is a good one: if we are to go this route we need to come up with reasonably scalable, decentralized, and interoperable methods to support JSON as a standalone format. I don't think that is impossible.

msporny commented 2 years ago

@msporny my mistake. I have been referring to JSON and JSON-LD as data formats (models was a slip) - since as you noted JSON is a data format, which is subsetted by JSON-LD.

To be clear, JSON-LD subsets JSON and adds a data model to it (based on patterns we saw people using with JSON to add hyperlinked information). That is, JSON-LD defines both a data model AND a data format. JSON only defines a data format. So, if the JSON-only folks don't re-use the JSON-LD data model... how do you link data? how do you merge data? What is the extensibility story? What is the internationalization story? ... and so on. You lose all of those things, so you have to replace them with something... what is that something?

The VCDM normatively defines the terms credentialSubject and issuer—and yes, @context too, but there's a lot more that comes with it! Equating the weight of LD processing to using an issuer property is misleading at best.

My point is that those two properties provide foundational behavior for the specification. Removing them, without having a workable replacement proposal, has catastrophic effects on the ecosystem. It is being suggested that we remove @context and the replacement proposal is effectively: Just use an IANA registry, centralization is ok for VCs, and the fact that some people will use the IANA registry and others will use @context and that won't cause serious interoperability issues... well, all of that is deeply concerning.

We've discussed all of this in the VCWG before (over many years, mostly during the v1.0 work), and it seems as if a new crop of WG members are reviving old ideas that were previously identified as unworkable in or harmful to the ecosystem without deeply considering the warnings that many in the group are providing.

Regardless, your point is a good one: if we are to go this route we need to come up with reasonably scalable, decentralized, and interoperable methods to support JSON as a standalone format. I don't think that is impossible.

Yes, and thank you -- the sooner we can get to concrete WORKABLE proposals on how to support JSON as a standalone format, the better. The last time we tackled this problem around the 2008-2012 timeframe, we ended up inventing JSON-LD. :)

dwaite commented 2 years ago

Your attack is not precise enough to evaluate, please add more detail.

Same as the above... your attack is not precise enough to evaluate, please add more detail.

Sure, here's an example of a substitution where JSON and RDF views of the data deviate - I'd love to hear feedback, as this has been a long-standing concern about canonicalization.

I grabbed this example from traceability-vocab, and cobbled together the additional context to the best of my ability using the JSON-LD playground. The n-quad view of the (corrected) example text and this document appeared to be the same, while a JSON API working on object properties would think the issuer is asserting the organization's address and position as Santa's Workshop.

{
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://w3id.org/traceability/v1",
        {
            "Organization": {
                "@id": "https://schema.org/Organization",
                "@context": {
                    "location": null,
                    "rdfLocation": {
                        "@id": "https://schema.org/location"
                    }
                }
            }
        }
    ],
    "id": "urn:uuid:3978344f-8596-4c3a-a978-8fcaba3903c5",
    "type": [
        "VerifiablePresentation",
        "TraceablePresentation"
    ],
    "workflow": {
        "definition": [
            "urn:uuid:n1552885-cc91-4bb3-91f1-5466a0be084e"
        ],
        "instance": [
            "urn:uuid:f5fb6ce4-b0b1-41b8-89b0-331ni58b7ee0"
        ]
    },
    "holder": {
        "id": "did:web:sender.example",
        "type": "Organization",
        "rdfLocation": {
            "type": "Place",
            "geo": {
                "type": "GeoCoordinates",
                "latitude": "68.7083",
                "longitude": "4.6377"
            },
            "address": {
                "type": "PostalAddress",
                "organizationName": "Ratke - Bergstrom",
                "streetAddress": "21851 Ima Heights",
                "addressLocality": "O'Connellborough",
                "addressRegion": "Missouri",
                "postalCode": "65587",
                "addressCountry": "Cyprus"
            }
        },
        "location": {
            "geo": {
                "type": "GeoCoordinates",
                "latitude": "90",
                "longitude": "135"
            },
            "address": {
                "type": "PostalAddress",
                "organizationName": "Santa's Workshop",
                "streetAddress": "123 Elf Road ",
                "addressRegion": "North Pole",
                "postalCode": "88888"
            }
        }
    }
}
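
For readers following along in plain JSON, the divergence above can be reproduced with nothing but the standard library: a JSON consumer naturally reads the `location` key, while the scoped context maps `location` to `null` (dropping it from the RDF) and routes the signed data through `rdfLocation` instead. A minimal sketch, with the presentation abbreviated to the relevant fields:

```python
import json

# Abbreviated form of the presentation above: the scoped context nulls out
# "location" and maps "rdfLocation" to https://schema.org/location.
doc = json.loads("""
{
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        {"Organization": {
            "@id": "https://schema.org/Organization",
            "@context": {
                "location": null,
                "rdfLocation": {"@id": "https://schema.org/location"}
            }
        }}
    ],
    "holder": {
        "type": "Organization",
        "rdfLocation": {"address": {"organizationName": "Ratke - Bergstrom"}},
        "location": {"address": {"organizationName": "Santa's Workshop"}}
    }
}
""")

# A plain-JSON consumer naturally reads the "location" key ...
json_view = doc["holder"]["location"]["address"]["organizationName"]

# ... but after context application, only "rdfLocation" reaches the RDF
# graph; the "location" subtree is dropped entirely by the null mapping.
rdf_view = doc["holder"]["rdfLocation"]["address"]["organizationName"]

print(json_view)  # Santa's Workshop  (unsigned, intermediary-supplied)
print(rdf_view)   # Ratke - Bergstrom (what the issuer actually signed)
```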
iherman commented 2 years ago

I grabbed this example from traceability-vocab, and cobbled together the additional context to the best of my ability using the JSON-LD playground. The n-quad view of the (corrected) example text and this document appeared to be the same, while a JSON API working on object properties would think the issuer is asserting the organization's address and position as Santa's Workshop.

@dlehn, @gkellogg, or @pchampin are the real experts here, but, in my opinion, the value of the location property must be set to "Ratke - Bergstrom"'s address, and Santa's Workshop is effectively ignored in the output. Indeed, the effect of the null value for the "location" term within the scoped context is that the term, and its value, are omitted from the RDF output (and, as you say, this is what the JSON-LD playground does).

In other words, there is a bug in the JSON API implementation you use ☹️

TallTed commented 2 years ago

@msporny — Another unfenced @context in the second quote block in your https://github.com/w3c/vc-data-model/issues/929#issuecomment-1269080803 ...

OR13 commented 2 years ago
  "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://w3id.org/traceability/v1",
        {
            "Organization": {
                "@id": "https://schema.org/Organization",
                "@context": {
                    "location": null,
                    "rdfLocation": {
                        "@id": "https://schema.org/location"
                    }
                }
            }
        }
    ],

Imagine trusting an issuer who was actively attempting to serve you maliciously crafted JSON... and still blaming JSON instead of the issuer...

See also prototype pollution... if the issuer wants to be nasty... you are wrecked... that's why you need to verify content before you parse it.

dlongley commented 2 years ago

@dwaite,

As I mentioned above, just add "@protected": true to your contexts and you don't have to worry about this anymore. If the traceability context has this added to its term definitions then the redefinition is not permitted, preventing the problem -- and allowing anyone to just look for the traceability URL in the @context array without having to worry about what comes after. This is what I was describing above as enabling a simple way for applications to "process" well-known @context values without needing a JSON-LD processor.

An approximation of this on the playground can be found here, where I redefined the Organization property inline with the @protected field, prior to where you add the bad definition from your example, for the purpose of this discussion: https://tinyurl.com/492h7umf

You'll note that anyone running a JSON-LD processor will now see a redefinition error as desired.
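
For JSON-only consumers, the "process without a JSON-LD processor" approach reduces to a membership check on the `@context` array. A hypothetical helper (not library code), assuming every term in the listed contexts is `@protected` so that whatever follows them cannot redefine terms:

```python
# Hypothetical membership check (the context URLs come from the example
# under discussion): if the well-known contexts mark their terms @protected,
# a JSON-only consumer can simply confirm those URLs are present -- any
# later inline context that tried to redefine a protected term would make
# JSON-LD processing fail, so "what comes after" can be ignored.
REQUIRED_CONTEXTS = {
    "https://www.w3.org/2018/credentials/v1",
    "https://w3id.org/traceability/v1",
}

def has_required_contexts(doc: dict) -> bool:
    ctx = doc.get("@context", [])
    if isinstance(ctx, str):
        ctx = [ctx]
    present = {c for c in ctx if isinstance(c, str)}
    return REQUIRED_CONTEXTS <= present

assert has_required_contexts({
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://w3id.org/traceability/v1",
        {"Organization": {"@id": "https://schema.org/Organization"}},
    ]
})
assert not has_required_contexts({
    "@context": "https://www.w3.org/2018/credentials/v1"
})
```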

msporny commented 2 years ago

@dwaite wrote:

here's an example of a substitution where JSON and RDF views of the data deviate - I'd love to hear feedback, as this has been a long-standing concern about canonicalization.

Thank you for the more complete example, that helped immensely! :)

I agree with much of what @iherman, @OR13, and @dlongley said above. It might also be worth mentioning the following things:

  1. It is a best practice to not load JSON-LD Contexts used in VCs from the Web (for production systems). We really should say this in a much stronger way in the VC spec, and tried to in previous versions, but some in the WG pushed back on the grounds that they wanted to see more implementation experience before providing that sort of guidance.
  2. Some VC/JSON-LD processors can be put into a safe mode where they will refuse to digitally sign (or verify) anything that drops a term, and will detect MiTM attacks where the issuer and verifier infrastructure is compromised to load different JSON-LD Context files. This protects against issuers unintentionally signing only a subset of the information they thought they were signing.
  3. The use of @protected prevents the sort of "bad issuer"/"MiTM" redefinition attack you allude to above.
  4. It has been suggested that we use something like hashlinks for JSON-LD Contexts so that you can load them from the web, but know for sure that you're dealing w/ the same context that was used to canonicalize. That would require broad deployment of hashlinks, which isn't expected to happen in the next year or two (at least).
  5. If you don't use @context (and instead use JSON-only with an IANA registry) you will have zero protections in VC-JWT against the sorts of attacks you're outlining. The IANA registry can be updated at any point to add/modify/remove terms and/or their status, and there will be ill-defined semantics for issuer-provided terms that do not exist in the IANA registry (that might be processed by the verifier, because they think the term means something else).

Fundamentally, the "easy" protections against the sort of attacks you're highlighting are 1) use @protected (this lets you just check the JSON-LD Context URLs when processing as JSON), and 2) never load remote JSON-LD Contexts in production (to prevent DoS/MitM-style attacks). We should certainly do a better job in the VC spec (or VC Implementers Guide) to highlight these potential attacks (and their mitigations).

Did we miss something in our analysis, @dwaite? Some nuance that still allows the attack you describe?
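
Mitigation 2 above is commonly implemented as a "static document loader" that resolves `@context` URLs only from locally bundled copies. A stdlib-only sketch of the pattern; the class name and bundled context body are illustrative stand-ins, not from any particular library:

```python
# Static document loader pattern: @context URLs resolve only from a local,
# vetted bundle; anything else is an error rather than a web fetch. The
# bundled body here is a stand-in -- real deployments ship the full
# published context files with the application.
BUNDLED_CONTEXTS = {
    "https://www.w3.org/2018/credentials/v1": {
        "@context": {"note": "stand-in for the bundled context body"}
    },
}

class UnknownContextError(Exception):
    """Raised instead of dereferencing an unvetted context URL."""

def static_loader(url: str) -> dict:
    try:
        return BUNDLED_CONTEXTS[url]
    except KeyError:
        # Refusing unknown URLs closes off the DoS/MitM vectors that come
        # with fetching attacker-chosen contexts at verification time.
        raise UnknownContextError(f"context not in local bundle: {url}")
```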

pchampin commented 2 years ago

In addition to @msporny's recommendation above, maybe in the context of VCs (or other security-critical applications), contexts that are not @protected should be refused (or should at least trigger a warning)?

dwaite commented 2 years ago

Imagine trusting an issuer who was actively attempting to serve you maliciously crafted JSON... and still blaming JSON instead of the issuer...

For RDF based canonicalization schemes, the modifications made would not need to be made by the issuer, but could be made by any holder as part of creating a credential.

jandrieu commented 2 years ago

@dwaite I think you're misunderstanding what the roles are in these interactions. Holders cannot amend a credential, the proofs ensure that any such modifications would be evident. Issuers create credentials, not holders.

While it is possible for someone who is a holder to issue a VC, that is a case of a particular party acting in different roles with regard to different VCs.

dwaite commented 2 years ago
  1. It is a best practice to not load JSON-LD Contexts used in VCs from the Web (for production systems). We really should say this in a much stronger way in the VC spec, and tried to in previous versions, but some in the WG pushed back on the grounds that they wanted to see more implementation experience before providing that sort of guidance.

I can understand that push-back, as such a fixed context limits the extensibility offered to issuers. If credentials of a given type have a fixed set of acceptable context locations, those locations are effectively the same registries being argued against for JSON-style extensibility.

However, changing context values could cause future signatures to break due to compatibility issues (say, changing a context in places from using @vocab to using @json). One could also use such changes to break the ability to verify signatures on purpose. That's ignoring the denial-of-service and other interesting attacks possible given that @context values need to be resolved in an unauthenticated context (before the messages are proven to have been given unmodified by the issuer/holder).

(removing some good suggestions)

  1. If you don't use @context (and instead use JSON-only with an IANA registry) you will have zero protections in VC-JWT against the sorts of attacks you're outlining. The IANA registry can be updated at any point to add/modify/remove terms and/or their status, and there will be ill-defined semantics for issuer-provided terms that do not exist in the IANA registry (that might be processed by the verifier, because they think the term means something else).

Such a registry is used for preventing conflict and providing documentation of purpose. It is not used by tooling to apply semantic transforms.

A business could change the definition to say a claim didn't mean what you understood it to mean. But they can do that with linked data formats as well. The interpretation of the data is not intrinsic in the graph but in the definitions of what the predicates mean.

For JOSE technologies and JWTs, there is a concept of using a non-colliding name (such as a URI) for a JSON property if you do not want to go through registration. JWT claims can also define their own formats, including their own scheme for managing such conflicts on the properties of their own data. Most notably, the "vc" and "vp" claims today delegate to this group.
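
To make the collision-resistant naming concrete, a sketch of a JWT claims set using a URI as an unregistered claim name; the vocab.example URL and workflow value are made up for illustration:

```python
import json

# RFC 7519-style collision-resistant claim naming: a claim outside the IANA
# registry uses a URI under the definer's control as its member name. The
# vocab.example URL below is hypothetical.
claims = {
    "iss": "https://issuer.example",
    "sub": "did:web:sender.example",
    "https://vocab.example/traceability#workflow": {
        "definition": ["urn:example:workflow-definition"],
    },
}

# No @context machinery involved: the name itself carries the namespace.
print(json.dumps(claims, indent=2))
```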

Fundamentally, the "easy" protections against the sort of attacks you're highlighting are 1) use @protected (this lets you just check the JSON-LD Context URLs when processing as JSON), and 2) never load remote JSON-LD Contexts in production (to prevent DoS/MitM-style attacks). We should certainly do a better job in the VC spec (or VC Implementers Guide) to highlight these potential attacks (and their mitigations).

Did we miss something in our analysis, @dwaite? Some nuance that still allows the attack you describe?

Typically the way to avoid these sorts of attacks is to perform all future work on the output of the canonicalization step, since that is the actual integrity-protected data. In the case of something like a JWT, that is relatively simple - because the signed body is a concatenation of two base64-encoded octet streams, you can return a local manipulation of that into the protected header and payload.
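
The JWT half of this is indeed simple, since the integrity-protected bytes are exactly the two base64url segments. A stdlib-only sketch, using a toy unsigned token (alg "none") purely to show the split, not a real credential:

```python
import base64
import json

def b64url_decode(seg: str) -> bytes:
    # base64url without padding, as used in JOSE compact serialization
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

def protected_parts(jwt: str) -> tuple[dict, dict]:
    # Everything the signature covers is header + "." + payload; decode
    # those two segments and do all further processing on that output.
    header_b64, payload_b64, _signature = jwt.split(".")
    return (json.loads(b64url_decode(header_b64)),
            json.loads(b64url_decode(payload_b64)))

# Toy token (alg "none", no signature bytes) purely to exercise the split:
token = ".".join([
    base64.urlsafe_b64encode(
        json.dumps({"alg": "none"}).encode()).decode().rstrip("="),
    base64.urlsafe_b64encode(
        json.dumps({"vc": {"id": "urn:example:1"}}).encode()).decode().rstrip("="),
    "",
])
header, payload = protected_parts(token)
print(payload["vc"]["id"])  # urn:example:1
```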

For RDF canonicalization, you would either be working with the quads in a RDF model, or be translating back to JSON-LD, now via locally-supplied @context values. Note that leaving as a non-mapped RDF form means that altered or broken context files will not affect ones ability to re-validate in the future, including for non-repudiation purposes.

I'll also add that you typically want to perform any additional verification steps before you raise back up to the application layer.

This is both because the application layer may not understand the separation of responsibilities, and because the application layer has a tendency to commingle data of mixed provenance, with the idea that they are now operating in the realm of a single source of truth. Due to extensibility, some of these are more architectural and documentation concerns for implementations rather than concerns of an implementation of the data model or of data integrity.

dwaite commented 2 years ago

@dwaite I think you're misunderstanding what the roles are in these interactions. Holders cannot amend a credential, the proofs ensure that any such modifications would be evident. Issuers create credentials, not holders.

Apologies because I wasn't able to give a credential with a valid URDNA proof. This was not a malicious issuer, but a malicious holder.

The changes I made to the document by adding the additional context would not break the resulting data integrity, because it would not affect the RDF interpretation of the data.

That means that any intermediary could alter the JSON and keep the signature valid, by using contexts to map how the underlying data is represented. In my example, I moved the signed data out of the way and added a new, non-integrity protected data in its place in the JSON document.

Tools operating on the RDF would not notice this change happened. Tools operating against JSON are now operating on non-integrity-protected data supplied by the intermediary.

dlongley commented 2 years ago

@dwaite,

The changes I made to the document by adding the additional context would not break the resulting data integrity, because it would not affect the RDF interpretation of the data.

That's not true, the number of RDF quads isn't even the same.

Before your inline context is added: https://tinyurl.com/2p95nn3j After: https://tinyurl.com/3bssrz7d

Note that you'll even have to disable safe mode via the options to run the After case.

dwaite commented 2 years ago

That's not true, the number of RDF quads isn't even the same.

Before your inline context is added: https://tinyurl.com/2p95nn3j After: https://tinyurl.com/3bssrz7d

The before link looks like you are not operating on the starting sample from the traceability-vocab document. Note, I did have to remove a comment and an errant comma to get the playground to accept it as valid JSON input.

Note that you'll even have to disable safe mode via the options to run the After case.

That was not the behavior I had when I wrote the sample up. Does anyone know if the tooling changed?