w3c / vc-data-model

W3C Verifiable Credentials v2.0 Specification
https://w3c.github.io/vc-data-model/

JSON-only processors and VCDM 2.0 conformance #1290

Closed awoie closed 9 months ago

awoie commented 12 months ago

I just want to make sure we cover the following case for JSON-only processors.

According to section 4.4:

The value of the type property MUST be, or map to (through interpretation of the @context property), one or more URLs.

According to section 1.4, a conforming processor is defined as follows:

A conforming processor is any algorithm realized as software and/or hardware that generates or consumes a conforming document. Conforming processors MUST produce errors when non-conforming documents are consumed.

We should describe what a JSON-only processor has to do in that case since they cannot do JSON-LD expansion and would not be able to recognize compacted type values as URLs.
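To make the failure mode concrete, here is a minimal sketch of a naive JSON-only check on `type`. The helper names `looks_like_url` and `non_url_types`, and the rule "a URL is any string with a scheme", are assumptions made for illustration; nothing here is defined by the VCDM.

```python
from urllib.parse import urlparse

def looks_like_url(value: str) -> bool:
    """Naive rule (an assumption): a string is a URL if it has a scheme."""
    return bool(urlparse(value).scheme)

def non_url_types(credential: dict) -> list:
    """Return the type values a JSON-only processor cannot recognize as URLs."""
    types = credential.get("type", [])
    if isinstance(types, str):
        types = [types]
    return [t for t in types if not looks_like_url(t)]

credential = {
    "@context": ["https://www.w3.org/ns/credentials/v2"],
    "type": ["VerifiableCredential", "urn:uuid:456", "Foo"],
}

# Without JSON-LD expansion, the compacted values are flagged as non-URLs.
print(non_url_types(credential))
```

A processor that read the section 4.4 requirement literally and rejected every flagged value would reject the compact forms that the context is meant to map to URLs.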

That brings up two questions:

awoie commented 12 months ago

One proposal would be to drop the URL requirement entirely since for JSON-LD processors type will always expand to a (issuer-dependent) URL due to @vocab.

selfissued commented 12 months ago

@awoie raises a good question.

OR13 commented 12 months ago
{
  "@context": {
      "@vocab": "https://vendor.example#"
  },
  "@id": "urn:uuid:123",
  "@type": ["urn:uuid:456", "Foo"]
}
<urn:uuid:123> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://vendor.example#Foo> .
<urn:uuid:123> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:uuid:456> .

^ this is the behavior of the current v2 context... which uses id and type as aliases for @id and @type.
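The mapping from the JSON above to the two statements can be sketched by hand. This is a deliberate simplification of JSON-LD IRI expansion, not the real algorithm: it ignores compact IRIs, keywords, base IRIs, and scoped contexts, and the function name `expand_type` is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def expand_type(value: str, context: dict) -> str:
    """Expand one @type value: term definition, then absolute IRI, then @vocab."""
    if isinstance(context.get(value), str) and not value.startswith("@"):
        return context[value]          # explicit term definition
    if ":" in value:
        return value                   # already looks absolute (simplification)
    vocab = context.get("@vocab")
    if vocab is None:
        # A real JSON-LD processor behaves differently here; this sketch gives up.
        raise ValueError(f"cannot expand {value!r}: no term definition or @vocab")
    return vocab + value               # bare string picks up the @vocab prefix

doc = {
    "@context": {"@vocab": "https://vendor.example#"},
    "@id": "urn:uuid:123",
    "@type": ["urn:uuid:456", "Foo"],
}

for t in doc["@type"]:
    print(f"<{doc['@id']}> <{RDF_TYPE}> <{expand_type(t, doc['@context'])}> .")
```

The point of the sketch is only that `"Foo"` becomes a URL through `@vocab`, while `"urn:uuid:456"` passes through unchanged.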

OR13 commented 12 months ago

To resolve this issue I suggest some sentences be added to the "value of JSON-LD" section, explaining that

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://vendor.example#Foo> and <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:uuid:456>

are more valuable to the 3 role models than: "type": ["urn:uuid:456", "Foo"]

It's easy to argue against the value of RDF... it's hard to argue for it... unless you actually show RDF that is substantially better than what you get from JSON by itself.

awoie commented 12 months ago

To resolve this issue I suggest some sentences be added to the "value of JSON-LD" section, explaining that

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://vendor.example#Foo> and <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:uuid:456>

are more valuable to the 3 role models than: "type": ["urn:uuid:456", "Foo"]

It's easy to argue against the value of RDF... it's hard to argue for it... unless you actually show RDF that is substantially better than what you get from JSON by itself.

Yep, and for that reason, a JSON-only processor is not able to understand that "Foo" is actually a URL.

I think what we need to do is one of the following:

  1. say that VC "type" should be a URL rather than must be a URL. But this does not solve the problem since there might be other RDF types somewhere else in the document.
  2. update the JSON processing section or drop it entirely, since it doesn't explain how a JSON-only processor can avoid failing when it encounters a VC "type" that is not a URL without JSON-LD processing.
  3. require the VC "type" value to be an expanded URL, so that JSON-only processors won't fail. But this probably breaks the requirement that the data model must use JSON-LD compact form.

So, I guess 3. is the best option. Not sure what needs to be added to the JSON processing section though. Any thoughts? Perhaps, I'm also just missing something. (cc @seabass-labrax)

OR13 commented 12 months ago

I'm in favor of dropping the JSON processing section, and instead adjusting the JSON-LD section to address processing the core data model as RDF and processing it as JSON without converting to RDF (subtle difference from what we have today, but makes better technical recommendations imo)

I think it's a mistake to tell people that the core data model can be treated as JSON when it's based on JSON-LD... we have seen this causing lots of problems with DIDs still.

It's also a mistake to ignore what it costs to convert strings to URLs: it costs power, time and energy, and sometimes it fails.

All of that is important when evaluating the value of processing claims as URLs.

decentralgabe commented 12 months ago

The problems you raise are correct, but for JSON processors, does it matter? If they do not care about semantics then what good is it to misunderstand semantic processing? I was under the assumption that as long as terms didn't conflict, JSON processors would be fine.

awoie commented 12 months ago

The problems you raise are correct, but for JSON processors, does it matter? If they do not care about semantics then what good is it to misunderstand semantic processing? I was under the assumption that as long as terms didn't conflict, JSON processors would be fine.

IMO, if there is a JSON processor that implements the VCDM 2.0 spec, parses type, and expects a URL, then that processor would fail. To me, it is still not clear whether it is better for such processors to expect the expanded URL for the type value or the compacted form.

The problem gets worse if JSON and JSON-LD processors are mixed.

IMO, we should add some guidance in the spec.

dlongley commented 12 months ago

I think there continues to be confusion around "JSON processing" -- and this is really "JSON-LD processing", it's just that newcomers do not always understand that you don't need a JSON-LD library to consume a JSON-LD document. You can read it like you would any other JSON document, following the rules of the spec associated with the document (in the case here, that is the VCDM and JSON-LD specs -- and any additional spec / context from a particular community). I think we should rename the section to "JSON-LD processing" and have it clarify that JSON-LD libraries are not needed to do "JSON-LD processing" when you follow the advice in that section. In that case you are simply using "JSON" -- but in the same way anyone consumes JSON -- following some spec(s) to interpret it appropriately.

I think we should say that if a value has a compact form defined in the context, then that's the form that should be used. These "JSON processors" (as we reference them today) already must know the context that is being operated on (as mentioned in the currently named "JSON processing" section) and can therefore rely on checking for non-URL values for anything that would be in the context (including @vocab expanded things). This does not preclude someone putting a full URL somewhere that would not be shortened by the context and that would not use the @vocab prefix.
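One way to picture that rule is a classifier that sorts type values without running any JSON-LD expansion. Everything named here is hypothetical: the term map `KNOWN_TERMS`, the `VOCAB` value, the `community.example` and `issuer.example` URLs, and the category labels are assumptions standing in for a context the processor already knows.

```python
# Hypothetical term definitions from a context the processor already knows.
KNOWN_TERMS = {
    "ExampleDegreeCredential": "https://community.example/vocab#ExampleDegreeCredential",
}
# Hypothetical @vocab value declared by that same context.
VOCAB = "https://issuer.example/vocab#"

def classify_type(value: str) -> str:
    """Classify a type value using only static knowledge of the context."""
    if value in KNOWN_TERMS:
        return "compact-term"        # the preferred short form
    if ":" not in value:
        return "vocab-relative"      # a bare string that @vocab would expand
    if value in KNOWN_TERMS.values() or value.startswith(VOCAB):
        return "should-be-compact"   # a URL the context could have shortened
    return "full-url"                # a URL outside the context, still allowed

for v in [
    "ExampleDegreeCredential",
    "Foo",
    "https://community.example/vocab#ExampleDegreeCredential",
    "https://other.example/types#Bar",
]:
    print(v, "->", classify_type(v))
```

Under this reading, only the `should-be-compact` case signals a document that did not follow the "use the compact form when one exists" advice.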

decentralgabe commented 12 months ago

IMO, if there is a JSON processor that implements the VCDM 2.0 spec, parses type, and expects a URL, then that processor would fail. To me, it is still not clear whether it is better for such processors to expect the expanded URL for the type value or the compacted form.

Can you provide an example where the difference matters for a JSON processor?

The problem gets worse if JSON and JSON-LD processors are mixed.

You might as well treat them as two different credentials...I agree.

brentzundel commented 11 months ago

I like @dlongley 's suggestion, but it seems something may still be missing (or I am still missing something).

The VCDM says the value of type must be a URL, or map to a URL through the context. If I am processing as JSON, I don't plan to map anything via the context, so I could assume that a conformant processor should look for a URL as the value of type and fail when it encounters a string. Since this is not the desired behavior, changing the text to clarify seems necessary.

dlongley commented 11 months ago

@brentzundel,

If I am processing as JSON, I don't plan to map anything via the context...

I would say that there are two cases there.

In one case, you are reusing a context where someone else has already done the mapping. You read that someone's spec, it said use context X and use these types String1, String2, etc. Then you follow the advice in the VCDM spec to ensure the document you're producing or consuming includes context X and then you just use those strings for types, expecting everyone else to do the same as the spec says (or should say) they should. You reject or ignore any other values (depending on your application needs). If you're doing some custom extension on top of all that, you use a URL because it wouldn't be defined in the docs or the context.

In another case, you're just using the base VCDM context (v2). Here you just use URLs if you want your types to be unique across issuers, or you use issuer-dependent strings (which @vocab will map to the issuer-dependent vocab) for a more closed-world approach.
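The first case can be sketched as a whole-document acceptance check. The second context URL, the type strings `String1` and `String2` (taken from the wording above), and the decision to reject unknown short strings while tolerating URL-valued extension types are all assumptions; an application could equally choose to ignore unknown values instead.

```python
# The base v2 context followed by a hypothetical community "context X".
EXPECTED_CONTEXT = [
    "https://www.w3.org/ns/credentials/v2",
    "https://community.example/contexts/v1",
]
# Short type strings that the (hypothetical) community spec tells everyone to use.
EXPECTED_TYPES = {"VerifiableCredential", "String1", "String2"}

def accept(credential: dict) -> bool:
    """Accept only documents that use the expected context and known types."""
    if credential.get("@context") != EXPECTED_CONTEXT:
        return False                       # unknown context: terms may mean something else
    types = credential.get("type")
    if isinstance(types, str):
        types = [types]
    if not isinstance(types, list) or "VerifiableCredential" not in types:
        return False
    # Known short strings pass; custom extensions must be spelled as URLs.
    return all(t in EXPECTED_TYPES or ":" in t for t in types)

print(accept({
    "@context": EXPECTED_CONTEXT,
    "type": ["VerifiableCredential", "String1", "https://custom.example/types#Extra"],
}))
```

The context comparison is what lets the processor trust the short strings: it never expands them, it only relies on the context being the one whose documentation it has read.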

OR13 commented 11 months ago

Regardless of what we say, this will continue to be valid:

  1. JSON
  2. JSON-LD
  3. RDF
{
  "@context": {
    "givenName": "http://schema.org/givenName",
    "familyName": "http://schema.org/familyName"
  },
  "@id": "http://me.markus-lanthaler.com/",
  "@type": "http://schema.org/Person",
  "givenName": "Markus",
  "familyName": "Lanthaler"
}

Anything that implies off the shelf behavior is "wrong", needs to be removed from the spec first... then we can profile from the way things are... to what restrictions we can get the WG to agree to impose.

I don't see the WG agreeing to restrict type, beyond what is legal per: https://www.w3.org/TR/json-ld11/#specifying-the-type

awoie commented 11 months ago

Regardless of what we say, this will continue to be valid:

  1. JSON
  2. JSON-LD
  3. RDF
{
  "@context": {
    "givenName": "http://schema.org/givenName",
    "familyName": "http://schema.org/familyName"
  },
  "@id": "http://me.markus-lanthaler.com/",
  "@type": "http://schema.org/Person",
  "givenName": "Markus",
  "familyName": "Lanthaler"
}

Anything that implies off the shelf behavior is "wrong", needs to be removed from the spec first... then we can profile from the way things are... to what restrictions we can get the WG to agree to impose.

I don't see the WG agreeing to restrict type, beyond what is legal per: https://www.w3.org/TR/json-ld11/#specifying-the-type

I agree, and that is why I think the entire JSON processing section is misleading. Some of its statements might be true, but that doesn't justify their presence if they confuse people. If WG members are debating these things, then implementers will definitely be confused too.

awoie commented 11 months ago

https://github.com/w3c/vc-data-model/pull/1298 might close this issue.

brentzundel commented 11 months ago

1298 might close this issue.

how about #1302 ?

msporny commented 11 months ago

Since it looks like both #1298 and #1302 might fail to be merged given our new more aggressive (and justified) PR work mode... and in yet another attempt to try to use better language around what we're trying to convey, what if we were to (as suggested in the threads above by @awoie, @OR13, and @dlongley) do at least the following thing:

  • Stop calling it "JSON Processing"

It's really (bikeshed) "application-specific processing". That is, you use an application-specific JSON Schema-like mechanism to check the structure of the VC you're expecting, and if it passes your JSON Schema, you process it like you would process any application-specific JSON object -- using a set of static rules in your code. Another name we could use here is "static processing".
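A minimal sketch of that idea, using a hand-rolled shape check in place of a real JSON Schema validator so that it needs no third-party library. The required keys, the assumption that `issuer` is a string, and the `degree` field are illustrative choices made by this hypothetical application, not requirements of the VCDM.

```python
# Shape this hypothetical application expects: key -> required Python type.
REQUIRED_SHAPE = {
    "@context": list,
    "type": list,
    "issuer": str,
    "credentialSubject": dict,
}

def check_shape(credential: dict) -> list:
    """Return a list of structural problems; an empty list means the shape is OK."""
    errors = []
    for key, expected in REQUIRED_SHAPE.items():
        if key not in credential:
            errors.append(f"missing {key}")
        elif not isinstance(credential[key], expected):
            errors.append(f"{key} is not a {expected.__name__}")
    return errors

def process(credential: dict):
    """Static processing: validate the shape, then read fields directly."""
    errors = check_shape(credential)
    if errors:
        raise ValueError("; ".join(errors))
    return credential["credentialSubject"].get("degree")

vc = {
    "@context": ["https://www.w3.org/ns/credentials/v2"],
    "type": ["VerifiableCredential"],
    "issuer": "https://issuer.example/",
    "credentialSubject": {"degree": "BSc"},
}
print(process(vc))
```

No JSON-LD library is involved at any point; the rules the code follows are fixed at development time, which is what "static processing" refers to.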

Then the "other thing" we do is "JSON-LD Processing", which is utilizing a JSON-LD library to process the VC, which might include compact/expand... or converting to RDF.

We might also consider an "RDF Processing" section, which is done when doing things like securing via some forms of Data Integrity.

There are variations of the above in some of the PRs that @OR13 has raised. The alternate path being proposed here is we take care of the "JSON Processing" language first and see where that takes us. I can try out a PR or three to see if we can make headway there. To be clear, this doesn't change the way some people have written VC software for years... it just changes the way we explain it so that we get more alignment among mental models.

decentralgabe commented 11 months ago

+1 to application-specific processing. Worth calling out what that means, and the tradeoffs for interop.

awoie commented 11 months ago

Since it looks like both #1298 and #1302 might fail to be merged given our new more aggressive (and justified) PR work mode... and in yet another attempt to try to use better language around what we're trying to convey. What if we were to (as suggested in the threads above by @awoie, @OR13, and @dlongley). We do at least the following thing:

  • Stop calling it "JSON Processing"

Yes and we should also stop saying that JSON processors can do VCDM 2.0 processing without applying non-standard extra steps, or by only applying steps defined by the VCDM 2.0 standard.

It's really (bikeshed) "application-specific processing". That is, you use an application-specific JSON Schema-like mechanism to check the structure of the VC you're expecting, and if it passes your JSON Schema, you process it like you would process any application-specific JSON object -- using a set of static rules in your code. Another name we could use here is "static processing".

I'm not in favour of this phrasing. "application-specific processing" could literally mean anything, but most importantly it means the processing is not defined by the VCDM 2.0 standard. We therefore cannot make any assumptions about what implementers will do, which does not lead to interoperability. I'm in favour of removing the entire section on JSON processing because I disagree that the above phrasing is helpful. It still implies that JSON-only processors can comply with the standard if they do non-standardized (application-specific) things to comply. This does not sound right to me.

iherman commented 10 months ago

The issue was discussed in a meeting on 2023-11-01

#### 2.5. JSON-only processors and VCDM 2.0 conformance (issue vc-data-model#1290)

_See github issue [vc-data-model#1290](https://github.com/w3c/vc-data-model/issues/1290)._

_See github pull request [vc-data-model#1302](https://github.com/w3c/vc-data-model/pull/1302)._

**Brent Zundel:** JSON-only processors and VCDM 2.0 conformance. Raised by Oliver and not assigned to anyone. He brings up good points about how we define it in the spec. If no one gets assigned it can't move forward.

> *Dave Longley:* "credential specific" is another to use if application-specific is too broad.

**Manu Sporny:** I'm wondering if we can get direction from the group. It's clear that people don't like the term "json processing". There have been two alternative terms proposed. We stop calling it JSON processing because certain working group members feel it's misleading. Do we want to call it static versus dynamic or application specific versus generalized? Can we get feedback?

> *Dave Longley:* or "context specific".

**Ivan Herman:** Application specific is better than dynamic versus static. I think that application specific is really very broad. In our case what is relevant is that it's credential specific. It's what we should do.

**Brent Zundel:** We do have a somewhat related PR open, 1302, which is marked post-CR. Maybe it doesn't apply? Does it apply to this issue? How? As far as context specific, I agree that application specific doesn't tell me much. Credential specific might be better. But that begs the question that if there is credential specific processing, why haven't we defined it. I would expect that question to be asked.

> *Ivan Herman:* +1 to be relevant to 1302.

**Joe Andrieu:** I think limited might be what we are talking about. We should do a ranked poll and pick.

**Manu Sporny:** We could debate endlessly but a ranked choice poll would be a good option. I can put this out this week and run it for a week and we can review what comes back.

> *Dave Longley:* some choices i heard: credential-specific, context-specific, application-specific, static vs. dynamic, limited, restricted.

**Manu Sporny:** Need to get all the options down. Please put your options into the minutes so it shows up in the poll. … The other question was "is 1302 relevant". It is relevant, but 1302 does other changes beyond what we call these two things. By naming these two things the rest becomes easier to talk about.

**Joe Andrieu:** I think what we are trying to name is not application specific. It's the choice of the verifier what processing they want to do. I think we put the choice at the verifier. It's not a choice of the issuer, application or credential. That's why they didn't resonate with me.

**Dave Longley:** Agree with Joe. We can fall into a lot of pitfalls if we specify the type of processing. It's really about the specific set of documents that you accept. We don't want to confuse people into thinking that data would be understood in a different way. It's whether you accept a lot of things or just what you understand in your context.

**Brent Zundel:** We have another PR and poll going out. But no one is assigned to the issue.

**Manu Sporny:** I could pick it up once the poll is done and it's clear what people want it to be changed to.

**Ivan Herman:** Can we do it now?

**Manu Sporny:** Not sure over IRC. We have a ranked choice poll tool and I suggest we use it.

**Brent Zundel:** Look forward to seeing the poll. Moving on to the next issue.

msporny commented 10 months ago

@decentralgabe wrote:

+1 to application-specific processing. Worth calling out what that means, and the tradeoffs for interop.

@awoie wrote:

I'm not in favour of this phrasing. "application-specific processing" could literally mean anything

A poll has been created to gather feedback from the WG on what the appropriate term is for the behavior that we're covering in the specification:

https://www.opavote.com/en/vote/5254957337935872

Everyone that has strong feelings about the correct terminology should feel free to weigh in on the poll above.

iherman commented 10 months ago

The issue was discussed in a meeting on 2023-11-15

### 1. poll results.

**Manu Sporny:** We should review the results from the poll. … Perhaps people could emote here to add late votes?

> *Joe Andrieu:* what's the URL for the poll?

> *Dave Longley:* [https://www.opavote.com/en/vote/5254957337935872](https://www.opavote.com/en/vote/5254957337935872).

**Manu Sporny:** The context is that we decided a few weeks ago to run a poll. We wanted to change the name of a certain aspect. It looks like we'll choose 'Credential Type-specific' processing.

> *Orie Steele:* Of these options, "Credential Type" is probably the best, but it omits the fact that the context can change information regardless of credential type.

**Manu Sporny:** I have already created a PR. I suggest that we close the poll and I can modify the PR to show the chosen result.

> *Dave Longley:* Orie: we should mention immutable or "semantically immutable" contexts better in the section.

_See github pull request [vc-data-model#1351](https://github.com/w3c/vc-data-model/pull/1351)._

_See github issue [vc-data-model#1290](https://github.com/w3c/vc-data-model/issues/1290)._

> *Orie Steele:* There was also a thread in the W3C CCG on this topic recently... [https://lists.w3.org/Archives/Public/public-credentials/2023Nov/0030.html](https://lists.w3.org/Archives/Public/public-credentials/2023Nov/0030.html).

**Joe Andrieu:** I appreciate the term 'limited' but I don't like the term 'unlimited'. I think that could be a problem.

> *Joe Andrieu:* +1 to get us unstuck.

**Manu Sporny:** The poll is not a binding vote, so we can still take into account other views.

> *Joe Andrieu:* I'll add here (so the meeting can move on) that I also don't think the distinction is valid. It isn't a choice between applications that can work with any credential or those with specific credentials. =(.

**Brent Zundel:** It is OK to have a compromise, but I would like to avoid spending further time on the topic during this meeting.

jandrieu commented 10 months ago

I don't think the distinction as stated in the poll is the right one:

You can write applications that only work with a specific set of credentials (e.g., issuers, verifiers), or you can write applications that can work with any credential (e.g., digital wallets).

Rather, I think the decision is

Making that even worse is a third possibility

But my first surprise on the poll was the pairing of names. That's going to lead to weird and likely invalid results. We should probably separate the two and do them separately.

Another way to think about these three

In particular, I want to note that Option C is the biggest lift as an ask for anyone in the ecosystem, and we definitely have consensus against requiring that level of complexity for verifiers. Option A is the lightest lift, which is what we might have called json-only processing. Option B is what I understood as json-ld-light.

Given these three different notions of processing, I don't think the bound pairs in the poll are going to give us much useful information.

msporny commented 10 months ago

PR #1351 has been merged. It removes the term "JSON-only processing" and replaces it with "Credential Type-specific Processing" vs. "General JSON-LD Processing".

After verification is complete, the former type of processing can be done w/ a simple JSON Schema check (or similar mechanism) and then processing the resulting object directly (no JSON-LD library required, no conversion to RDF required). The latter requires the use of a JSON-LD library.

Based on the language in the spec today, the answers to the questions asked in the initial issue are:

Does that mean that a Credential Type-specific processor requires to use fully qualified type values when issuing VCs to be conformant?

No, it does not. For types, short form is strongly encouraged everywhere. Language has been added to the specification that states that interoperability will be harmed if URLs are used for type values.

Does that mean that issuers that produce VCs with a type using compacted form cannot be verified by Credential Type-specific processors?

No, it does not. Issuers that produce VCs with a type using compacted form are expected to have broad interoperability using either Credential Type-specific implementations or General JSON-LD Implementations.

I'm marking this issue as pending close to see if there is disagreement that we have addressed the issue via PR #1351.

awoie commented 10 months ago

I still believe that the current language may give rise to conflicts regarding the @context for credential type-specific implementations. There is no assurance that a credential type-specific implementation can verify whether the provided credential aligns with their expectations without expanding the type and all associated properties themselves. Even if the JSON schema is enforced by the credential type-specific application, inaccuracies may arise, as the JSON schema does not account for the expanded form and IRIs of the terms included in a credential.

JSON-LD expansion serves the purpose of semantic disambiguation by expanding all terms and translating them to IRIs.
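The concern can be illustrated with two documents that look identical to a shape check but disagree once their types are expanded. The `issuer-a.example` and `issuer-b.example` vocabularies and the one-line `expand_with_vocab` helper are assumptions; the helper is a drastic simplification of JSON-LD expansion that only handles `@vocab`.

```python
def expand_with_vocab(term: str, context: dict) -> str:
    """Simplified expansion: leave URL-like values alone, prefix the rest with @vocab."""
    return term if ":" in term else context["@vocab"] + term

doc_a = {"@context": {"@vocab": "https://issuer-a.example#"}, "@type": ["Foo"]}
doc_b = {"@context": {"@vocab": "https://issuer-b.example#"}, "@type": ["Foo"]}

# What a check on the JSON shape alone sees: the same type strings.
same_json_view = doc_a["@type"] == doc_b["@type"]

# What expansion sees: two different IRIs behind the same short string.
iris_a = [expand_with_vocab(t, doc_a["@context"]) for t in doc_a["@type"]]
iris_b = [expand_with_vocab(t, doc_b["@context"]) for t in doc_b["@type"]]
same_expanded_view = iris_a == iris_b

print(same_json_view, same_expanded_view)
```

A check that inspects only the type strings cannot tell these two documents apart, so it has to compare `@context` against the values it expects, or rely on something else having done the expansion.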

A calling application may undertake this challenging task and may comprehend that a credential is suitable for a credential type-specific implementation, but the credential type-specific application may not share the same understanding.

In my opinion, something should be added to emphasize that the credential type-specific application MUST depend on a calling application to perform this task genuinely, essentially adhering to W3C VCDM processing (as specified in the standard). Otherwise, the current language does not convey that credential type-specific processing is W3C VCDM compliant.

There is no normative language that proves otherwise:

These consumers can use credential-type-specific processing instead of generalized processing.

brentzundel commented 10 months ago

@awoie What we are hoping to get from you is a suggestion for concrete text that would address your concern if added to the spec.

iherman commented 10 months ago

The issue was discussed in a meeting on 2023-11-28

#### 1.5. JSON-only processors and VCDM 2.0 conformance (issue vc-data-model#1290)

_See github issue [vc-data-model#1290](https://github.com/w3c/vc-data-model/issues/1290)._

**Brent Zundel:** I believe this has been overtaken by events. … But we received a comment on it by Oliver yesterday.

**Manu Sporny:** Oliver may be requesting a minor modification to the text. … We may want to ask Oliver to propose text that we can remove. … We don't use "JSON-only processing" anymore.

> *Orie Steele:* +1 to JSON-LD processing and RDF processing.... there is no JSON processing.

**Brent Zundel:** I can tag Oliver. … We will ask for a concrete suggestion from Oliver. … I'm going to leave pending close on there for now. … We will meet tomorrow to dive more deeply into 1338 - the verification algorithm. … We are in good standing to enter CR by mid-December, which is our goal.

msporny commented 9 months ago

@awoie ping (again), do you have a concrete text proposal that the WG could consider? In an attempt to be specific:

@awoie wrote:

There is no assurance that a credential type-specific implementation can verify whether the provided credential aligns with their expectations without expanding the type and all associated properties themselves.

I don't believe the statement above, based on the re-written text in the spec, is true. Specifically, the specification contains this text now for credential type-specific processors:

[image: excerpt of the specification text, not reproduced here]

That, along with the other rules in that section, provides the assurance that the type expansion that you refer to is not necessary.

I believe we've made a number of changes to the specification in order to address your concerns. A concrete text proposal is what we need at this point to proceed, if not, the WG will most likely close this issue as we believe that we have addressed the concerns you have raised in this issue.

brentzundel commented 9 months ago

We have attempted to address this issue with PR #1351. Concerns were raised that the PR was not sufficient, but more than two weeks have passed without concrete suggestions for further changes. Closing.