w3c / vc-data-model

W3C Verifiable Credentials Working Group — VC Data Model and Representations specification
https://w3c.github.io/vc-data-model/

Re-evaluate support for `@vocab` in base VCDM v2 context #1514

Open msporny opened 4 days ago

msporny commented 4 days ago

Previously, the VCWG decided to define a @vocab value in the base context (see https://github.com/w3c/vc-data-model/issues/953). Recently, a security disclosure (which is still under debate) has led a number of individuals who had previously supported defining an @vocab to withdraw their support for the feature, since it is, at best, not very well understood and, at worst, leads to unexpected security-related concerns for those who do not understand the ramifications of using it.

We no longer have consensus for the feature (this is the new information that the security disclosure has highlighted). At a minimum, we need to poll the group again to see if @vocab has the support it needs to remain in the VCDM v2 base context.

There are additional proposal options, which include:

We'll gather feedback in this issue and then implement whatever achieves consensus.

OR13 commented 4 days ago

It could be that, given the security issues with complex contexts, the base context should be reduced to only the terms and keywords necessary for the "claims data model", and should not include the "securing-specific" data model claims, such as:

And lastly:

I put @vocab last because it's not clear to me whether data integrity implementers (or the JSON-LD community in general) agree on the purpose of placing this keyword in a context.

IIRC, the VCWG originally added it because there was significant interest in "not requiring JSON-LD + RDF processing", in order to issue or verify credentials... but when you don't otherwise mandate valid JSON-LD + RDF, such as through a normative statement like:

The JSON-LD claimset MUST canonicalize to the same application/n-quads by the issuer and the verifier.

There is a chance that the verifier might process the claims differently than the issuer intended. More specifically, implementers of DID resolvers and credential graphs (network diagrams) observed that, without a default @vocab, much of the RDF you might wish to analyze simply exploded at the point of analysis: issuers signed with securing mechanisms that don't require valid RDF, but verifiers later expected RDF to be produced.
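To make the "exploded" RDF concrete: during JSON-LD expansion, a property name with no term definition is mapped to an IRI only if the active context sets `@vocab`; otherwise the property is silently dropped and never reaches the RDF. The Python sketch below models just that one rule with a toy resolver. It is not a conformant JSON-LD processor (simple top-level terms only), and the context, claims, and IRIs are hypothetical.

```python
from typing import Optional


def expand_term(term: str, context: dict) -> Optional[str]:
    """Map one property name to an IRI, or return None if it is undefined.

    Toy model of JSON-LD expansion for simple terms only (no compact
    IRIs, no nested or scoped contexts, no remote context loading).
    """
    definition = context.get(term)
    if isinstance(definition, str):
        return definition
    if isinstance(definition, dict) and "@id" in definition:
        return definition["@id"]
    vocab = context.get("@vocab")
    if vocab is not None:
        # With @vocab set, an undefined term becomes <vocab IRI> + term.
        return vocab + term
    # Without @vocab, expansion drops the property entirely.
    return None


def expand_top_level(doc: dict, context: dict) -> dict:
    """Expand the top-level, non-keyword properties of a document."""
    expanded = {}
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # keywords such as @context are not claims
        iri = expand_term(key, context)
        if iri is not None:
            expanded[iri] = value
    return expanded


# Hypothetical context and claims, for illustration only.
CONTEXT = {"name": "https://schema.org/name"}
CLAIMS = {"name": "Alice", "favoriteColor": "blue"}

without_vocab = expand_top_level(CLAIMS, CONTEXT)
with_vocab = expand_top_level(CLAIMS, {**CONTEXT, "@vocab": "https://example.org/vocab#"})

print(without_vocab)  # {'https://schema.org/name': 'Alice'}
print(with_vocab)
# {'https://schema.org/name': 'Alice', 'https://example.org/vocab#favoriteColor': 'blue'}
```

Without `@vocab`, `favoriteColor` vanishes without any error, which is exactly the kind of silent fault discussed below.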

This RFC from the IAB is a much better take on this topic: https://datatracker.ietf.org/doc/rfc9413/

Time and experience show that negative consequences to interoperability accumulate over time if implementations silently accept faulty input. This problem originates from an implicit assumption that it is not possible to effect change in a system the size of the Internet. When one assumes that changes to existing implementations are not presently feasible, tolerating flaws feels inevitable.

@vocab was intended to address these "silent faults", but the faults themselves originate in the split in interpretation of what the claimset is.

If the claimset is JSON-LD, a fault can occur in the JSON processing (invalid JSON syntax, exceeded max depth, bad Unicode handling, etc.).

If the claimset is RDF, a fault can occur in the processing of the RDF concrete serialization (like application/rdf+xml, or application/n-quads).

If you say that a claimset is JSON-LD that MUST always produce valid RDF, you get faults from both categories. If you say that a claimset is JSON-LD that MAY produce valid RDF, you still get faults from both categories; you might just ignore the RDF faults because the normative framing leads you to expect them.

I support re-evaluating the inclusion of @vocab in the base context. I'll share my view of how to improve the specification, but I am not a W3C member, and the WG will need to decide how it wants to handle this sort of thing.

Option 1: Keep Saying No RDF Processing is Required

Decide if you want silent failures to come in the form of "bad term definitions...@vocab stays" or "no rdf produced... @vocab removed".

Option 2: Make RDF Processing Mandatory

Drop @vocab, add normative text that explains that RDF needs to be producible regardless of the securing mechanism, and optionally move the securing mechanism details to separate contexts per best practices.

This way, you make clear to consumers, with normative language, that valid "high quality" RDF is expected for every valid instance of the data model, and you give them normative guidance ("you MUST understand this in order to implement the spec properly") on how to produce "high quality RDF".
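The RFC 9413 passage quoted above favors failing visibly over silently accepting faulty input. One way a processor could act on Option 2 is to treat any property that lacks an explicit term definition as an error, instead of letting it disappear from the RDF. The following is a minimal sketch of that fail-loudly behavior, again a toy model rather than a real JSON-LD processor; the `UndefinedTermError` class and the context contents are hypothetical.

```python
class UndefinedTermError(ValueError):
    """Hypothetical error: a property has no explicit term definition."""


def check_all_terms_defined(node, context: dict, path: str = "$") -> None:
    """Raise UndefinedTermError unless every property is explicitly defined.

    Toy model: a single flat context applies to the whole document,
    keywords (keys starting with "@") are skipped, and @vocab is
    deliberately never consulted, matching the "drop @vocab" option.
    """
    if isinstance(node, list):
        for index, item in enumerate(node):
            check_all_terms_defined(item, context, f"{path}[{index}]")
    elif isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("@"):
                continue
            if key not in context:
                raise UndefinedTermError(f"undefined term {key!r} at {path}")
            check_all_terms_defined(value, context, f"{path}.{key}")
    # Strings, numbers, booleans and None carry no property names.


# Hypothetical context with explicit definitions only.
CONTEXT = {
    "credentialSubject": "https://example.org/vocab#credentialSubject",
    "name": "https://schema.org/name",
}

check_all_terms_defined({"credentialSubject": {"name": "Alice"}}, CONTEXT)  # passes

try:
    check_all_terms_defined(
        {"credentialSubject": {"name": "Alice", "favoriteColor": "blue"}}, CONTEXT
    )
except UndefinedTermError as err:
    print(err)  # undefined term 'favoriteColor' at $.credentialSubject
```

The design choice is that the fault surfaces at the point of processing, with a path pointing at the offending property, rather than later when someone tries to analyze the RDF.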

I'm in favor of Option 2. The reason is that I have observed over the years lots of confusion regarding JSON-LD, including as described here: https://tess.oconnor.cx/2023/09/polyglots-and-interoperability

I feel that without stronger guidance on how JSON-LD is expected to be processed in a dependent specification, like Decentralized Identifier or Verifiable Credentials, the ambiguity creates an open wound that festers and never heals.

It leads to conversations with customers and partners that sound like: "You don't have to look at the RDF, but if you don't a verifier might come complain to you in the future about what you issued", or "You will need a stricter regulator profile to ensure JSON-LD produces RDF, because the base specification does not actually ensure this property"....

IMO, @vocab is just a symptom of the underlying problem: there is no consensus on mandatory processing of RDF... It's better to fix that in the specification text than to hide it in the details of a JSON-LD keyword.

dlongley commented 4 days ago

I think we should do all of the following:

  1. Remove @vocab from the core context.
  2. For the "Getting Started" section, create a "development context", which might just be the examples/v2 context.
  3. Strongly advise against the use of @vocab in a production setting (but still allow it). [Note: I think we should say that it cannot offer term protection, so use of the JSON-LD compaction API is recommended prior to the consumption of documents with contexts that use @vocab.]
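The note in item 3 about compacting before consuming can be illustrated with a toy model. After expansion, a verifier re-compacts the data against a context it trusts; any IRI that the trusted context does not know stays a full IRI, so a term that merely looked familiar in the issuer's document (for example, one resolved through `@vocab`) cannot be mistaken for the verifier's own term. This sketch assumes a flat term-to-IRI mapping, and the contexts and IRIs are hypothetical; a real implementation would rely on a conformant JSON-LD compaction API instead.

```python
def compact_with_trusted(expanded: dict, trusted_context: dict) -> dict:
    """Toy compaction: map expanded IRIs back to the verifier's own terms.

    Assumes trusted_context is a flat term -> IRI mapping. IRIs that the
    trusted context does not define are kept as full IRIs, so they can
    never be confused with one of the verifier's short term names.
    """
    iri_to_term = {iri: term for term, iri in trusted_context.items()}
    return {iri_to_term.get(iri, iri): value for iri, value in expanded.items()}


# The verifier's own, vetted context (hypothetical).
TRUSTED = {"name": "https://schema.org/name"}

# An issuer's document whose "name" resolved through a hypothetical
# @vocab namespace rather than to schema.org.
via_vocab = compact_with_trusted({"https://example.org/vocab#name": "Alice"}, TRUSTED)
print(via_vocab)  # {'https://example.org/vocab#name': 'Alice'}
print("name" in via_vocab)  # False

# The same claim expressed with the schema.org IRI compacts to "name".
via_schema = compact_with_trusted({"https://schema.org/name": "Alice"}, TRUSTED)
print(via_schema)  # {'name': 'Alice'}
```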

As for this point:

Create an "issuer-defined" context that moves the @vocab declaration to that document (for those that want to continue to create/use "private term" VCs).

I think "issuer-defined" terms (and "private claims") are a footgun in the global, three-party model, so my preference is not to define a context at all with such an @vocab value. This approach doesn't prevent someone else from defining such a context (we can't prevent this), but we don't need to endorse that approach.

I think "private claims" are the actual basis of any coherent "polyglot problem", as they are ambiguous on a global scale. These sorts of claims are only remotely sensible in a two-party model where there is an assumption of a tight coupling between the issuer and verifier, and the holder functions not as a fully independent actor, but as a transporter of opaque envelopes of information.

I think endorsing this concept in our core context was a mistake as it encourages people to make assumptions based on the historically ever-familiar two-party model; a model that isn't applicable here. These assumptions can result in a number of problems including, but not limited to, making it more difficult for general purpose wallets to help holders make choices, unduly incentivizing centralization in the marketplace, failing to understand document contexts prior to consuming information, and harming privacy by requiring permission from the issuer to express the same information in different ways.

aniltj commented 4 days ago

Remove @vocab from the core context.

+1

For the "Getting Started" section, create a "development context" ...

+1, particularly if the existence of that development context is something I can test for (and prevent via an implementation profile if need be)... and drop to the floor if it exists in production.

Strongly advise against the use of @vocab in a production setting (but still allow it). [Note: I think we should say that it cannot offer term protection, so use of the JSON-LD compaction API is recommended prior to the consumption of documents with contexts that use @vocab.]

I favor not using undefined terms and banning the use of @vocab in production implementations, so I will wait to see how strong the "strongly advise against" language is, but I agree that this is a good step forward.

For those who are not planning on using JSON-LD-aware APIs, shouldn't there also be clear guidance that, since the VCDM v2 data model is in JSON-LD compact form, you need checks in your processing logic to catch the equivalent errors that a JSON-LD-aware API would catch?
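A minimal sketch of the kind of profile check described in this comment, for a processor that does not use a JSON-LD library: treat `@context` as an ordered list, require the VCDM v2 base context URL as the first entry (as VCDM v2 requires), accept only allowlisted context URLs (no inline context objects), and reject the examples/development context outside of development mode. The base and examples context URLs are the ones used in the VCDM v2 specification; the allowlist contents, the function name, and the `production` flag are hypothetical. A check like this covers only the context array, not undefined terms.

```python
BASE_CONTEXT = "https://www.w3.org/ns/credentials/v2"
EXAMPLES_CONTEXT = "https://www.w3.org/ns/credentials/examples/v2"

# Hypothetical allowlist of context URLs this verifier has vetted.
ALLOWLIST = {BASE_CONTEXT, "https://example.org/contexts/degree/v1"}


def check_contexts(credential: dict, production: bool = True) -> list:
    """Return a list of @context problems; an empty list means it passed."""
    contexts = credential.get("@context")
    if not isinstance(contexts, list) or not contexts:
        return ["@context must be a non-empty list"]
    problems = []
    if contexts[0] != BASE_CONTEXT:
        problems.append("first @context entry must be the VCDM v2 base context")
    for entry in contexts:
        if not isinstance(entry, str):
            # Inline context objects (e.g. one carrying @vocab) are not vetted here.
            problems.append("inline context objects are not accepted")
        elif entry == EXAMPLES_CONTEXT:
            if production:
                problems.append("development/examples context used in production")
        elif entry not in ALLOWLIST:
            problems.append(f"context not allowlisted: {entry}")
    return problems


dev_credential = {"@context": [BASE_CONTEXT, EXAMPLES_CONTEXT]}
print(check_contexts(dev_credential, production=False))  # []
print(check_contexts(dev_credential))  # ['development/examples context used in production']
```

Returning a list of problems rather than raising on the first one is just a convenience for reporting; a production verifier would reject the credential if the list is non-empty.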

kimdhamilton commented 4 days ago

My votes:

longpd commented 4 days ago

Concurrence with:

tplooker commented 4 days ago

As someone who was originally in favour of having @vocab in the core context, but who is also the author of the reported security vulnerabilities cited, I'd just like to clarify my POV on this issue.

1) @vocab is a broadly useful feature with respect to JSON-LD; that hasn't changed with the reporting of this security vulnerability.
2) My position on having @vocab in the core context for developer ease of use hasn't changed; I believe that to be important. However, it's not a hill I am willing to die on any longer.
3) What has become clear is a flaw / design issue with JSON-LD: the term protection feature offered by @protected doesn't extend to terms defined by @vocab.
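A toy model of the design issue described above: when contexts are applied in order, a term defined under `@protected` cannot be changed by a later context, but a term that resolves only through `@vocab` has no term definition at all, so there is nothing to protect and a later context can define it freely. This is a simplified Python sketch of that one rule (real JSON-LD context processing has many more cases, such as scoped contexts and `@propagate`); the contexts, IRIs, and the `ProtectedTermRedefinition` error are hypothetical.

```python
class ProtectedTermRedefinition(ValueError):
    """Hypothetical error: a later context tried to change a protected term."""


def apply_contexts(contexts: list) -> dict:
    """Merge contexts in order into a toy "active context".

    Each context is a flat dict of term -> IRI with optional "@vocab" and
    "@protected" keys. Terms defined while "@protected" is true cannot be
    changed later (an identical redefinition is tolerated, as in JSON-LD
    1.1); "@vocab" itself is simply overwritten by later contexts.
    """
    active = {"terms": {}, "protected": set(), "vocab": None}
    for ctx in contexts:
        if "@vocab" in ctx:
            active["vocab"] = ctx["@vocab"]
        for term, iri in ctx.items():
            if term.startswith("@"):
                continue
            if term in active["protected"] and active["terms"][term] != iri:
                raise ProtectedTermRedefinition(term)
            active["terms"][term] = iri
            if ctx.get("@protected", False):
                active["protected"].add(term)
    return active


def resolve(term: str, active: dict):
    """Return the IRI for term: explicit definition first, then @vocab."""
    if term in active["terms"]:
        return active["terms"][term]
    if active["vocab"] is not None:
        return active["vocab"] + term
    return None


# Hypothetical base context: protects "name", relies on @vocab otherwise.
BASE = {
    "@protected": True,
    "@vocab": "https://example.org/issuer-dependent#",
    "name": "https://schema.org/name",
}

print(resolve("employer", apply_contexts([BASE])))
# https://example.org/issuer-dependent#employer

# A later context can silently take over a term that only @vocab covered...
later = apply_contexts([BASE, {"employer": "https://other.example/employer"}])
print(resolve("employer", later))  # https://other.example/employer

# ...but cannot change a protected term.
try:
    apply_contexts([BASE, {"name": "https://other.example/name"}])
except ProtectedTermRedefinition as err:
    print("rejected redefinition of", err)  # rejected redefinition of name
```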

In my opinion, we should be focusing on the root cause of the issue here, which is fixing how @vocab can be used in data integrity, because simply removing it from the core context doesn't mean it won't be used. If that were fixed, many of the arguments in this thread about whether to have @vocab in the core context would be less relevant, because using @vocab would be safe.

aniltj commented 3 days ago

@vocab is a broadly useful feature with respect to JSON-LD ...

Agree with @tplooker on this.

However, I also believe that having @vocab in the base context blinds developers to its existence and promotes its misuse.

An option that can serve both perspectives is to use @vocab for development and testing via a "development" or "secondary" context that a developer has to use explicitly and with full awareness.

iherman commented 1 day ago

The issue was discussed in a meeting on 2024-07-03

#### 1.5. Re-evaluate support for `@vocab` in base VCDM v2 context (issue vc-data-model#1514)
_See github issue [vc-data-model#1514](https://github.com/w3c/vc-data-model/issues/1514)._
**Brent Zundel:** Now we get to talk about 1514, "Re-evaluate support for `@vocab` in base VCDM v2 context". It comes out of a conversation in the Data Integrity spec. Some folks are suggesting there is a critical vulnerability; this could be a mitigation.
_See github issue [vc-data-integrity#272](https://github.com/w3c/vc-data-integrity/issues/272)._
**Manu Sporny:** The discussion in the DI spec asserts a number of things... one is a realization that some people do not understand how `@vocab` works. Because of that, it has been misinterpreted and misused in that security disclosure. This discussion has led some to change their position on adding `@vocab` to the base context.
… The issue asserts we should remove `@vocab` from the base context. It's still up to us to decide how it could be used, if at all. The spec doesn't say "don't use it in production"; folks in the thread think it must not be used in production (MUST vs SHOULD). How do we enforce that? Should we? There are legitimate uses of `@vocab`/`@base` in production.
… There is enough here to raise a PR after we discuss this a bit more on the call today.
**Ivan Herman:** If `@vocab` must not be used, that would require all participating parties to check that. That means off-the-shelf LD checkers cannot do this, since it is valid LD.
> *Dave Longley:* +1 that we consider some language changes but not add a MUST NOT; any verifier must understand the contexts it consumes information from anyway, and they can only allowlist contexts that don't include `@vocab` (so long as `@vocab` is removed from the core context).
**Manu Sporny:** You are right. There are some LD processors considering putting in a feature around this. I don't know if there is support for pulling this into our spec.
… There are legitimate uses of `@vocab` in production. Example: if `@vocab` is in the last context in a context array, and your application knows that... it could be fine to use `@vocab` if you order it properly, and there are other similar scenarios.
… It feels like we're closing off a bunch of use cases for no real reason. The current security disclosure specifically did not do checks that we highlight in the spec. I do not think we'll get consensus; the most we'll see is a "should not", or strongly discouraging it unless you know what you're doing.
**Dave Longley:** I tend to agree. A MUST NOT is a bridge too far. I do think removing `@vocab` from the base context is a good idea. Any context should be vetted; verifiers do not need to accept contexts with `@vocab` if they vet the core context (and we remove it).
**Michael Jones:** I was talking with Orie about this. From the statements he made, he has a slight preference for always getting to RDF, even if as a result of `@vocab` terms. If it is removed, then removal should mean terms are interpreted as JSON, not RDF.
**Ivan Herman:** Trying to make clear what I understand the proposal to be: 1 - remove it from the core context; in parallel, 2 - reinforce the text to say "don't use that if you can avoid it". I agree with both proposals.
> *Manu Sporny:* Yes, your understanding of the proposal is correct, Ivan.
**Ivan Herman:** To Mike: I do not understand everything Orie is stating. I know he has this opinion that everything should be done on RDF only. I do not want to get into this, and I am not the right person to discuss it (RDF bias). His statement that we should treat it as JSON... I do not understand what he means.
**Dave Longley:** We decided a while ago that VC 2.0 uses LD compacted form. That requires that you understand the `@context` field; it's not something you can just ignore. That makes things simple. We can clarify more if we need to. When you understand that, it prevents these problems from being raised.
**Gabe Cohen:** My main concern was to reduce the complexity for implementers that are more LD-averse, and I'm afraid that removing `@vocab` increases the burden on implementers. I can see the arguments for using LD and understanding what it's doing, but I like the convenience `@vocab` provided for those that wanted the feature. Is there a middle ground here?
**Michael Jones:** I understand what Orie is saying: we get a mapping for all terms that do not appear in context entries. This is why we added it. As an engineering mechanism I still think it's valuable. I am prone to leaving it alone.
> *Ivan Herman:* +1 to selfissued, I understand now what Orie meant. Thx.
**Manu Sporny:** We cannot leave it alone anymore; there is no support from the WG. We can figure out what to do about it. Gabe, you asked whether there is a middle ground here. Yes, I think that's what's being proposed. The section we had said "don't worry, just use the base context"; that section can be updated to say "use these two contexts: the base context and the examples context", since the latter has `@vocab`. That can work until they're ready for "production".
… If they really want to use `@vocab`, we can provide a template with an `@vocab` file... that is not a big ask.
> *Dave Longley:* +1 to that plan, but none of that changes the fact that everyone must check the `@context` field; you can't ignore it (and the spec already says this).
**Manu Sporny:** LD-averse people can continue to use the mechanism; we can continue to strongly recommend they don't do that. One of the negotiations around `@vocab`... we were concerned that people that were LD-averse would split the group, start competitive work at IETF, and negatively impact both communities... that happened. So that weakens the argument to have `@vocab` at all.
… We have said you do not need to use an LD processor, use a simplified set of rules; we said just check the context array and make sure you're OK with the contents... but there's only so much we can do.
… If developers are not going to use the spec because publishing a single context with an `@vocab` definition is too difficult, then I don't know that we need to cater to those developers anymore.
**Brent Zundel:** It sounds like we have a proposed path forward: remove `@vocab` from the core context, and create an example/experimental context with `@vocab` for test purposes. I did not hear anyone say no. A possible third step: if you want to keep using undefined terms, then you can publish your own `@vocab` context.
… Let's spend one more minute and then move on to the controller document.
**Anil John:** As someone implementing using DI and JOSE, using LD v2 using compact form is a credential for us. There will be no undefined terms in how we are creating credentials. All credentials we create will have clearly defined terms in the context... and we can verify that the terms are coming from us.
… I am sympathetic that `@vocab` provides value. I disagree with having it in the base context. Developers become blind to it. As a position that splits the difference (`@vocab` is bad vs. `@vocab` adds value), we can add a secondary context that developers can add to note there are undefined terms.
> *Dave Longley:* +1 to Anil, `@vocab` is useful in a closed setting like development, but it creates conflicts and problems in the general ecosystem.
**Anil John:** We support removing `@vocab` from the base context. Support in a second context for development purposes... so developers have to be aware of it... that's fine.
**Michael Jones:** Is it the case now that conforming JSON-LD implementations will throw an error if there are undefined terms?
**Manu Sporny:** Yes... not all of them, but we can force them to.
> *Dave Longley:* *Conforming* implementations will throw an error, yes.
**Michael Jones:** Thanks, that is good data. Responding to Brent's summary that no one has spoken against removal: I have spoken against removal.
… I would like to have this go out to some people, like Tobias, who are in different time zones, before making a decision.
… I would like more discussion before deciding on this call.
**Dmitri Zagidulin:** Responding to Manu's point about not worrying about LD-averse implementers: that's not quite the case... I know of multiple implementers that are new to LD. Any removal of friction, such as including an `@vocab` (though I understand the concern)... let's not discount that audience.
> *Manu Sporny:* Agree that we want to remove as much friction as possible for people that are new to LD (and even people that regularly use LD).
> *Dave Longley:* +1 to not discount, but to move `@vocab` to examples and new developer space.
> *Dave Longley:* +1 to Manu.
**Dmitri Zagidulin:** We want to remove friction. We could recommend an inline `@vocab`, which is an option.
> *Ivan Herman:* +1 to fall back on inline `@vocab`.
> *Gabe Cohen:* +1, if we can inline `@vocab` I'm less opposed.
**Brent Zundel:** I am open to reaching out to MATTR and others. Not sure how much they should dictate group direction.
> *Dave Longley:* Yeah, I don't see why we can't inline it; verifiers in production would reject it if they haven't allowlisted it.
> *Phillip Long:* +1 to in-line `@vocab`, which sounds like a good compromise.
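Two ideas from the call fit in one toy sketch: an inline `@vocab` (a context object placed directly in the `@context` array, as suggested on the call), and a verifier that, as Dave describes, rejects any inline context it has not allowlisted. It also touches on ordering: in JSON-LD, the entries of a context array are processed in order, so a later `@vocab` replaces an earlier one, which is presumably why placement matters in the scenario Manu mentions. The function names, the allowlists, and the vocabulary IRI are hypothetical, and the sketch assumes (in line with the proposal) that the base context no longer defines `@vocab`.

```python
BASE_CONTEXT = "https://www.w3.org/ns/credentials/v2"


def effective_vocab(contexts: list):
    """Return the @vocab in effect after processing contexts in order.

    Toy model: only inline context objects are inspected; remote context
    URLs (such as a base context without @vocab) are assumed not to set one.
    """
    vocab = None
    for entry in contexts:
        if isinstance(entry, dict) and "@vocab" in entry:
            vocab = entry["@vocab"]  # a later @vocab replaces an earlier one
    return vocab


def verifier_accepts(contexts: list, allowed_urls: set, allowed_inline: list) -> bool:
    """Accept only allowlisted context URLs and allowlisted inline objects."""
    for entry in contexts:
        if isinstance(entry, dict):
            if entry not in allowed_inline:
                return False
        elif entry not in allowed_urls:
            return False
    return True


# Hypothetical inline @vocab used during development.
inline_vocab = {"@vocab": "https://example.org/dev-vocab#"}
contexts = [BASE_CONTEXT, inline_vocab]

print(effective_vocab(contexts))  # https://example.org/dev-vocab#
# A development verifier that has opted in to this exact inline context:
print(verifier_accepts(contexts, {BASE_CONTEXT}, [inline_vocab]))  # True
# A production verifier that allowlists no inline contexts:
print(verifier_accepts(contexts, {BASE_CONTEXT}, []))  # False
```

Because the inline object is part of the signed document, the verifier can see and reject it without fetching anything, which is what makes the inline option a workable compromise in this model.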