msporny opened 4 days ago
It could be that, given the security issues with complex contexts, the base context should be reduced to only the terms and keywords necessary for the "claims data model", and not include the "securing-specific" data model claims, such as:
And lastly:
I put `@vocab` last because it's not clear to me whether data integrity implementers (or the JSON-LD community in general) agree on the purpose of placing this keyword in a context.
IIRC, the VCWG originally added it because there was significant interest in "not requiring JSON-LD + RDF processing" in order to issue or verify credentials... but when you don't otherwise mandate valid JSON-LD + RDF, such as through a normative statement like:

> The JSON-LD claimset MUST canonicalize to the same `application/n-quads` by the issuer and the verifier.
There is a chance that the verifier might process the claims differently than the issuer intended. More specifically, implementers of DID resolvers and credential graphs (network diagrams) observed that, without a default `@vocab`, much of the RDF you might wish to analyze simply exploded at the point of analysis when issuers signed with securing mechanisms that don't require valid RDF, but verifiers expected it to be produced later.
This RFC from the IAB is a much better take on this topic: https://datatracker.ietf.org/doc/rfc9413/
> Time and experience show that negative consequences to interoperability accumulate over time if implementations silently accept faulty input. This problem originates from an implicit assumption that it is not possible to effect change in a system the size of the Internet. When one assumes that changes to existing implementations are not presently feasible, tolerating flaws feels inevitable.
The problem `@vocab` was targeted at was addressing these "silent faults", but those faults originate in the split interpretation of what the claimset is.
If the claimset is JSON-LD, a fault can occur in the JSON processing (invalid JSON syntax, exceeded maximum depth, bad Unicode handling, etc.).
If the claimset is RDF, a fault can occur in the processing of the RDF concrete serialization (such as `application/rdf+xml` or `application/n-quads`).
If you say that a claimset is JSON-LD that MUST always produce valid RDF, you get faults from both categories. If you say that a claimset is JSON-LD that MAY produce valid RDF, you still get faults from both categories; you just might ignore the RDF faults because, given the normative framing, they are expected.
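The two fault categories can be sketched in code. This is an illustrative toy (the function and term map are hypothetical, not from any spec): a consumer that wants valid RDF out of a JSON claimset has to handle both JSON-level faults and term-mapping faults.

```python
import json

# Hypothetical term map standing in for a resolved JSON-LD context.
KNOWN_TERMS = {"name": "https://schema.org/name"}

def check_claimset(raw: str) -> list[str]:
    """Return the fault categories a raw claimset triggers."""
    # Category 1: JSON processing faults (invalid syntax, bad Unicode, ...).
    try:
        doc = json.loads(raw)
    except ValueError:
        return ["json-fault"]
    # Category 2: RDF production faults, crudely modeled here as a term with
    # no IRI mapping; without @vocab, a JSON-LD processor drops such terms.
    return ["rdf-fault" for t in doc
            if not t.startswith("@") and t not in KNOWN_TERMS]
```

With a default `@vocab`, the second category disappears silently: every undefined term gets an IRI, which is exactly the "silent fault" trade-off described above.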
I support re-evaluating the inclusion of `@vocab` in the base context. I'll share my view of how to improve the specification, but I am not a W3C member, and the WG will need to decide how to handle this sort of thing.
Decide if you want silent failures to come in the form of:

1. "bad term definitions... `@vocab` stays", or
2. "no RDF produced... `@vocab` removed": drop `@vocab`, add normative text that explains that RDF needs to be producible regardless of the securing mechanism, and optionally move the securing-mechanism details to separate contexts, per best practices.
This way, you tell consumers with normative language that valid, "high quality" RDF is expected for every valid instance of the data model, and you give them normative guidance ("you MUST understand this in order to implement the spec properly") on how to produce that "high quality" RDF.
I'm in favor of Option 2. The reason is that I have observed over the years lots of confusion regarding JSON-LD, including as described here: https://tess.oconnor.cx/2023/09/polyglots-and-interoperability
I feel that without stronger guidance on how JSON-LD is expected to be processed in a dependent specification, like Decentralized Identifiers or Verifiable Credentials, the ambiguity creates an open wound that festers and never heals.
It leads to conversations with customers and partners that sound like: "You don't have to look at the RDF, but if you don't, a verifier might come complain to you in the future about what you issued", or "You will need a stricter regulatory profile to ensure JSON-LD produces RDF, because the base specification does not actually ensure this property"...
IMO, `@vocab` is just a symptom of the underlying problem: there is no consensus on mandatory processing of RDF... It's better to fix that in specification text than to hide it in the details of a JSON-LD keyword.
I think we should do all of the following:

- Remove `@vocab` from the core context.
- For the "Getting Started" section, create a "development context" ...
- Strongly advise against the use of `@vocab` in a production setting (but still allow it). [Note: I think we should say that it cannot offer term protection, so use of the JSON-LD compaction API is recommended prior to the consumption of documents with contexts that use `@vocab`.]

As for this point:
Create an "issuer-defined" context that moves the @vocab declaration to that document (for those that want to continue to create/use "private term" VCs).
I think "issuer-defined" terms (and "private claims") are a footgun in the global, three-party model, so my preference is not to define a context at all with such an @vocab
value. This approach doesn't prevent someone else from defining such a context (we can't prevent this), but we don't need to endorse that approach.
I think "private claims" are the actual basis of any coherent "polyglot problem", as they are ambiguous on a global scale. These sorts of claims are only remotely sensible in a two-party model where there is an assumption of a tight coupling between the issuer and verifier, and the holder functions not as a fully independent actor, but as a transporter of opaque envelopes of information.
I think endorsing this concept in our core context was a mistake as it encourages people to make assumptions based on the historically ever-familiar two-party model; a model that isn't applicable here. These assumptions can result in a number of problems including, but not limited to, making it more difficult for general purpose wallets to help holders make choices, unduly incentivizing centralization in the marketplace, failing to understand document contexts prior to consuming information, and harming privacy by requiring permission from the issuer to express the same information in different ways.
> Remove `@vocab` from the core context.
+1
For the "Getting Started" section, create a "development context" ...
+1, particularly if the existence of that development context is something I can test for (and prevent via an implementation profile if need be)... and drop it on the floor if it exists in production.
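A production-side guard for that could be a few lines. This is a hypothetical sketch: the development-context URL below is invented for illustration, since no such context is actually defined yet.

```python
# Hypothetical URL; if the WG defines a development context, the spec would
# pin the real value.
DEV_CONTEXT = "https://www.w3.org/ns/credentials/dev/v2"

def acceptable_in_production(credential: dict) -> bool:
    """Return False for any credential whose @context references the
    (hypothetical) development context."""
    ctx = credential.get("@context", [])
    if isinstance(ctx, str):
        ctx = [ctx]
    return DEV_CONTEXT not in ctx
```

An implementation profile could make this check mandatory, which is what makes the development context testable and preventable as described above.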
> Strongly advise against the use of `@vocab` in a production setting (but still allow it). [Note: I think we should say that it cannot offer term protection, so use of the JSON-LD compaction API is recommended prior to the consumption of documents with contexts that use `@vocab`.]
I favor not using undefined terms and banning the use of `@vocab` from production implementations, so I will wait to see how strong the "strongly advise against" language is, but I agree that this is a good step forward.
For those who are not planning on using JSON-LD-aware APIs, should there not also be clear guidance that, since the VCDM v2 data model is in JSON-LD compacted form, you need checks in your processing logic to catch the equivalent errors that a JSON-LD-aware API will catch?
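As a sketch of what such guidance might ask of plain-JSON consumers (the specific checks and limits here are illustrative, not normative): reject duplicate keys and runaway nesting before trusting the claimset, since a JSON-LD-aware pipeline would surface comparable faults.

```python
import json

def strict_parse(raw: str, max_depth: int = 32) -> dict:
    """Parse a compact-form claimset with extra hygiene checks; the depth
    limit of 32 is an arbitrary illustrative choice."""
    def no_dupes(pairs):
        # json.loads silently keeps the last value for duplicate keys;
        # surface that as a hard error instead.
        keys = [k for k, _ in pairs]
        if len(keys) != len(set(keys)):
            raise ValueError("duplicate key in claimset")
        return dict(pairs)

    doc = json.loads(raw, object_pairs_hook=no_dupes)

    def check_depth(node, d=1):
        if d > max_depth:
            raise ValueError("claimset exceeds maximum depth")
        children = (node.values() if isinstance(node, dict)
                    else node if isinstance(node, list) else [])
        for child in children:
            check_depth(child, d + 1)

    check_depth(doc)
    return doc
```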
My votes:

- Remove `@vocab` from the base context.

Concurrence with:
As someone who was originally in favour of having `@vocab` in the core context, but also the author of the reported security vulnerabilities cited, I'd just like to clarify my POV on this issue.
1) `@vocab` is a broadly useful feature with respect to JSON-LD, something that hasn't changed with the reporting of this security vulnerability.
2) My position around having `@vocab` in the core context for developer ease of use hasn't changed; I believe that to be important. However, it's not a hill I am willing to die on any longer.
3) What has become clear is a flaw / design issue with JSON-LD, which means the term-protection feature offered by `@protected` doesn't extend to terms defined by `@vocab`.
In my opinion, we should be focusing on the root cause of the issue here, which is fixing how `@vocab` can be used in data integrity, because simply removing it from the core context doesn't mean it won't be used. If it were fixed, then many of the arguments in this thread about having `@vocab` in the core context or not would be less relevant, because using `@vocab` would be safe.
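A toy model of the flaw (this is not how a real JSON-LD processor is implemented; the data structures are invented to illustrate the point): explicitly defined terms can carry a protection flag, but a term minted via the `@vocab` fallback has no stored definition to protect, so a later context can redefine it without tripping any `@protected` error.

```python
# Toy context: one explicit, protected term plus an @vocab fallback.
BASE_CONTEXT = {
    "terms": {
        "issuer": {"iri": "https://www.w3.org/2018/credentials#issuer",
                   "protected": True},
    },
    "vocab": "https://www.w3.org/ns/credentials/issuer-dependent#",
}

def resolve(term: str, context: dict) -> tuple[str, bool]:
    """Return (IRI, protected?) for a term under this toy context."""
    defn = context["terms"].get(term)
    if defn is not None:
        return defn["iri"], defn["protected"]
    # @vocab fallback: the IRI is synthesized on the fly, so there is no
    # term definition for @protected to defend.
    return context["vocab"] + term, False
```

Here `resolve("favoriteColor", BASE_CONTEXT)` yields a usable IRI but `protected=False`, which is the gap between `@protected` and `@vocab`-defined terms described above.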
> `@vocab` is a broadly useful feature with respect to JSON-LD ...
Agree with @tplooker on this.
However, I also believe that having `@vocab` in the base context blinds developers to its existence and promotes its misuse.
An option that can serve both perspectives is the use of `@vocab` for development and testing purposes via a "development" or "secondary" context that a developer has to use explicitly and with full awareness.
The issue was discussed in a meeting on 2024-07-03
Previously, the VCWG decided to define an `@vocab` value in the base context (see https://github.com/w3c/vc-data-model/issues/953). Recently, a security disclosure (which is still under debate) has resulted in a number of individuals who had previously been in support of defining an `@vocab` pulling their support for the feature, since it is, at best, not very well understood and, at worst, leads to unexpected security-related concerns for those who do not understand the ramifications of using it.

We no longer have consensus for the feature (this is the new information that the security disclosure has highlighted). At a minimum, we need to poll the group again to see if `@vocab` has the support it needs to remain in the VCDM v2 base context.

There are additional proposal options, which include:

- Move `@vocab` to the `examples/v2` context.
- Strongly advise against the use of `@vocab` in a production setting (but still allow it).
- Forbid the use of `@vocab` in any production setting (and implement normative specification text and tests that enforce the behaviour).
- Create an "issuer-defined" context that moves the `@vocab` declaration to that document (for those that want to continue to create/use "private term" VCs).

We'll gather feedback in this issue and then implement whatever achieves consensus.