w3c / vc-data-integrity

W3C Data Integrity Specification
https://w3c.github.io/vc-data-integrity/

Multiple significant security vulnerabilities in the design of data integrity #272

Open tplooker opened 2 weeks ago

tplooker commented 2 weeks ago

The following issue outlines two significant security vulnerabilities in data integrity.

For convenience in reviewing the content below, here is a Google Slides version outlining the same information.

At a high level, both vulnerabilities exploit the "Transform Data" phase of data integrity in different ways, a process that is unique to cryptographic representation formats involving canonicalisation/normalisation.

In effect, both vulnerabilities allow a malicious party to swap the key and value of arbitrary attributes in a credential without invalidating the signature. For example, as the worked examples in the attached presentation show, an attacker could swap their first and middle names, or their employment and over18 status, without invalidating the issuer's signature.

The first vulnerability is called the unprotected term redefinition vulnerability. In general this vulnerability exploits a design issue with JSON-LD where the term protection feature offered by the @protected keyword doesn't cover terms that are defined using the @vocab and @base keywords. This means any terms defined using @vocab and @base are vulnerable to term redefinition.

The second vulnerability exploits the fact that a document signed with data integrity has critical portions that are unsigned, namely the @context element of the JSON-LD document. The fact that the @context element is unsigned in data integrity, combined with the fact that it plays a critical part in the proof generation and proof verification procedures, is a critical flaw leaving data integrity documents open to many forms of manipulation that are not detectable by validating the issuer's signature.
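To make the attack shape concrete, here is a minimal, hypothetical sketch of the kind of manipulation described above (it is not the worked example from the slides, and it assumes the VCDM v2 base context's @vocab prefix):

  // Hypothetical illustration only. The issuer signs a credential whose
  // credentialSubject terms are not defined in any context and therefore
  // resolve via the base context's @vocab.
  const signedByIssuer = {
    "@context": ["https://www.w3.org/ns/credentials/v2"],
    "type": ["VerifiableCredential"],
    "credentialSubject": { "firstName": "John", "middleName": "Ali" },
    "proof": { /* data integrity proof over the canonicalized RDF */ }
  };

  // The attacker swaps the values of the JSON keys and injects an inline
  // context that maps each key to the other key's @vocab IRI. Because
  // @protected does not cover @vocab-defined terms, this redefinition is
  // permitted, the canonicalized N-Quads are unchanged, and the issuer's
  // proof still verifies even though a consumer reading the JSON keys now
  // sees swapped values.
  const presentedByAttacker = {
    "@context": [
      "https://www.w3.org/ns/credentials/v2",
      {
        "firstName": "https://www.w3.org/ns/credentials/issuer-dependent#middleName",
        "middleName": "https://www.w3.org/ns/credentials/issuer-dependent#firstName"
      }
    ],
    "type": ["VerifiableCredential"],
    "credentialSubject": { "firstName": "Ali", "middleName": "John" },
    "proof": { /* unchanged proof */ }
  };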

Please see the attached presentation for resolutions to this issue we have explored.

In my opinion, the only solution that will provide adequate protection against these forms of attack is to fundamentally change the design of data integrity so that the @context element is integrity protected. I recognise this would be a significant change in design, however I do not see an alternative that would prevent variants of this attack from continuing to appear over time.

I'm also happy to present this analysis to the WG if required.

dlongley commented 2 weeks ago

I believe that the core of the issue highlighted above is in a lack of validation on the information that is to be verified. Any protected information or data must be validated and understood prior to consumption, no matter the protection mechanism. However, when a protection mechanism allows multiple expressions of the same information (a powerful tool), it may be important to better highlight this need. This is especially true in the three party model, where there is no simple two-party agreement and known context between issuers and verifiers, i.e., the scale or scope of the VC ecosystem is much larger when parties totally unknown to the issuer can consume their VCs.

Certainly not understanding the context in which a message is expressed (or meant to be consumed) can lead to mistakes, even when that message is authentic. For example, a message that expresses "i authorize you to act on item 1", even if verified to be authentically from a particular source, can be misapplied in the wrong context (e.g., "item 1" was supposed to mean X, when it was misinterpreted as Y). In short, the context under which data is consumed must be well known and trusted by the consumer, no matter the protection mechanism.

We might want to add some examples to the specification that show that the information in documents can be expressed in one context and transformed into another. This could include showing an incoming document that is expressed using one or more contexts that the consumer does not understand, which can then be transformed using the JSON-LD API to a context that is trusted and understood. This would also help highlight the power of protection mechanisms that enable this kind of transformation.

For example, consider a VC that includes terms that are commonly consumed across many countries and some that are region-specific. By using the JSON-LD API, a consumer that only understands the global terms can apply such a context to ensure that the terms they understand will appear as desired while other region-specific terms are expressed as full URLs, even when they do not understand or trust the regional context. All of this can happen without losing the ability to check the authenticity of the document.
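A rough sketch of that flow, assuming the "jsonld" npm package and a hypothetical global context URL (a sketch under those assumptions, not normative guidance):

  import jsonld from 'jsonld';

  // Contexts the consumer trusts and has reviewed; the regional context is
  // deliberately absent. The second URL is hypothetical.
  const GLOBAL_CONTEXT = [
    'https://www.w3.org/ns/credentials/v2',
    'https://example.org/global-license/v1'
  ];

  async function toTrustedView(incomingCredential, documentLoader) {
    // Compacting to the trusted context means terms the consumer understands
    // appear under their expected keys, while region-specific terms from
    // contexts the consumer does not trust surface as absolute IRIs.
    return jsonld.compact(incomingCredential, GLOBAL_CONTEXT, { documentLoader });
  }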

We can also highlight that simpler consumers continue to be free to outright reject documents that are not already presented in the context that they trust and understand, no matter their authenticity.

tplooker commented 2 weeks ago

I believe that the core of the issue highlighted above is in a lack of validation on the information that is to be verified. Any protected information or data must be validated and understood prior to consumption, no matter the protection mechanism. However, when a protection mechanism allows multiple expressions of the same information (a powerful tool), it may be important to better highlight this need. This is especially true in the three party model, where there is no simple two-party agreement and known context between issuers and verifiers, i.e., the scale or scope of the VC ecosystem is much larger when parties totally unknown to the issuer can consume their VCs.

The fundamental point of digital signatures is to reduce the information that needs to be trusted prior to verification. Most modern technologies, e.g. SD-JWT, mDoc, JWT, and JOSE/COSE at large, do this successfully, meaning a relying party only needs to trust a public key prior to attempting to verify the signature of an otherwise untrusted payload. If the signature check fails, the payload can be safely discarded without undue expense.

The problem with data integrity is that this assumption does not hold. In essence the relying party doesn't just need the public key of the issuer/signer, but also all possible JSON-LD context entries that issuer may or may not use; if any of these are corrupted or manipulated, or untrusted ones are injected, the attacks highlighted in this issue become possible. Whether it is even possible to share these contexts appropriately at scale is another question, but these attacks demonstrate at a minimum that an entirely unique class of vulnerabilities exists because of this design choice.

Certainly not understanding the context in which a message is expressed (or meant to be consumed) can lead to mistakes, even when that message is authentic. For example, a message that expresses "i authorize you to act on item 1", even if verified to be authentically from a particular source, can be misapplied in the wrong context (e.g., "item 1" was supposed to mean X, when it was misinterpreted as Y). In short, the context under which data is consumed must be well known and trusted by the consumer, no matter the protection mechanism.

The point I'm making is not about whether one should understand the context of a message one has received, it's about when one should attempt to establish this context. Doing this prior to validating the signature is dangerous and leads to these vulnerabilities.

For instance, a JSON-LD document can be signed with a plain old JWS signature (as in JOSE/COSE); once the signature is validated, one can then process it as JSON-LD to understand the full context, if one so wishes. The benefit of this approach is that if the JSON-LD context of the message has been manipulated, the relying party will have safely discarded the message before even reaching this point, because the signature check will have failed. Data integrity, on the other hand, requires this context validation to happen as part of signature verification, thus leading to these issues.
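As a minimal sketch of that ordering (assuming the "jose" npm package; any JWS library would do), the document is treated as opaque bytes until the signature check passes:

  import { CompactSign, compactVerify, generateKeyPair } from 'jose';

  const { publicKey, privateKey } = await generateKeyPair('ES256');
  const credential = {
    "@context": ["https://www.w3.org/ns/credentials/v2"],
    "type": ["VerifiableCredential"]
  };

  // Issuer: sign the exact bytes of the serialized document.
  const jws = await new CompactSign(new TextEncoder().encode(JSON.stringify(credential)))
    .setProtectedHeader({ alg: 'ES256' })
    .sign(privateKey);

  // Verifier: verify the bytes first; any tampering (including with @context)
  // fails here. Only then parse and, if desired, run JSON-LD processing over
  // the now-authenticated payload.
  const { payload } = await compactVerify(jws, publicKey);
  const verified = JSON.parse(new TextDecoder().decode(payload));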

selfissued commented 2 weeks ago

Another take on this is that Data Integrity signing methods that sign the canonicalized RDF derived from JSON-LD, rather than the JSON-LD itself, enable multiple different JSON-LD inputs to canonicalize to the same RDF. The JSON-LD itself isn't secured - only RDF values derived from it. If only the derived RDF values were used by code, it might not be a problem, but in practice, code uses the unsecured JSON-LD values - hence the vulnerabilities.
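A small, hypothetical demonstration of this point, assuming the "jsonld" npm package (with its default document loader fetching the v2 context): two documents whose JSON keys and contexts differ can canonicalize to identical N-Quads, so a signature over the canonicalized RDF alone cannot distinguish them.

  import jsonld from 'jsonld';

  // Same RDF, different surface JSON: docB remaps the keys through an inline
  // context and swaps the values accordingly.
  const docA = {
    "@context": "https://www.w3.org/ns/credentials/v2",
    "id": "urn:example:1",
    "firstName": "John",
    "middleName": "Ali"
  };
  const docB = {
    "@context": ["https://www.w3.org/ns/credentials/v2", {
      "firstName": "https://www.w3.org/ns/credentials/issuer-dependent#middleName",
      "middleName": "https://www.w3.org/ns/credentials/issuer-dependent#firstName"
    }],
    "id": "urn:example:1",
    "firstName": "Ali",
    "middleName": "John"
  };

  const [quadsA, quadsB] = await Promise.all([docA, docB].map(doc =>
    jsonld.canonize(doc, { algorithm: 'URDNA2015', format: 'application/n-quads' })));
  console.log(quadsA === quadsB); // true: the derived RDF is identical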

ottonomy commented 2 weeks ago

In the example where the firstName and middleName plaintext properties are swapped, what should the verifier's behavior be? I don't think it should just be to verify the credential, whatever type it might be, and then look at the plaintext properties within it that use @vocab-based IRIs. If I were writing this verifier, I would also ensure the @context matched my expectations; otherwise I wouldn't be sure that the properties of credentialSubject I was looking for actually meant the things that I expected them to mean.

If they were trying to depend on a credential of a certain type that expressed a holder's first name and middle name, it would not be a good idea to miss a check like this. Don't accept properties that aren't well-@protected in expected contexts. This is an additional cost that comes with processing JSON-LD documents like VCDM credentials, but it's not a step that should be skipped, because you're right that skipping it might open an implementer up to certain vulnerabilities.

Approaches that work:

  1. Verify the @context matches your expectations, for example by ensuring it includes only known context URLs for contexts appropriate to the credential type that use @protected and define their terms explicitly.
  2. OR, use the JSON-LD tools to compact the credential into the context you expect before relying on plaintext property names. (Both approaches are sketched below.)
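A minimal sketch of both approaches, assuming the "jsonld" npm package and a hypothetical list of trusted context URLs:

  import jsonld from 'jsonld';

  const TRUSTED_CONTEXTS = [
    'https://www.w3.org/ns/credentials/v2',
    'https://example.org/my-credential-type/v1' // hypothetical
  ];

  // Approach 1: reject anything whose @context is not exactly what you expect.
  function hasOnlyTrustedContexts(credential) {
    const contexts = [].concat(credential['@context']);
    return contexts.every(
      ctx => typeof ctx === 'string' && TRUSTED_CONTEXTS.includes(ctx));
  }

  // Approach 2: compact into the trusted context before reading plain JSON keys.
  async function compactToTrustedContext(credential, documentLoader) {
    return jsonld.compact(credential, TRUSTED_CONTEXTS, { documentLoader });
  }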

Communities developing and using new credential type specifications benefit from defining a good @context with appropriately @protected terms. @vocab is ok for experimentation but not so great for production use cases. We don't really have a huge number of credential types yet, but hopefully as the list grows, the example contexts established for each of the good ones make for an easy-to-follow pattern.

OR13 commented 2 weeks ago

schema.org and google knowledge graph both use @vocab.

https://developers.google.com/knowledge-graph

The problem is not JSON-LD keywords in contexts, the problem is insecure processing of attacker controlled data.

If you want to secure RDF, or JSON-LD, it is better to sign bytes and use media types.

You can sign and verify application/n-quads and application/ld+json, in ways that are faster and safer.

W3C is responsible for making the web safer, more accessible and more sustainable.

Data integrity proofs are less safe, harder to understand, and require more CPU cycles and memory to produce and consume.

They also create a culture problem for RDF and JSON-LD by coupling a valuable property which many people care deeply about (semantic precision and shared global vocabularies) with a security approach that is known to be problematic and difficult to execute safely.

These flaws cannot be corrected, and they don't need to be, because better alternatives already exist.

W3C, please consider not publishing this document as a technical recommendation.

msporny commented 2 weeks ago

2024-05-08 MATTR Responsible Disclosure Analysis

On May 8th 2024, MATTR provided a responsible security disclosure to the Editors of the W3C Data Integrity specifications. A private discussion ensued, with this analysis of the disclosure provided shortly afterwards and a public release date agreed to (after everyone was done with the conferences they were attending through May and June). The original response is included below without modification (so language that speaks to "VC Data Model" could be interpreted as "VC Data Integrity", as the original intent was to file this issue against the VC Data Model specification).

The disclosure suggested two separate flaws in the Data Integrity specification:

The Editors of the W3C Data Integrity specification have performed an analysis of the responsible security disclosure and provide the following preliminary finding:

Both attacks are fundamentally the same attack, and the attack only appears successful because the attack model provided by MATTR presumes that verifiers will allow fields to be read from documents that use unrecognized @context values. Two documents with different @context values are different documents. All processors (whether utilizing JSON-LD processing or not) should treat the inbound documents as distinct; the software provided by MATTR failed to do that. Secure software, by design, does not treat unknown identifiers as equivalent.

That said, given that a credential technology company such as MATTR has gone so far as to report this as a vulnerability, further explanatory text could be added to the VC Data Model specification that normatively states that all processors should limit processing to known and trusted context identifiers and values, so that developers do not make the same mistake of treating documents with differing @context values as identical prior to verification.

The rest of this document contains a more detailed preliminary analysis of the responsible disclosure. We thank MATTR for the time and attention put into describing their concerns via a responsible security disclosure. The thorough explanation made analysis of the concerns a fairly straightforward process. If we have made a mistake in our analysis, we invite MATTR and others to identify the flaws in our analysis such that we may revise our findings.

Detailed Analysis

A JSON-LD consumer cannot presume to understand the meaning of fields in a JSON-LD document that uses a context that the consumer does not understand. The cases presented suggest the consumer is determining the meaning of fields based on their natural language names, but this is not how JSON-LD works; rather, each field is mapped to an unambiguous URL using the JSON-LD context. This context MUST be understood by the consumer; it cannot be ignored.

A verifier of a Verifiable Credential MUST ensure that the context used matches an exact well-known @context value or MUST compact the document using the JSON-LD API to a well-known @context value before further processing the data.

Suggested Mitigation 1
Add a paragraph to the Data Integrity specification that mentions this and links to the same section in the Verifiable Credentials specification to help readers who are not familiar with JSON-LD, or did not read the JSON-LD specification, to understand that `@context` cannot be ignored when trying to understand the meaning of each JSON key. Additional analogies could be drawn to versioning to help developers unfamiliar with JSON-LD, e.g., "The `@context` field is similar to a `version` for a JSON document. You must understand the version field of a document before you read its other fields."

The former can be done by using JSON schema to require a specific JSON-LD shape and specific context values. This can be done prior to passing a document to a data integrity implementation. If contexts are provided by reference, a document loader can be used that resolves each one as "already dereferenced" by returning the content based on installed context values instead of retrieving them from the Web. Alternatively, well-known cryptographic hashes for each context can be used and compared against documents retrieved by the document loader over the Web. For this approach, all other JSON-LD documents MUST be rejected if they do not abide by these rules. See Type-Specific Credential Processing for more details on this:

https://www.w3.org/TR/vc-data-model-2.0/#type-specific-credential-processing.
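As a sketch of the document-loader approach described above (hypothetical wiring; the loader return shape matches what JSON-LD libraries such as jsonld.js expect):

  import { createHash } from 'node:crypto';

  // Contexts installed with the application; values are static copies.
  const INSTALLED_CONTEXTS = new Map([
    ['https://www.w3.org/ns/credentials/v2', { /* static copy of the context */ }]
  ]);

  async function documentLoader(url) {
    const document = INSTALLED_CONTEXTS.get(url);
    if (!document) {
      throw new Error(`Refusing to load untrusted context: ${url}`);
    }
    return { contextUrl: null, documentUrl: url, document };
  }

  // Alternative: if a context must be fetched from the Web, compare its bytes
  // against a well-known cryptographic hash before accepting it.
  function matchesKnownHash(bytes, expectedSha256Hex) {
    return createHash('sha256').update(bytes).digest('hex') === expectedSha256Hex;
  }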

This former approach is less powerful than using the JSON-LD Compaction API because it requires more domain-specific knowledge to profile down. However, it is still in support of decentralized extensibility through use of the JSON-LD @context field as a decentralized registry, instead of relying on a centralized registry. Decentralized approaches are expected to involve a spectrum of interoperability and feature use precisely because they do not require a one-size-fits-all approach.

Applying these rules to each case presented, for case 1:

A verifier that does not use the JSON-LD API and does not recognize the context URL, https://my-example-context.com/, will reject the document.

A verifier that does not use the JSON-LD API and does recognize the context URL, https://my-example-context.com/, will not conflate the natural language used for the JSON keys with their semantics. Instead, the verifier will appropriately use the semantics (that happens to be the opposite of the natural language used in the JSON keys) that the issuer intended, even though the JSON keys have changed.

A verifier that does use the JSON-LD API will compact the document to a well-known context, for example, the base VC v2 context, and the values in the JSON will be restored to what they were at signing time, resulting in semantics that the issuer intended.

For case 2:

A verifier that does not use the JSON-LD API and does not recognize the attacker-provided context URL, https://my-malicious-modified-context.com/, will reject the document.

A verifier that does not use the JSON-LD API and does recognize the attacker-provided context URL, https://my-malicious-modified-context.com/, will not conflate the natural language used for the JSON keys with their semantics. Instead, the verifier will appropriately use the semantics (that happens to be the opposite of the natural language used in the JSON keys) that the issuer intended, even though the JSON keys have changed.

A verifier that does use the JSON-LD API will compact the document to a well-known context, for example, the base VC v2 context (and optionally, https://my-original-context.com), and the values in the JSON will be restored to what they were at signing time, resulting in semantics that the issuer intended.

Note: While the disclosure suggests that the JSON-LD @protected feature is critical to this vulnerability, whether it is used, or whether a Data Integrity proof is used to secure the Verifiable Credential, is orthogonal to ensuring that the entire @context value is understood by the verifier. For clarity, this requirement stands even if an envelope-based securing mechanism focused on syntax protection were used to ensure authenticity of a document. Misunderstanding the semantics of an authentic message by ignoring its context is always a mistake and can lead to unexpected outcomes.

Comparison to JSON Schema

The scenarios described are identical in processing systems such as JSON Schema where document identifiers are used to express that two documents are different. A JSON document with differing $schema values would be treated as differing documents even if the contained data appeared identical.

Original document

{"$schema": "https://example.com/original-meaning",
 "firstName": "John"}

New or modified document

{"$schema": "https://example.com/new-meaning",
 "firstName": "John"}

Any document processor, whether utilizing JSON Schema processing or not, would rightly treat these two documents as distinct values and would seek to understand their values' equivalence (or lack of it) prior to processing their contents. Even consuming a document that is recognized as authentic would be problematic if the $schema values were not understood.

Original meaning/schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/original-meaning",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "description": "The name by which a person is generally called: 'given name'",
      "type": "string"
    }
  }
}

New meaning/schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/new-meaning",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "description": "The name spoken first in Japan: typically a surname",
      "type": "string"
    }
  }
}

Demonstration of Proper Implementation

The attack demonstration code provided adds the unknown modified/malicious contexts to the application code's trusted document loader. A valid application should not do this and removing these lines will cause the attack demonstrations to no longer pass:

documentLoader.addStatic("https://my-example-context.com/", modifiedContext)

https://gist.github.com/tplooker/95ab5af54a141b69b55d0c2af0bc156a#file-protected-term-redefinition-attack-js-L38

To see "Proof failed" when this line is commented out and the failure result is logged, see: https://gist.github.com/dlongley/93c0ba17b25e500d72c1ad131fe7e869

documentLoader.addStatic("https://my-malicious-modified-context.com/", modifiedContext)

https://gist.github.com/tplooker/4864ffa2403ace5637b619620ce0c556#file-context-substitution-attack-js-L48

To see "Proof failed" when this line is commented out and the failure result is logged, see:

https://gist.github.com/dlongley/4fb032c422b77085ba550708b3615efe

Conclusion

While the mitigation for the misimplementation identified above is fairly straightforward, the more concerning thing, given that MATTR is knowledgeable in this area, is that they put together software that resulted in this sort of implementation failure. It demonstrates a gap between the text in the specification and the care that needs to be taken when building software to verify Verifiable Credentials. Additional text to the specification is needed, but may not result in preventing this sort of misimplementation in the future. As a result, the VCWG should probably add normative implementation text that will test for this form of mis-implementation via the test suite, such as injecting malicious contexts into certain VCs to ensure that verifiers detect and reject general malicious context usage.

Suggested Mitigation 2
Add tests to the Data Integrity test suites that are designed to cause verifiers to abort when an unknown context is detected to exercise type-specific credential processing.
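For illustration, such a negative test might look roughly like this (assuming a mocha-style runner and hypothetical verify(), documentLoader, and fixture names):

  import { strict as assert } from 'node:assert';

  it('rejects a credential presented with an unknown @context', async () => {
    // validSignedCredential is a hypothetical fixture with a valid proof.
    const tampered = structuredClone(validSignedCredential);
    tampered['@context'].push('https://attacker.example/malicious-context/v1');

    const result = await verify({ credential: tampered, documentLoader });
    assert.equal(result.verified, false);
  });
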
OR13 commented 2 weeks ago

If you consider the contexts part of source code, then this sort of attack requires source code access or misconfiguration.

Validation of the attacker-controlled content prior to running the data integrity suite might provide mitigation, but at further implementation complexity cost.

Which increases the probability of misconfiguration.

A better solution is to verify the content before performing any JSON-LD (or other application specific) processing.

After verifying, schema checks or additional business validation can be performed as needed with assurance that the information the issuer intended to secure has been authenticated.

At a high level, this is what you want:

  1. minimal validation of hints
  2. key discovery & resolution
  3. verification
  4. validation
  5. deeper application processing

Most data integrity suites I have seen do this instead:

  1. deep application processing (JSON-LD / JSON Schema)
  2. canonicalization (RDF)
  3. verification
  4. validation
  5. deeper application processing

The proposed mitigations highlight that these security issues are the result of a fundamental disagreement regarding authentication and integrity of data.

Adding additional application processing prior to verification gives the attacker even more attack surface to exploit, including regular expression attacks, denial of service, schema reference tampering, and schema version mismatching, etc...

Any application processing that occurs prior to verification is a design flaw; doubling down on a design flaw is not an effective mitigation strategy.

filip26 commented 2 weeks ago

@OR13

adding additional application processing prior to verification gives the attacker even more attack surface to exploit, including regular expression attacks, denial of service, schema reference tampering, and schema version mismatching, etc...

we are speaking about this pseudo-code

  if (!ACCEPTED_CONTEXTS.includesAll(VC.@context)) {
     terminate
  }

which is a loop and a simple string comparison. I don't see a reason for any of the exploits you have listed here except an implementer's incompetence.

Please can you elaborate on how those exploits could be performed and provide a calculation, or an estimate, of how much this adds to complexity?

Thank you!

tplooker commented 2 weeks ago

@filip26, setting aside your apparent labelling of multiple community members who have participated in this community for several years as "incompetent".

Your specific pseudo-code is insufficient for at least the following reasons:

  1. What is actually input into the cryptographic verification procedure for data integrity isn't the URLs, it's the contents behind those URLs. So to put it plainly, data integrity cannot by design ensure that the contents of the @context entries actually used to verify a credential are those that the issuer used, because they are not integrity protected by the issuer's signature.
  2. You have disregarded inline contexts; the @context array is not guaranteed to be simply an array of strings, it may also include objects or "inline contexts".
  3. Your check appears to imply ACCEPTED_CONTEXTS is a flat list of contexts acceptable for any issuer. This means that if contexts from different issuers collide in unexpected ways and a malicious party knows this, they can substitute @context values the relying party already trusts without even having to inject or modify an existing @context. If I'm mistaken and you meant that ACCEPTED_CONTEXTS is an array of issuer-specific accepted contexts, then please explain how this is accomplished in an interoperable manner and/or how it would scale.
filip26 commented 2 weeks ago

@tplooker setting aside that you are putting words in my mouth that I have not said, which is quite rude and disrespectful ...

Ad 1: you are wrong; by ensuring data is processed with a context you accept (the URLs), you know what is behind those URLs and how much you trust them, and perhaps you have a static copy of the contexts. If you follow untrusted URLs then it's an implementer's fault. Use a browser analogy.
Ad 2: yeah, I've simplified that; an inline context is bad practice.
Ad 3: you trust the URLs or not, and based on that trust you proceed or not.

PatStLouis commented 2 weeks ago

I was browsing through past issues related to this. This specific issue was raised to suggest adding @vocab to the base VCDM 2.0 context. It's my understanding that the authors of the Data Integrity spec were opposed to this. This is now being pointed to as a direct security concern.

@tplooker given these new findings, would you revise your support, since according to your disclosure this was a bad recommendation that introduced a security concern?

OR13 commented 2 weeks ago

The URL for a context doesn't actually matter... In fact some document loaders will follow redirects when resolving contexts over a network (technically another misconfiguration).

Depending on the claims you sign, you may only detect a mismatch in the signature when you attempt to sign a document that actually uses the differing part of the context.

Contexts are just like any other part of source code... Every single line of source code is a potential problem.

You often don't control what 3rd parties will consider the bytes of a context to be... It's a feature that's been turned into a defect by where it was placed.

"It verified for me, must be a problem in your document loader."

"I thought I would be able to fix it in only a few hours, but it took me 2 days and delayed our release"

"I finally figured out how data integrity proofs work, thanks for letting me spend all week on them"

I've paired with devs and shown them how to step through data integrity proofs, dumping intermediate hex values and comparing against a "known good implementation", only later to learn the implementation had a bug...

Misconfiguration is common in complex systems.

I'm arguing that security experts who have evaluated data integrity proofs against alternatives should never recommend them, because every problem they exist to solve is already solved for better by other technologies used in the correct order.

Authentication of JSON -> JSON Web Signatures
Specification of JSON structure -> JSON Schemas
Integrity protection of files -> hashes
Semantic mapping for JSON -> JSON-LD

The essence of a recommendation, is that you believe there isn't a better alternative.

filip26 commented 2 weeks ago

@OR13 I'm sorry, but I don't see it. You mention two issues: misconfiguration and bugs. Well, we have tests, certification, etc. Those issues are endemic to any software application, but we don't call all software vulnerable just because of the possibility that there might be a bug; we do so after we find a bug.

Misconfiguration is common in complex systems.

I would really like to see the complexity estimated. I guess we are seeing a very different picture.

I'm arguing that security experts who have evaluated data integrity proofs against alternatives should never recommend them, because every problem they exist to solve is already solved for better by other technologies used in the correct order.

Please, let's be factual: what experts, what was recommended, etc. In the EU, when a press article starts with the title "American scientists have ...", everyone stops reading it (they add the "American" to make it credible ;)

OR13 commented 2 weeks ago

@PatStLouis you raise an excellent point regarding default vocabularies.

It's never too late to change what's in a context (joke).

This working group cannot prevent anyone else from adding a context that includes a vocab.

You are reporting an architectural flaw that was "solved for" by making it explicit in the base context, but it's not fixed by removing it from that context.

If json compatibility isn't a requirement, the working group can drop the vc-jose-cose spec and remove the vocab from the default context... This might even improve adoption of data integrity while clarifying that RDF is the claims format that W3C secures.

I've argued this point previously.

tplooker commented 2 weeks ago

I was browsing through past issues related to this. https://github.com/w3c/vc-data-model/issues/953 was raised to suggest adding @vocab to the base VCDM 2.0 context. It's my understanding that the authors of the Data Integrity spec were opposed to this. This is now being pointed to as a direct security concern.

@PatStLouis I agree this issue is relevant to the conversation, however the opinions I shared in that issue have not changed. @vocab is a broadly useful feature, and that has not changed through the disclosure of this vulnerability; what has become apparent is that JSON-LD is broken with regard to how this feature works. Simply removing @vocab from the base context doesn't fix this issue, it would be a band-aid. What needs to be fixed is 1) JSON-LD, with regard to how @vocab works with @protected, and 2) more generally, the @context entry needs to be integrity protected to prevent manipulation.

tplooker commented 2 weeks ago

Just to add some additional colour here @PatStLouis, I don't believe the recommendation of putting @vocab in the base context was a "bad recommendation". In reality it was also necessary to fix an even worse issue with data integrity, documented here https://github.com/digitalbazaar/jsonld.js/issues/199, which lay around unpatched since 2017 until we discovered it and started a contribution for a fix in 2021 (https://github.com/digitalbazaar/jsonld.js/pull/452). Personally I believe removing @vocab from the core context will likely re-introduce this issue for JSON-LD processors that aren't handling these relative IRIs correctly.

Furthermore, if others in the WG knew about this issue, specifically that @vocab didn't work with @protected and chose not to disclose it when discussing this proposal, then that is even more troublesome.

msporny commented 2 weeks ago

@tplooker wrote:

setting aside your apparent labelling of multiple community members who have participated in this community for several years as "incompetent".

@tplooker wrote:

Furthermore, if others in the WG knew about this issue, specifically that @vocab didn't work with @protected and chose not to disclose it when discussing this proposal, then that is even more troublesome.

Please stop insinuating that people are acting in bad faith.

Now might be a good time to remind everyone in this thread that W3C operates under a Code of Ethics and Professional Conduct that outlines unacceptable behaviour. Everyone engaging in this thread is expected to heed that advice in order to have a productive discussion that can bring this issue to a close.

veikkoeeva commented 2 weeks ago

From an implementer perspective maybe adding an example that "should fail" could be a good thing. Something like at https://github.com/w3c/vc-data-integrity/issues/272#issuecomment-2184084631 .

As an implementation "case experience", I implemented in .NET something that produces a proof like the university credential example at https://www.w3.org/community/reports/credentials/CG-FINAL-di-eddsa-2020-20220724/#example-6 and then also verifies it. It felt a bit tedious to find out what to canonicalize, hash and sign to get a similar result. The code is more or less private still, but now that https://github.com/dotnetrdf/dotnetrdf/releases/tag/v3.2.0 and its canonicalization are publicly released, I might make something more public too. I still feel I need to go through this thread with more thought so I completely understand the issue at hand.

msporny commented 2 weeks ago

@veikkoeeva wrote:

From an implementer perspective maybe adding an example that "should fail" could be a good thing.

Yes, that is already the plan for the test suite, in order to make sure that no conformant implementation can get through without ensuring that it refuses to generate a proof for something that drops terms and/or, depending on the outcome of this thread, uses @vocab to expand a term.

That's a fairly easy thing that this WG could do to ensure that this sort of implementation mistake isn't made by implementers. Again, we'll need to see how this thread resolves to see what actions we can take with spec language and test suites to further clarify the protections that we expect implementations to perform by default.

PatStLouis commented 2 weeks ago

@OR13

It's never too late to change what's in a context (joke).

I'm not suggesting a change, my goal is to understand why this recommendation was suggested in the first place when removing it is now listed as a remediation step for a security concern raised by the very same parties who suggested it.

This working group cannot prevent anyone else from adding a context that includes a vocab.

Correct, @vocab is a great feature for some use cases. I enjoy the feature for learning about jsonld, development and prototyping until I publish a proper context. I wouldn't use it in a production system (or at least I haven't found a use case that requires it).

Many protocols have features that can be insecure depending on how you use them. This doesn't make the protocol inherently flawed.

You are reporting an architectural flaw, that was "solved for" by making it explicit in the base context, but it's not fixed by removing it from that context.

Apologies if you misunderstood my statement, but my intention was not to report an architectural flaw.

@tplooker

@PatStLouis I agree this issue is relevant to the conversation, however the opinions I shared in that issue have not changed. @vocab is a broadly useful feature, and that has not changed through the disclosure of this vulnerability; what has become apparent is that JSON-LD is broken with regard to how this feature works. Simply removing @vocab from the base context doesn't fix this issue, it would be a band-aid. What needs to be fixed is 1) JSON-LD, with regard to how @vocab works with @protected, and 2) more generally, the @context entry needs to be integrity protected to prevent manipulation.

Yes, @vocab is a useful feature, but should it always be present? Nothing is stopping someone from using it; it's a feature (but it shouldn't be default behaviour). I would likewise argue that the decision to add @vocab to the base context of the VCDM 2.0 is a band-aid solution in itself, derived from the need for an easier development process.

The Data Integrity spec provides hashes for its context entries that verifiers can leverage while caching the content. AFAIK this is already a thing.

Just to add some additional colour here @PatStLouis, I don't believe the recommendation of putting @vocab in the base context was a "bad recommendation". In reality it was also necessary to fix an even worse issue with data integrity, documented here digitalbazaar/jsonld.js#199, which lay around unpatched since 2017 until we discovered it and started a contribution for a fix in 2021 (digitalbazaar/jsonld.js#452). Personally I believe removing @vocab from the core context will likely re-introduce this issue for JSON-LD processors that aren't handling these relative IRIs correctly.

Thank you for pointing out these issues, I enjoy looking back at historical data from before my time in the space. As pointed out earlier, some of the parties that made that recommendation are now recommending removing it as a remediation to a security concern that they raised.

The use cases listed for this recommendation were for development purposes, as described in #953. Furthermore, the private claims section of the JWT RFC reads as follows:

Private Claim Names are subject to collision and should be used with caution.

Enabling this by default does not sound like a good recommendation to me.

It's easy to set up a context file; it takes 5 minutes and a GitHub account. If you are doing development, you can just include an @vocab entry in your own context for the short term, so why the recommendation to make it part of the VCDM 2.0 context?

Regardless, this was already discussed by the group and the decision has been made.

OWASP defines a class of vulnerabilities called Security Misconfigurations. This is where I would see this landing. While valid, it's ultimately the implementer's responsibility to properly configure their system, and sufficient information is provided in order for them to do so. If I expose an unsecured SSH service to the internet, then claim that SSH is insecure because I can gain unauthorized access to my server, that doesn't hold up, since the security flaw is not in the protocol itself but in my security configuration. Yes, it's a vulnerability; no, it shouldn't be addressed by the underlying protocol.

To conclude, I find this disclosure valuable, as I got to learn a bit more about JSON-LD, and it provides a great resource for demonstrating to implementers how to properly conduct verification of credentials and to issuers how to properly design a VC.

awoie commented 1 week ago

OWASP defines a class of vulnerabilities called Security Misconfigurations. This is where I would see this landing. While valid, it's ultimately the implementer's responsibility to properly configure their system, and sufficient information is provided in order for them to do so. If I expose an unsecured SSH service to the internet, then claim that SSH is insecure because I can gain unauthorized access to my server, that doesn't hold up, since the security flaw is not in the protocol itself but in my security configuration. Yes, it's a vulnerability; no, it shouldn't be addressed by the underlying protocol.

I would actually classify those attacks as "Data Integrity Signature Wrapping" (DISW) attacks. They share many similarities with XML Signature Wrapping Attacks (XSW) that occurred in the past. Also, note that it is possible to use XML Signatures securely if appropriate mitigations are implemented correctly. The same holds true for DI. The question is where we would add requirements for those additional mitigations for Data Integrity Proofs (DI).

The VCDM uses relatedResource to protect the integrity of specific external resources, such as @context values referenced by URIs. While DI is primarily used with the W3C VCDM 2.0, other JSON-LD data models might be secured by DI in the future, such as ZCaps, so we cannot just rely on mitigations defined in the W3C VCDM 2.0. For this reason, I believe this mechanism for protecting the integrity of the @context definitions is actually the responsibility of DI itself, since those @context definitions are part of the canonicalization and signature creation/verification. It would mitigate DISW attacks by making @context definition integrity checking part of the signature verification process. In that case, a similar mechanism to relatedResource has to be defined in the DI specification, and making it mandatory would help verifiers and issuers avoid skipping certain checks when issuing and verifying DIs.
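For illustration, the shape of such a binding could reuse the existing relatedResource pattern (the extra context URL and digest value below are placeholders):

  // Sketch only: pin the integrity of a non-core context the credential uses.
  const credentialWithPinnedContext = {
    "@context": [
      "https://www.w3.org/ns/credentials/v2",
      "https://example.org/other-data-model/v1"
    ],
    "type": ["VerifiableCredential"],
    "relatedResource": [{
      "id": "https://example.org/other-data-model/v1",
      "digestSRI": "sha384-..." // integrity hash of the dereferenced context
    }]
  };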

awoie commented 1 week ago

we are speaking about this pseudo-code

  if (!ACCEPTED_CONTEXTS.includesAll(VC.@context)) {
     terminate
  }

It's not that simple if the goal is to retain the open-world data model and extensibility model that the W3C VCDM promises. There might be instances where a verifier does not recognize all values in a VC's @context array. Consider the following simplified VC examples:

Example: VC using a base data model for all driving licenses

{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1"
  ],
  "type": [ "VerifiableCredential", "DrivingLicense" ],
  "credentialSubject": {
    "canDrive": true
  }
}

Example: VC issued by DMV of Foo

{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://foo.com/ns/dl/v1"
  ],
  "type": [ "VerifiableCredential", "DrivingLicense", "FooLicense" ],
  "credentialSubject": {
    "canDrive": true,
    "foo": true
  }
}

Example: VC issued by DMV of Bar

{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://bar.com/ns/dl/v1"
  ],
  "type": [ "VerifiableCredential", "DrivingLicense", "BarLicense" ],
  "credentialSubject": {
    "canDrive": true,
    "bar": true
  }
}

When crossing realms, verifiers in the realms of Foo and Bar may have agreed on using the base data model but not on the specific properties unique to Foo and Bar. Verifiers in the realm of Foo are primarily interested in the base properties of the DrivingLicense and occasionally in the specific properties of the FooLicense. The same situation applies to the realm of Bar, but with a focus on their respective properties.

Adopting the ACCEPTED_CONTEXTS approach would require Foo, Bar, and all other realms to continually distribute and update their individual context definitions. This approach just does not scale very well, and it sacrifices the open-world data model since all @context URLs and/or definitions have to be statically configured.

veikkoeeva commented 1 week ago

we are speaking about this pseudo-code

  if (!ACCEPTED_CONTEXTS.includesAll(VC.@context)) {
     terminate
  }

It's not that simple if the goal is to retain the open-world data model and extensibility model that the W3C VCDM promises. There might be instances where a verifier does not recognize all values in a VC's @context array. Consider the following simplified VC examples: [...]

Great examples! Thanks!

Some context on why I think a test for what "should not happen" matters, plus a less-mentioned issue of having good examples.

Related to https://github.com/w3c/vc-data-integrity/issues/272#issuecomment-2184999640: I'm not completely alien to this sort of work and indeed, when I implemented the "first pass sketch" of the code, I struggled a bit with implications of this sort since I'm not so familiar with JSON-LD. So I thought to get back to it with better time and just not release anything before things are clearer (plus the library change is not public, though there's something already in the tests about this).

Some part of that was, if I have a document like the one at https://www.w3.org/community/reports/credentials/CG-FINAL-di-eddsa-2020-20220724/#example-6, how to pick apart the pieces for canonicalization, turning into bytes, hashing, signing and so on. For this "sketch" I was quite happy to get the same results as the example document with those keys, but I know I paid only passing thought to these sorts of things, partially because there had been related discussion earlier.

I mention this example piece since I think good examples are perhaps more important than has been implied here. I naturally also think that tests for what should not happen are important -- and maybe some notes of the sort could be added to an example or two as well. They're already something I (we) have been codifying into some tests. It's also a great way to document things.

filip26 commented 1 week ago

@awoie a verifier should not guess what's inside a context nor try to anticipate whether there is some agreement between context providers.

When crossing realms, verifiers in the realms of Foo and Bar may have agreed on using the base data model but not on the specific properties unique to Foo and Bar.

If a verifier recognizes both https://foo.com/ns/dl/v1 and https://bar.com/ns/dl/v1 then there is no issue. It simply means that the verifier accepts both DMV departments' vocabularies, no matter that there are shared parts. A situation in which a verifier accepts something just because it uses well-known terms is a risk, not the other way around.

An ability to understand well-known terms, e.g. those defined by schema.org, is a great feature, but not in the VC ecosystem, where we don't want to guess but to be sure.

This approach just does not scale very well and it sacrifices the open-world data model since all @context

It scales the same way the web does. Nothing prevents you from using other contexts, well-known terms, etc. and including them all in your context.

If there is a need, a good reason, to share parts between some parties, then the easiest, transparent, and scalable solution is this:

 "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://dmv-vocab/ns/dl/v1"
    "https://foo.com/ns/dl/v1"
  ],
 "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://dmv-vocab/ns/dl/v1"
    "https://bar.com/ns/dl/v1"
  ],
awoie commented 1 week ago

@filip26 wrote:

If a verifier recognizes both https://foo.com/ns/dl/v1 and https://bar.com/ns/dl/v1 then there is no issue. It simply means that the verifier accepts both DMV departments' vocabularies, no matter that there are shared parts. A situation in which a verifier accepts something just because it uses well-known terms is a risk, not the other way around.

I didn't say it is not a solution. My point was that it is a solution which does not scale. A verifier from Foo might have never seen a @context from Bar but it shouldn't matter because they agreed on a common vocab defined by https://www.w3id.org/dl/v1. Forcing all verifiers or issuers from different realms to continuously reach out to each other to keep @context URLs and definitions up-to-date and well-known does not scale for a lot of use cases.

@filip26 wrote:

It scales the same way the web does. Nothing prevents you from using other contexts, well-known terms, etc. and including them all in your context.

No, it doesn't because the assumption of the ACCEPTED_CONTEXTS is to statically configure them. The web is not static and not all actors monitor each other continuously.

filip26 commented 1 week ago

@awoie

No, it doesn't because the assumption of the ACCEPTED_CONTEXTS is to statically configure them. The web is not static and not all actors monitor each other continuously.

It's up to an implementer how to allow a verifier to be configured. A static configuration has nothing to do with scalability. But I guess you meant that a verifier would not be able to accept a context which is not known - that's exactly what we want, and it does not mean that VCs do not scale, or that there cannot be an infinite number of different VC types, issuers, verifiers, etc.

awoie commented 1 week ago

@filip26 wrote:

It's up to an implementer how to allow a verifier to be configured. A static configuration has nothing to do with scalability. But I guess you meant that a verifier would not be able to accept a context which is not known -

My point on scalability refers to an increase in operational costs, not necessarily performance. Performance might be another point but I cannot comment on that.

@filip26 wrote:

that's exactly what we want, and it does not mean that VCs do not scale, or that there cannot be an infinite number of different VC types, issuers, verifiers, etc.

If this is what we want, it sacrifices the open-world data model the VCDM promises, as mentioned here.

filip26 commented 1 week ago

@awoie I'm sorry, I don't think we are on the same page, and I'll let others explain that it does not affect the scalability of the VC ecosystem nor the open-world data model.

PatStLouis commented 1 week ago

@awoie I like your extensibility example a lot since it's similar to the context in which I'm evaluating the impact of this.

My question is: if a verifier has no prior knowledge of foo or bar, why would they consider the extended data provided by those entities, and how would this data lead to an exploit in their system? The base information contained in the dl context is by design sufficient for the verification needs of third parties.

Verifiers will know what information they want to verify; they are not blindly verifying abstract data.

As for the classification of this disclosure, while I can't really argue with your labeling, this is not a formal classification.

If we take 2 examples of vulnerability disclosed around XML Signature Wrapping Attacks:

Both of these affect a specific software and lead to 2 distinct CWE:

They are not addressed by a change to XML, but by a security mitigation in the affected software. This is an important distinction to make and loops back to a Security Misconfiguration.

It's hard for me to understand what exactly this disclosure tries to underline as the vulnerability.

It seems the target of the vulnerability is being shifted around depending on the questions asked/comments made.

awoie commented 1 week ago

@awoie I like your extensibility example a lot since it's similar to the context in which I'm evaluating the impact of this.

My question is: if a verifier has no prior knowledge of foo or bar, why would they consider the extended data provided by those entities, and how would this data lead to an exploit in their system? The base information contained in the dl context is by design sufficient for the verification needs of third parties.

@PatStLouis Yes, evaluating the base properties is sufficient for verification needs but the vulnerability explained in the disclosure allows attackers to:

In effect both vulnerabilities allow a malicious party to swap the key and value of arbitrary attributes in a credential without the signature being invalidated.

This might affect the base properties as well. A verifier from Foo cannot distinguish between modifications made by an attacker and @context values provided by legitimate actors from a different realm, e.g., Bar. The examples I provided did not show the attack; they just aimed to demonstrate that ACCEPTED_CONTEXTS does not scale, because in certain cases participants of different realms cannot or do not want to interact with each other continuously.

peacekeeper commented 1 week ago

From my point of view, this is essentially the same observation that I already described last year in JSON-LD VCs are NOT “just JSON”.

When I explored this same topic back then, I didn't really consider this a "vulnerability", but rather that it's important to understand what data is being secured. With Data Integrity, what is being secured is not just the document itself, but also the semantics behind it.

You could also make the exact opposite argument and call the SD-JWT (VC (DM)) / JWS family of technologies vulnerable, since they secure only the document, not the semantics behind it, e.g. see my results here.

What confuses me most about this thread is that this is somehow being communicated as a "new disclosure". I don't think there is anything new here, many of us (and I believe that includes Mattr as well) have known exactly how JSON-LD and Contexts and Data Integrity work for years.

tplooker commented 1 week ago

What confuses me most about this thread is that this is somehow being communicated as a "new disclosure". I don't think there is anything new here, many of us (and I believe that includes Mattr as well) have known exactly how JSON-LD and Contexts and Data Integrity work for years.

Understanding how a technology works and understanding how it could be exploited are two entirely different things. If they were the same thing we wouldn't have software vulnerabilities in the first place, because everything would be designed and built perfectly. Unless you mean to imply these vulnerabilities were known about by some and are somehow in fact a feature?

You could also make the exact opposite argument and call the SD-JWT (VC (DM)) / JWS family of technologies vulnerable, since they secure only the document, not the semantics behind it, e.g. see my results here.

Please go through the appropriate channels of disclosure and gather your evidence, as this issue has, if you believe there is a vulnerability. As a response, however, I don't see any substance to this argument. SD-JWT, JWS and mDoc secure the bytes which contain the semantics; this is a far more robust design and is entirely immune to the vulnerabilities disclosed here, which exploit the fact that data integrity requires a complex set of transformations during the proof generation and verification process.

tplooker commented 1 week ago

As a way of trying to progress this issue and focus on solutions, my feedback on the proposed mitigation strategies from @msporny is as follows.

The mitigations proposed fail to address the root causes of this vulnerability which are:

  1. The JSON-LD @protected keyword is broken with regard to protecting the definition of terms, as it doesn't work with those defined using @base and @vocab.
  2. Data integrity documents fail to integrity protect the most critical portion of the document, the @context entry, which is crucial to protect because it controls critical behaviour in the proof creation and verification procedures.

Counter Proposal

  1a. JSON-LD is fixed with regard to how @protected works with @base and @vocab, OR
  1b. All usages of @protected are stripped from core contexts, and mentions of it are stripped from Data Integrity, with a warning label attached for any implementers who happen to use the feature saying "be aware this feature of JSON-LD doesn't work consistently across term definitions", as a developer would reasonably expect.

AND

  2. Data Integrity changes the proof generation and verification procedures to include a hash of the @context entries in the document, ensuring no manipulation of the @context entry can be done without detection (a rough sketch follows below).
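As a rough illustrative sketch of counter-proposal 2 (not the current Data Integrity algorithm; names and serialization are placeholders):

  import { createHash } from 'node:crypto';

  // Hash the dereferenced @context entries, in document order. A real design
  // would need a canonical serialization of the context documents.
  function contextHash(resolvedContexts) {
    return createHash('sha256')
      .update(JSON.stringify(resolvedContexts))
      .digest();
  }

  // During proof creation and verification, this hash would be combined with
  // the usual canonicalized-document hash so that any change to @context
  // invalidates the proof.
  function hashDataToSignOrVerify(canonicalDocumentHash, resolvedContexts) {
    return Buffer.concat([canonicalDocumentHash, contextHash(resolvedContexts)]);
  }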

dlongley commented 1 week ago

Verifiable credentials are designed for the three-party model such that documents self-describe their context using @context. The fact that this is explicit may actually help bring to light that one just has to assume the context of a document when it isn't self-describing. This might be acceptable in a closed-world, small-scale (or large-corporate-mediated) two-party model, but is dangerous in the decentralized three-party model. In a closed-world, two-party model, this lack of explicit contextual information is often approximated implicitly based on checking who the authentic author of a document is. But this approach can lead to assumptions about that author's intent that are more likely to be inaccurate in an open-world, long-lived, and large-scale ecosystem with rich credentials -- a system where the holders and verifiers are fully independent actors who may use verifiable credentials according to whatever business rules they desire.

Being able to express the same information in different ways without losing protection over it is valuable, especially at global scale and across different mediums and decentralized channels of communication. Being able to express multiple forms of protection on the same information, each with different features, is valuable for the decentralized three-party model as different parties that do not know each other will accept different mechanisms and have different maintenance and upgrade schedules.

If a document is tagged with a context you don't understand and you are unwilling to translate it to one that you do, simply reject the document. That's ok and expected. That is better than presuming it is in a context you do understand when it is clearly marked otherwise -- no matter who the author is. This can certainly be better in a number of ways than other formats where the document isn't self-describing in a globally unambiguous way. In those cases you must presume it is in a context you understand and you have no explicit means by which to differentiate it when there are conflicts. Not having this information is especially troublesome in the three-party model where communication and decisions can be very decentralized. This includes documents that might be from the same issuer and include some of the same types of information, but were generated based on very different assumptions and processes over time.

Further, presuming that your software is safe from attack just because you know a payload hasn't been modified during transit is dangerous and wrong. Software should be written such that it is sufficiently protected from exploit during the digestion of any payload prior to its use, no matter the author. The idea that payload digestion can be allowed to cause damage so long as it has a certain author is bad engineering. No author is perfect and, under the three-party model, not every author is known by reputation.

The threat model is different in an open world for software that does not know authors by reputation and / or that is expected to be able to retrieve documents from the Web. This is ok and powerful. But the "padlock icon" in your browser does not mean that the website you're using is "safe". Browsers must be safe from exploit no matter who is sending the content; they do not know all the Web origins you can visit by reputation. Digital wallets and some verifiers are in a similar situation in the three-party model. Almost without exclusion, the situation is different in a closed-world, two-party model. We should not expect the same designs and trade offs under different models.


MATTR's misuse of data integrity does not warrant any changes to its core design. It is a fact that @context must be understood by the consumer and MATTR's approach has been to ignore that field and instead try to consume documents without understanding their meaning (which depends on @context). MATTR's implication that once the authenticity check has been performed this approach is ok is also false; context matters. MATTR's implication that authors are always known by reputation is also false in the three-party model and is therefore not a consistently reliable short-circuiting tool; there are even drawbacks to forcing everyone to go back to an author to get a new expression of the same information.

Instead, JSON keys should be treated as opaque when @context is not understood; one cannot guess that just because a JSON key contains the string "firstName" that it means that the associated value will be a person's first name (setting aside that not every person even has a first name and that the meaning of "first name" varies across cultures and so on).

Misusing a technology in a way that the design specifically says you cannot do, calling it a vulnerability, and then demanding that it change its design is not a reasonable request.

In order to make progress here, it must first be understood that @context must be understood by a consumer before consuming the document. This requires the @context value actually be checked for an expected value or the use of the JSON-LD compaction API to transform a document to another that uses a @context that is understood, prior to consumption. Doing at least one of these is a requirement before consumption -- and it does not harm scalability nor open world use, but rather enhances them. Note that a consequence of this is that knowing that an author used a particular context (via cryptographic proof) does little when that context must either already be understood by the consumer (e.g., they have a static copy) or the consumer must transform the document to another context that they do understand anyway. If a change in thinking to match the design of the technology can be made, then we can see where the conversation goes from there.
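
As a rough illustration of the first option above (checking the @context value against an expected value before consuming any terms), here is a minimal sketch; the expected context list is an illustrative assumption, not normative guidance:

// Reject any document whose @context does not exactly match what this consumer
// understands, rather than guessing what unknown contexts might redefine.
const EXPECTED_CONTEXT = [
  'https://www.w3.org/ns/credentials/v2',
  'https://www.w3.org/ns/credentials/examples/v2'
];

function assertKnownContext(document) {
  const actual = [].concat(document['@context']);
  const matches = actual.length === EXPECTED_CONTEXT.length &&
    actual.every((ctx, i) => ctx === EXPECTED_CONTEXT[i]);
  if (!matches) {
    throw new Error('Unrecognized @context; refusing to consume this document.');
  }
}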

On its own, I do continue to think that the removal of @vocab from the core context would be a good idea for a number of reasons, but also because it tends to lead some who are used to the two-party model and closed-world JSON modeling to think that JSON keys on their own can be trusted to be conflict free and meaningful -- when this is not true for verifiable credentials. Context must still be checked and understood. Note this does not mean that people could not use @vocab via another context, but having it come as a prepackaged default has unnecessary adverse effects. It should be opt-in.

msporny commented 1 week ago

@peacekeeper wrote:

What confuses me most about this thread is that this is somehow being communicated as a "new disclosure". I don't think there is anything new here

Agreed, and it seems that there are a number of other implementers in this thread that do not believe there is anything new here either. @peacekeeper covered much of this in his blog post and his presentations on the topic a while ago.

Additionally, security guidance (that one needs to check what's in the @context value) that would mitigate the concern raised in the original issue has been in the specification for some time as well, namely in this section https://www.w3.org/TR/2024/CRD-vc-data-model-2.0-20240618/#base-context , which I'll quote below:

Implementations MUST treat the base context value, located at https://www.w3.org/ns/credentials/v2, as already retrieved; the following value is the hexadecimal encoded SHA2-256 digest value of the base context file: ... It is strongly advised that all JSON-LD Context URLs used by an application utilize the same mechanism, or a functionally equivalent mechanism, to ensure end-to-end security. This section serves as a reminder of the importance of ensuring that, when verifying verifiable credentials and verifiable presentations, the verifier has information that is consistent with what the issuer or holder had when securing the credential or presentation. This information might include at least: The content in a credential whose meaning depends on a link to an external URL, such as a JSON-LD Context, which can be secured by using a local static copy or a cryptographic digest of the file. See Section 5.3 Integrity of Related Resources for more details. It is considered a best practice to ensure that the same sorts of protections are provided for any URL that is critical to the security of the verifiable credential through the use of permanently cached files and/or cryptographic hashes. Ultimately, knowing the cryptographic digest of any linked external content enables a verifier to confirm that the content is the same as what the issuer or holder intended.

We provide a lot of guidance in the VCDM specification today, as well as the Data Integrity specification (which I won't quote because it's largely duplicative of the information above), that if followed, prevents the compromises described in this issue.
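
As a sketch of what following that guidance can look like in practice (the file path and digest value below are placeholders, not the real values from the specification), a verifier can serve contexts from static local copies and compare a pinned SHA-256 digest of the file bytes before using them:

const { createHash } = require('node:crypto');
const { readFileSync } = require('node:fs');

// url -> { path to a local static copy, pinned hex digest of that file's bytes }
const PINNED_CONTEXTS = new Map([
  ['https://www.w3.org/ns/credentials/v2',
    { path: './contexts/credentials-v2.jsonld', sha256: '<pinned hex digest>' }]
]);

function loadContext(url) {
  const entry = PINNED_CONTEXTS.get(url);
  if (!entry) throw new Error(`Refusing to load unpinned context: ${url}`);
  const raw = readFileSync(entry.path);
  const digest = createHash('sha256').update(raw).digest('hex');
  if (digest !== entry.sha256) throw new Error(`Context digest mismatch for ${url}`);
  return JSON.parse(raw.toString('utf8'));
}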

Now, we might want to get more forceful/normative with the language that exists in the specification today, or clearly articulate when using @vocab is a good idea, and when it might not be a good idea. I do think this issue, and the subsequent work at IETF, constitutes new information that has made the use of @vocab in the base context far less compelling.

We do need to analyze each proposed mitigation (which we haven't exhaustively done in this issue yet). Thank you, @tplooker, for continuing to provide concrete proposals that the group can evaluate. We'll have to determine consensus on each concrete proposal and see where the group lands.

tplooker commented 1 week ago

MATTR's misuse of data integrity does not warrant any changes to its core design. It is a fact that @context must be understood by the consumer and MATTR's approach has been to ignore that field and instead try to consume documents without understanding their meaning (which depends on @context). MATTR's implication that once the authenticity check has been performed this approach is ok is also false; context matters. MATTR's implication that authors are always known by reputation is also false in the three-party model and is therefore not a consistently reliable short-circuiting tool; there are even drawbacks to forcing everyone to go back to an author to get a new expression of the same information

The example scripts given at the start of this issue were about providing readers of this issue with an easy way to reproduce the issue independently, and it appears that in this thread the focus has been on how that software was configured rather than on the vulnerability itself. I encourage people to please separate these two things.

To make it clear that this issue isn't a software configuration issue, I would just like to point out that I have replicated this issue in the following pieces of software:

https://vcplayground.org/
https://demo.vereswallet.dev/

To be clear, there was no "re-configuration" of software required. I simply used the software as deployed; it required a small amount of JS to swap the attributes in the credential, which could have occurred anywhere after the credential was issued, such as in a MITM attack. To also provide an indication of attacker effort, it took me ~30 mins to reproduce this vulnerability, showing that it is quite easy to perform.
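
For readers who want a sense of what "a small amount of JS" means here, the following is a hypothetical sketch of the kind of manipulation being described (the term names and IRIs are illustrative, not taken from the actual reproduction): the JSON values are swapped and an inline context is appended that remaps the two terms, so the canonicalized statements that were signed remain unchanged while a consumer reading the plain JSON sees swapped claims.

// Hypothetical sketch only; this works when the original terms were picked up via
// @vocab, because @protected does not cover @vocab-defined terms.
function swapClaims(vc) {
  const subject = vc.credentialSubject;
  [subject.firstName, subject.middleName] = [subject.middleName, subject.firstName];
  vc['@context'] = [
    ...[].concat(vc['@context']),
    {
      // Remap each term to the IRI the other term originally expanded to, so the
      // expanded (signed) form of the document is identical to the original.
      firstName: 'https://example.com#middleName',
      middleName: 'https://example.com#firstName'
    }
  ];
  return vc;
}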

VC playground might be "playground" software and the deployment of veres wallet might be a "demo", but the fact they are both open to this vulnerability proves there is real software deployed today with this vulnerability. I believe there are many other deployments that exhibit the same vulnerability, including all the VC API backends that are connected to the VC playground.

To draw another analogy, as it keeps coming up in this thread: the equivalent "playground" style tool in JWT land, https://jwt.io, allows you to paste in JWTs and validate them. This tool doesn't allow one to maliciously manipulate the payload of a JWT and have the signature still be reported as valid, and that is because the design of JWTs simply doesn't allow the same attack.

So please, can we move past the notion that "MATTR misconfigured software", when that clearly is not the case, and work on how we are going to fix this problem.

dlongley commented 1 week ago

@tplooker,

VC playground might be "playground" software and the deployment of veres wallet might be a "demo", but the fact they are both open to this vulnerability proves there is real software deployed today with this vulnerability.

Putting every piece of software that is publicly available into the same category by using the term "real" doesn't add any weight to the argument. The VC playground is a sandbox and middleware tool -- and the problem being discussed needs to be mitigated at the application level (the consumer of the document) as discussed. These tools are happy to work with any contexts you throw at them because they are not actually consuming the information. This is the responsibility of the application. Again, the design approaches are different and there are valuable trade offs in both directions.

To draw another analogy, as it keeps coming up in this thread: the equivalent "playground" style tool in JWT land, https://jwt.io/, allows you to paste in JWTs and validate them. This tool doesn't allow one to maliciously manipulate the payload of a JWT and have the signature still be reported as valid.

Not exactly, but this is a bit of a distraction, I think. Still, here is something that someone might find unexpected (though in a way that I personally do not think matters): take the example JWT that comes up when you load https://jwt.io:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

And replace the last character with a d instead:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5d

And it will still validate. There are other changes that will validate as well. Of course, this is due to the vagaries of base64url-encoding and I doubt it's easily exploitable, but maybe it causes a problem for someone who does some JWT comparisons somewhere.
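
For anyone curious why that works, here is a minimal sketch (assuming Node.js with 'base64url' Buffer support): the final base64url character of the signature carries two unused trailing bits, which typical decoders discard, so both strings decode to the same 32 signature bytes.

const sigC = 'SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c';
const sigD = 'SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5d';

// Both decode to identical bytes, so a signature check over the decoded bytes
// cannot tell the two encodings apart.
console.log(Buffer.from(sigC, 'base64url').equals(Buffer.from(sigD, 'base64url'))); // true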

David-Chadwick commented 1 week ago

I think the core of the problem is the global context mapping from URIs to human-friendly JSON property names. Why does JSON-LD do this? A bit of history. First we had X.500 with globally unique attribute types (aka property names), called object identifiers (OIDs). This works well in a global open-world model. No two attributes/properties can be confused, because of their globally unique OIDs. OIDs were easy for computers to use, but horrible for users and programmers, e.g. what is 2.5.6.3.25? So LDAP introduced strings to name attribute types, e.g. givenName, telephoneNumber etc. But these are not globally unique, nor user friendly for non-English speaking people. So we ended up with different LDAP systems using the same attribute name for attributes with different semantics, and the same semantic attribute having different attribute names in different systems. I think JSON-LD used the context mapping to try to solve this problem (Manu, correct me if I am wrong please). So JSON-LD introduced the global context mapping from URIs (equivalent to OIDs) to JSON property names (equivalent to LDAP attribute names), and the VC DM uses the latter to specify VCs. But this is not user friendly for non-English speaking users and programmers.

In my opinion, one solution is for the VC DM to revert to using global URIs instead of short-form JSON property names when specifying VCs (much as X.500 did). Computers, including the security software, can use the URIs everywhere that computations and comparisons are done, with no ambiguity being introduced. The mapping of URIs to JSON property names should be done locally and not globally, so that the JSON property names can appear in the user interfaces, and every end system should have one or more of these context mappings to cater for the global community. But the underlying VC would be constructed with URIs, and URIs would be used everywhere by the computing machinery. This would remove the current security vulnerabilities, because there would never be two different-looking VCs with the same semantic content (as we have in the examples presented to us today). OTOH there would be thousands of different displays of the same VC, as each end system would display it using its own local context mappings from the URIs to user-friendly JSON property display names. Verifiers would be able to verify any integrity-protected VC from anywhere, since the signature would be computed on the URIs. When displaying the VC to the user, the end user system (for the issuer, holder or verifier) would either already have a local mapping for the URIs and use it in the display, or won't, and would display the URI. The URI for the JSON property of first name can then be displayed as "First Name" to an English user, as "prénom" to a French user, and "nombre de pila" to a Spanish user, but the VC itself will contain the URI for the JSON property name.

The downside is that this will increase the size of VCs, as property names could be up to 10 times bigger. But the upside is that it will remove the current vulnerabilities, and if each JSON property is a URL rather than a URI, then the URL can contain the description of the property, and some mappings into language-specific, user-friendly display forms that the end systems can use. Specifying a format for describing VC properties will make the URLs machine processable.
Now if we want to make this system more space efficient, we can have one URL that contains all mappings from URI properties to short form JSON property names, and use the short form JSON property names in the VCs, but this URL must be integrity protected along with the contents contained at the URL. And I think we then end up with the scheme proposed by @tplooker above.

Data integrity changes the proof generation and verification procedures to include a hash of the @context entries in the document ensuring no manipulation of the @context entry can be done without detection.

dlongley commented 1 week ago

@David-Chadwick,

So JSON-LD introduced the global context mapping from URIs (equivalent to OIDs) to JSON property name (equivalent to LDAP attribute names) and the VC DM uses the latter to specify VCs. But this is not user friendly for non-English speaking users and programmers.

Well, in light of what you're saying about language and property names and as a very quick example of the power of being able to change contexts without losing protection, see the following.

Here's a VC I issued quickly that uses firstName as an English JSON key:

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    {
      "firstName": "https://example.com#firstName"
    },
    "https://w3id.org/security/data-integrity/v2"
  ],
  "type": [
    "VerifiableCredential"
  ],
  "credentialSubject": {
    "id": "did:example:b34AA2I0ZdwAACBDu",
    // ENGLISH
    "firstName": "Jane"
  },
  "issuer": "did:key:zDnaeeojYnQfLnUNtfXEYykkQqe4v4tjTmczQENrUzXTPpGPc",
  "issuanceDate": "2024-06-25T22:05:37.910Z",
  "proof": [
    {
      "type": "DataIntegrityProof",
      "created": "2024-06-25T22:05:37Z",
      "verificationMethod": "did:key:zDnaeeojYnQfLnUNtfXEYykkQqe4v4tjTmczQENrUzXTPpGPc#zDnaeeojYnQfLnUNtfXEYykkQqe4v4tjTmczQENrUzXTPpGPc",
      "cryptosuite": "ecdsa-rdfc-2019",
      "proofPurpose": "assertionMethod",
      "proofValue": "z4LcYkqjqNv471T6xXs4UfqBC4iqZfCkix9mGwEguKydqtc6fyKJXZuGLFWkm5Vcjh2Hm4eXL66zA1kc1f1crY9qs"
    },
    {
      "id": "urn:uuid:f0c6e2c7-6b15-4eac-a92e-0af5417d5364",
      "type": "DataIntegrityProof",
      "created": "2024-06-25T22:05:38Z",
      "verificationMethod": "did:key:zDnaeeojYnQfLnUNtfXEYykkQqe4v4tjTmczQENrUzXTPpGPc#zDnaeeojYnQfLnUNtfXEYykkQqe4v4tjTmczQENrUzXTPpGPc",
      "cryptosuite": "ecdsa-sd-2023",
      "proofPurpose": "assertionMethod",
      "proofValue": "u2V0AhVhATjx_w6ZhQLpSQry4_Y2PgHxArBQvjnuAAcWiL48Z2xd2EizRnDk0wMDk8-5Vs99BRVtx5Wy-vp28DaQse1QQOlgjgCQDCa0p736HXVYtw8sC-J0KrBluQMVT44XmXJtuIbM0C2RYIKcVQB8U-3h6eSIeS7z4nWITTT4sufCMFGfPqqjRfkKnglhAeIriKm0-fu3LzeF9JC4f-TkOPITXPCobhnAiU6FmitN1sbJGZu9GKnsit-eKVPvERTfb4xBChzD4-O9CFBqzJlhA49AIQrdq7vIxsvM2J9FotMZxp_931MsYzjNTsOqjPAEOYF8e8T48RWwik293XM6sBPP5YoNDLLsPaUb45ETnAYJnL2lzc3Vlcm0vaXNzdWFuY2VEYXRl"
    }
  ]
}

And here's the same VC but expressed using German vorname instead, without losing the protection, courtesy of running the VC through JSON-LD compaction via the JSON-LD playground:

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    {
      "vorname": "https://example.com#firstName"
    },
    "https://w3id.org/security/data-integrity/v2"
  ],
  "type": "VerifiableCredential",
  "proof": [
    {
      "type": "DataIntegrityProof",
      "created": "2024-06-25T22:05:37Z",
      "cryptosuite": "ecdsa-rdfc-2019",
      "proofPurpose": "assertionMethod",
      "proofValue": "z4LcYkqjqNv471T6xXs4UfqBC4iqZfCkix9mGwEguKydqtc6fyKJXZuGLFWkm5Vcjh2Hm4eXL66zA1kc1f1crY9qs",
      "verificationMethod": "did:key:zDnaeeojYnQfLnUNtfXEYykkQqe4v4tjTmczQENrUzXTPpGPc#zDnaeeojYnQfLnUNtfXEYykkQqe4v4tjTmczQENrUzXTPpGPc"
    },
    {
      "id": "urn:uuid:f0c6e2c7-6b15-4eac-a92e-0af5417d5364",
      "type": "DataIntegrityProof",
      "created": "2024-06-25T22:05:38Z",
      "cryptosuite": "ecdsa-sd-2023",
      "proofPurpose": "assertionMethod",
      "proofValue": "u2V0AhVhATjx_w6ZhQLpSQry4_Y2PgHxArBQvjnuAAcWiL48Z2xd2EizRnDk0wMDk8-5Vs99BRVtx5Wy-vp28DaQse1QQOlgjgCQDCa0p736HXVYtw8sC-J0KrBluQMVT44XmXJtuIbM0C2RYIKcVQB8U-3h6eSIeS7z4nWITTT4sufCMFGfPqqjRfkKnglhAeIriKm0-fu3LzeF9JC4f-TkOPITXPCobhnAiU6FmitN1sbJGZu9GKnsit-eKVPvERTfb4xBChzD4-O9CFBqzJlhA49AIQrdq7vIxsvM2J9FotMZxp_931MsYzjNTsOqjPAEOYF8e8T48RWwik293XM6sBPP5YoNDLLsPaUb45ETnAYJnL2lzc3Vlcm0vaXNzdWFuY2VEYXRl",
      "verificationMethod": "did:key:zDnaeeojYnQfLnUNtfXEYykkQqe4v4tjTmczQENrUzXTPpGPc#zDnaeeojYnQfLnUNtfXEYykkQqe4v4tjTmczQENrUzXTPpGPc"
    }
  ],
  "credentialSubject": {
    "id": "did:example:b34AA2I0ZdwAACBDu",
    // GERMAN
    "vorname": "Jane"
  },
  "issuanceDate": "2024-06-25T22:05:37.910Z",
  "issuer": "did:key:zDnaeeojYnQfLnUNtfXEYykkQqe4v4tjTmczQENrUzXTPpGPc"
}

You don't have to go back to the issuer to express this VC in another language -- you can translate it and pass it along to someone who prefers it in German (or they can do the same before they pass it along to their own German-based software). The protection will still work. What is required is that the application (not middleware / helper tools) make sure it understands the context that is in use.
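
For reference, the translation step described above can be done with a few lines using a JSON-LD processor such as the jsonld.js library (a sketch, assuming the library is installed and the listed context URLs are resolvable or served from static copies):

const jsonld = require('jsonld');

// Compact the English VC against a context that maps the same IRI to the German
// term. Only the surface JSON keys change; the underlying statements that the
// proofs were computed over stay the same.
const germanContext = [
  'https://www.w3.org/2018/credentials/v1',
  { vorname: 'https://example.com#firstName' },
  'https://w3id.org/security/data-integrity/v2'
];

async function translate(englishVc) {
  return jsonld.compact(englishVc, germanContext);
}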

filip26 commented 1 week ago

Just a note on the proposal to add @context to a proof.

This would prevent context translation (language translation is just one of the possible use cases), and would make the creation of derived VCs/VPs much more complicated, if not impossible.

There are a lot of scenarios in which an ability to easily change @context is required, especially in a long-lived environment; it enhances scalability, as has been mentioned above.

E.g. there might be a valid VC issued with a context that has been obsoleted, or that is not supported by a verifier; perhaps in the past the issuer "invented" its own context but now there is a commonly accepted one, and even the original issuer has already switched to the new common context. Or simply, as a holder I want to derive a VC/VP and a verifier does not understand the hardwired context. Note that it scales up and down: you might downgrade to the "old context" when you encounter some old verifier in a shop where they have not upgraded the software for years, etc.

msporny commented 1 week ago

@tplooker wrote:

The example scripts given at the start of this issue were about providing readers of this issue with an easy way to reproduce the issue independently, and it appears that in this thread the focus has been on how that software was configured rather than on the vulnerability itself.

The example scripts at the start of the issue created a vulnerability in the verifier by not following the provided guidance in the specification. Had the scripts followed the guidance, the verifier would have rejected the input. This was demonstrated in a section called "Demonstration of Proper Implementation" here:

https://github.com/w3c/vc-data-integrity/issues/272#issuecomment-2184084631

peacekeeper commented 1 week ago

You could also make the exact opposite argument and call the SD-JWT (VC (DM)) / JWS family of technologies vulnerable, since they secure only the document, not the semantics behind it, e.g. see my results here.

Please go through the appropriate channels of disclosure and gather your evidence, as this issue has, if you believe there is a vulnerability. As a response, however, I don't see there being any substance to this argument: SD-JWT, JWS, and mDoc secure the bytes which contain the semantics,

No. Let's say, in this example here: https://drafts.oauth.net/oauth-selective-disclosure-jwt/draft-ietf-oauth-selective-disclosure-jwt.html#appendix-A.4, the bytes that are secured by SD-JWT / JWS definitely do NOT contain the semantics (those are defined in the context, which is ignored by SD-JWT / JWS). If the semantics (the context) are manipulated by an attacker (and you argue that this is easy), then SD-JWT / JWS would not detect that, whereas Data Integrity would.

If you think Data Integrity is "vulnerable" because changed documents can have the same signature, then SD-JWT (VC (DM)) / JWS is "vulnerable" too because the same document with changed semantics can have the same signature. I don't really have to gather evidence for this, since I have already shown experiments in both "directions" on this topic last year, rather than just picking one of the two sides as this issue has.

If I remember correctly, I think at EIC we agreed that the traditional narrative that JSON-LD VCs can be "processed as plain JSON" can be problematic. From my perspective, what matters is to understand what is being secured in each case (the document, or the document plus its semantics defined by the context).

msporny commented 1 week ago

@tplooker wrote:

To draw another analogy, as it keeps coming up in this thread: the equivalent "playground" style tool in JWT land, https://jwt.io/, allows you to paste in JWTs and validate them. This tool doesn't allow one to maliciously manipulate the payload of a JWT and have the signature still be reported as valid, and that is because the design of JWTs simply doesn't allow the same attack.

Using that line of reasoning, it's worse with JWT... if I use jwt.io to create a signed JWT, like this one (that envelops a VC):

eyJraWQiOiJFeEhrQk1XOWZtYmt2VjI2Nm1ScHVQMnNVWV9OX0VXSU4xbGFwVXpPOHJvIiwiYWxnIjoiRVMyNTYifQ.eyJAY29udGV4dCI6WyJodHRwczovL3d3dy53My5vcmcvbnMvY3JlZGVudGlhbHMvdjIiLCJodHRwczovL3d3dy53My5vcmcvbnMvY3JlZGVudGlhbHMvZXhhbXBsZXMvdjIiXSwiaWQiOiJodHRwOi8vdW5pdmVyc2l0eS5leGFtcGxlL2NyZWRlbnRpYWxzLzE4NzIiLCJ0eXBlIjpbIlZlcmlmaWFibGVDcmVkZW50aWFsIiwiRXhhbXBsZUFsdW1uaUNyZWRlbnRpYWwiXSwiaXNzdWVyIjoiaHR0cHM6Ly91bml2ZXJzaXR5LmV4YW1wbGUvaXNzdWVycy81NjUwNDkiLCJ2YWxpZEZyb20iOiIyMDEwLTAxLTAxVDE5OjIzOjI0WiIsImNyZWRlbnRpYWxTY2hlbWEiOnsiaWQiOiJodHRwczovL2V4YW1wbGUub3JnL2V4YW1wbGVzL2RlZ3JlZS5qc29uIiwidHlwZSI6Ikpzb25TY2hlbWEifSwiY3JlZGVudGlhbFN1YmplY3QiOnsiaWQiOiJkaWQ6ZXhhbXBsZToxMjMiLCJkZWdyZWUiOnsidHlwZSI6IkJhY2hlbG9yRGVncmVlIiwibmFtZSI6IkJhY2hlbG9yIG9mIFNjaWVuY2UgYW5kIEFydHMifX19.sQGXv_RmWvfHsSiGdQDJPQQ3r9w1wtVr0tYC-LLtW4mt3SM6s79WS3zg7rJLcP9MoBVNrROeHJZBmybZzUYybg

with this public key:

-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEEVs/o5+uQbTjL3chynL4wXgUg2R9
q9UU8I5mEovUf86QZ7kOBIjJwqnzD1omageEHWwHdBO6B+dFabmdT9POxg==
-----END PUBLIC KEY-----

and this private key:

-----BEGIN PRIVATE KEY-----
MIGHAgEAMBMGByqGSM49AgEGCCqGSM49AwEHBG0wawIBAQQgevZzL1gdAFr88hb2
OF/2NxApJCzGCEDdfSp6VQO30hyhRANCAAQRWz+jn65BtOMvdyHKcvjBeBSDZH2r
1RTwjmYSi9R/zpBnuQ4EiMnCqfMPWiZqB4QdbAd0E7oH50VpuZ1P087G
-----END PRIVATE KEY-----

and use a text editor to modify it to this JWT, injecting some of my own data:

ew0KICAia2lkIjogIkV4SGtCTVc5Zm1ia3ZWMjY2bVJwdVAyc1VZX05fRVdJTjFsYXBVek84cm8iLA0KICAiYWxnIjogIm5vbmUiDQp9.ew0KICAiQGNvbnRleHQiOiBbDQogICAgImh0dHBzOi8vd3d3LnczLm9yZy9ucy9jcmVkZW50aWFscy92MiIsDQogICAgImh0dHBzOi8vd3d3LnczLm9yZy9ucy9jcmVkZW50aWFscy9leGFtcGxlcy92MiINCiAgXSwNCiAgImlkIjogImh0dHA6Ly91bml2ZXJzaXR5LmV4YW1wbGUvY3JlZGVudGlhbHMvMTg3MiIsDQogICJ0eXBlIjogWw0KICAgICJWZXJpZmlhYmxlQ3JlZGVudGlhbCIsDQogICAgIkV2aWxBbHVtbmlDcmVkZW50aWFsIg0KICBdLA0KICAiaXNzdWVyIjogImh0dHBzOi8vZXZpbC51bml2ZXJzaXR5LmV4YW1wbGUvaXNzdWVycy81NjUwNDkiLA0KICAidmFsaWRGcm9tIjogIjIwMTAtMDEtMDFUMTk6MjM6MjRaIiwNCiAgImNyZWRlbnRpYWxTY2hlbWEiOiB7DQogICAgImlkIjogImh0dHBzOi8vZXhhbXBsZS5vcmcvZXhhbXBsZXMvZGVncmVlLmpzb24iLA0KICAgICJ0eXBlIjogIkpzb25TY2hlbWEiDQogIH0sDQogICJjcmVkZW50aWFsU3ViamVjdCI6IHsNCiAgICAiaWQiOiAiZGlkOmV4YW1wbGU6MTIzIiwNCiAgICAiZGVncmVlIjogew0KICAgICAgInR5cGUiOiAiQmFjaGVsb3JEZWdyZWUiLA0KICAgICAgIm5hbWUiOiAiQmFjaGVsb3Igb2YgU2NpZW5jZSBhbmQgQXJ0cyINCiAgICB9DQogIH0NCn0.

and then use one of the most popular JWT libraries (12M downloads per week) to verify it using this code:

const jwt = require('jsonwebtoken');

const vcjwt = 'ew0KICAia2lkIjogIkV4SGtCTVc5Zm1ia3ZWMjY2bVJwdVAyc1VZX05fRVdJTjFsYXBVek84cm8iLA0KICAiYWxnIjogIm5vbmUiDQp9.ew0KICAiQGNvbnRleHQiOiBbDQogICAgImh0dHBzOi8vd3d3LnczLm9yZy9ucy9jcmVkZW50aWFscy92MiIsDQogICAgImh0dHBzOi8vd3d3LnczLm9yZy9ucy9jcmVkZW50aWFscy9leGFtcGxlcy92MiINCiAgXSwNCiAgImlkIjogImh0dHA6Ly91bml2ZXJzaXR5LmV4YW1wbGUvY3JlZGVudGlhbHMvMTg3MiIsDQogICJ0eXBlIjogWw0KICAgICJWZXJpZmlhYmxlQ3JlZGVudGlhbCIsDQogICAgIkV2aWxBbHVtbmlDcmVkZW50aWFsIg0KICBdLA0KICAiaXNzdWVyIjogImh0dHBzOi8vZXZpbC51bml2ZXJzaXR5LmV4YW1wbGUvaXNzdWVycy81NjUwNDkiLA0KICAidmFsaWRGcm9tIjogIjIwMTAtMDEtMDFUMTk6MjM6MjRaIiwNCiAgImNyZWRlbnRpYWxTY2hlbWEiOiB7DQogICAgImlkIjogImh0dHBzOi8vZXhhbXBsZS5vcmcvZXhhbXBsZXMvZGVncmVlLmpzb24iLA0KICAgICJ0eXBlIjogIkpzb25TY2hlbWEiDQogIH0sDQogICJjcmVkZW50aWFsU3ViamVjdCI6IHsNCiAgICAiaWQiOiAiZGlkOmV4YW1wbGU6MTIzIiwNCiAgICAiZGVncmVlIjogew0KICAgICAgInR5cGUiOiAiQmFjaGVsb3JEZWdyZWUiLA0KICAgICAgIm5hbWUiOiAiQmFjaGVsb3Igb2YgU2NpZW5jZSBhbmQgQXJ0cyINCiAgICB9DQogIH0NCn0.';
const algorithms = ['RS256', 'RS384', 'RS512', 'ES256', 'ES384', 'ES512', 'RS256', 'RS384', 'RS512', 'HS256', 'HS384', 'HS512', 'none'];
// retrieve public key by kid
const getKey = (header, callback) => {
  let publicKey = null;
  if(header.kid === 'ExHkBMW9fmbkvV266mRpuP2sUY_N_EWINllapUzO8ro') {
    publicKey = `-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEEVs/o5+uQbTjL3chynL4wXgUg2R9
q9UU8I5mEovUf86QZ7kOBIjJwqnzD1omageEHWwHdBO6B+dFabmdT9POxg==
-----END PUBLIC KEY-----`;
  }
  callback(null, publicKey);
};

jwt.verify(vcjwt, getKey, {algorithms}, function(err, vc) {
  console.log('Verification successful!', vc);
});

and I get the following result:

Verification successful! {
  '@context': [
    'https://www.w3.org/ns/credentials/v2',
    'https://www.w3.org/ns/credentials/examples/v2'
  ],
  id: 'http://university.example/credentials/1872',
  type: [ 'VerifiableCredential', 'EvilAlumniCredential' ],
  issuer: 'https://evil.university.example/issuers/565049',
  validFrom: '2010-01-01T19:23:24Z',
  credentialSchema: {
    id: 'https://example.org/examples/degree.json',
    type: 'JsonSchema'
  },
  credentialSubject: {
    id: 'did:example:123',
    degree: { type: 'BachelorDegree', name: 'Bachelor of Science and Arts' }
  }
}

Note the EvilAlumniCredential and https://evil.university.example/issuers/565049 values. I have just, as you have, used "real" software to get a popular JWT library to report a successful verification after corrupting a payload on its way to the verifier.

I will not, however, claim that these are "significant security vulnerabilities in the design of JWT" because I:

  1. Failed to do proper input validation.
  2. Failed to use different code paths for dangerously different (HS/ES/none) use cases.
  3. Failed to properly account for a kid lookup failure, and
  4. Overrode the software defaults to enable alg=none (though I note, as an aside, that VC-JOSE-COSE /still/ allows it, which is terrible).

One can misconfigure software to do terrible things. I do agree that we want to minimize footguns; that is why we have the guidance in the specification that we have today... and if a verifier follows that guidance, they don't have the same result that you demonstrated.

msporny commented 1 week ago

We really do need to start analyzing the proposals in more depth in this thread to see what is workable and what isn't. More specifically, we need to critically analyze at least these proposals (yes, I know people have touched on many of them as unworkable, but a full treatment for each proposal is what we need to establish whether or not we'll be able to get to consensus on any of the items):

  1. We not publish Data Integrity as a technical recommendation because there are better solutions. (Orie)
  2. JSON-LD is fixed with regard to how @protected works with regard to @base and @vocab (Tobias)
  3. All usages of @protected are stripped from core contexts and mentions of it are stripped from Data integrity with a warning label attached to any implementers who happen to use the feature saying "be aware this feature of JSON-LD doesn't work consistently across term definitions" as a developer would reasonably expect. (Tobias)
  4. Data integrity changes the proof generation and verification procedures to include a hash of the @context entries in the document ensuring no manipulation of the @context entry can be done without detection. (Tobias)
  5. We replace all terms with full URLs in all VCs (DavidC)
  6. We more strongly discourage the use of @vocab or @base, possibly banning its usage. (DaveL)
  7. Add statements in the specification noting that it is not possible for the usage of @context to scale while simultaneously requiring a verifier to check all values in the @context. (Oliver/Tobias)
  8. We strengthen the language around ensuring that the values in @context MUST be understood by verifiers during verification, potentially by modifying the verification algorithm to perform normative checks. (Manu)
  9. Remove @vocab from the base context. (DaveL/Kim)
  10. Document these 2 attack vectors somewhere so they can be included in a pen-testers auditing toolbox. (Patrick)
  11. Improve test-suites to include asserting bad contexts in the conformance stage. (Patrick)

Are there other proposals that we need to put in front of the group for discussion?

kimdhamilton commented 1 week ago

This is possibly just a re-phrasing of a subset of 6, but I'd like to add 9 (in combination with other items, if necessary):

  9. Remove @vocab from the base context

This is based on my experience implementing credential systems using JSON-LD and JSON-LD signatures. As others have stated, @vocab was great for development, but I wanted it nowhere near my production systems. I was surprised it was added in 2.0, but I understand this was done to help drive group consensus (so no judgment).

decentralgabe commented 1 week ago

Proposal 4 seems sensible and straightforward. Do others feel differently?

tplooker commented 1 week ago

Using that line of reasoning, it's worse with JWT... if I use jwt.io to create a signed JWT, like this one (that envelopes a VC):

Again, this misses the point; you are trying to make this about a software configuration issue when it is a fundamental design flaw in DI. Simply paste your modified JWT into JWT.io and note the following:

[screenshot: jwt.io showing the modified JWT failing to validate]

This demonstrates that the manipulation doesn't work in the deployed software.

The same cannot be said for how the DI vulnerabilities behave in the deployed software I cited above, and I would be willing to bet there are many, if not a majority of, DI implementations that are vulnerable in the same way.

One must ask: if whitelisting contexts alone is such a simple and effective measure to mitigate this issue, why doesn't the software highlighted follow this recommendation? In my opinion it is because you lose the open-world extensibility that the VC data model promises in the process, and that is why it is an inadequate mitigation strategy and hence why I've suggested alternative solutions.

Are there other proposals that we need to put in front of the group for discussion?

Thank you for putting together this list, it looks good to me.

awoie commented 1 week ago

jsonwebtoken

SD-JWT does not allow alg=none. So, this will be detected by a validation algorithm that validates any VC secured using SD-JWTs, because SD-JWT validation requires checking for this and rejecting VCs that use alg=none.

An SD-JWT has a JWT component that MUST be signed using the Issuer's private key. It MUST NOT use the none algorithm.

@msporny this means that alg=none validation is covered at the proof specification level, which in this case is SD-JWT. In the same way, the vulnerability of Data Integrity that was demonstrated in this thread should also be covered at the spec level.
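
A minimal sketch of the kind of spec-level check being referred to (the parsing is simplified for illustration and assumes Node.js Buffer 'base64url' support): reject any SD-JWT whose Issuer-signed JWT header declares alg=none before doing anything else with it.

function assertIssuerJwtIsSigned(sdJwt) {
  // The Issuer-signed JWT is the first '~'-separated component of an SD-JWT.
  const issuerJwt = sdJwt.split('~')[0];
  const headerB64 = issuerJwt.split('.')[0];
  const header = JSON.parse(Buffer.from(headerB64, 'base64url').toString('utf8'));
  if (!header.alg || header.alg.toLowerCase() === 'none') {
    throw new Error('SD-JWT Issuer-signed JWT MUST be signed; alg=none is rejected.');
  }
}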

aniltj commented 1 week ago

First and foremost, I wanted to thank both @tplooker for bringing this to the VCWG, and the Data Integrity editors for their analysis. Given the global interest in W3C VCDM, I am glad that this discussion is happening so that the right guidance can end up in the specifications going forward.

Some input into both the discussion and the proposed changes to make the specifications stronger:

From @peacekeeper:

... the traditional narrative that JSON-LD VCs can be "processed as plain JSON" can be problematic.

+1

The "traditional narrative", as Markus notes, was grounded in a desire to have a "big tent". The ecosystem has moved on from when this narrative was articulated to the reality that post-VCDM 1.1, the data model is and remains JSON-LD compact form, which has been a global standard. So there is fully an expectation by anyone using VCDM 2.0, they need to understand that data model.

What that means in particular is that, if you are NOT using a JSON-LD aware mechanism to process a VCDM 2.0 payload (Data Integrity being a JSON-LD aware option), you have an obligation to build in the "processing logic" to check for the things that are expected when using JSON-LD compact form (similar to how you need to be aware of the particulars when checking a CSV, JSON, or XML document).

I think this needs to be emphasized.

There are other options for those who do not wish to leverage JSON-LD (and its power and flexibility) but if you are using VCDM 2.0, you can't pretend it is not JSON-LD.

From @kimdhamilton:

Remove @vocab from the base context

+1

I personally think that this earlier choice was a mistake, that makes many other mistakes possible.

At the same time, I fully see the value of @vocab when it comes to development and refinement of attribute bundles.

So, I would recommend that, in addition to removing @vocab from the base context, @vocab be provided as an optional secondary context that developers can manually insert into the payload during development and, as such, it becomes explicitly visible when it is in use.

From @msporny:

we might want to get more forceful/normative with the language that exists in the specification today ...

+1

Very much so. Particularly when it comes to defining all terms concretely for production use, and making it a MUST NOT (instead of a SHOULD NOT, as it currently stands) to use @vocab in production.

From @tplooker:

Data integrity changes the proof generation and verification procedures to include a hash of the @context entries in the document ensuring no manipulation of the @context entry can be done without detection.

This feels right, but I don't know enough about the downstream impacts of this, so I would like to learn more.