w3c / vc-data-integrity

W3C Data Integrity Specification
https://w3c.github.io/vc-data-integrity/

Multiple significant security vulnerabilities in the design of data integrity #272

Open tplooker opened 2 months ago

tplooker commented 2 months ago

The following issue outlines two significant security vulnerabilities in data integrity.

For convenience in reviewing the content below, here is a Google Slides version outlining the same information.

At a high level, both vulnerabilities exploit the "Transform Data" phase of data integrity in different ways; this phase is unique to cryptographic representation formats that involve processes such as canonicalisation/normalisation.

In effect, both vulnerabilities allow a malicious party to swap the key and value of arbitrary attributes in a credential without invalidating the signature. For example, as the worked examples in the attached presentation show, an attacker could swap their first and middle names, or their employment and over18 status, without invalidating the issuer's signature.

The first vulnerability is called the unprotected term redefinition vulnerability. It exploits a design issue in JSON-LD whereby the term protection offered by the `@protected` keyword does not cover terms that are defined via the `@vocab` and `@base` keywords. This means any terms defined using `@vocab` or `@base` are vulnerable to term redefinition.
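To make the redefinition concrete, here is a minimal Python sketch of term resolution. This is a toy model, not a real JSON-LD processor, and the vocabulary IRIs are illustrative; it only shows why a term that resolves through `@vocab` has no explicit definition for `@protected` to lock down:

```python
# Toy model of JSON-LD term-to-IRI resolution (not a real JSON-LD processor).
# Explicitly defined terms can be marked @protected; a term that only
# resolves through the @vocab fallback has no explicit definition, so a
# later context can (re)define it without triggering a protection error.

def resolve(term, contexts):
    """Resolve a term to an IRI; later contexts override earlier ones."""
    iri = None
    vocab = None
    for ctx in contexts:
        vocab = ctx.get("@vocab", vocab)
        if term in ctx:
            iri = ctx[term]  # explicit definition wins
    if iri is not None:
        return iri
    return vocab + term if vocab else None

# Illustrative contexts -- the issuer relies on @vocab for "over18":
issuer_ctx = {"@vocab": "https://issuer.example/vocab#"}
# An attacker-appended context redefines the @vocab-derived term:
attacker_ctx = {"over18": "https://issuer.example/vocab#employed"}

print(resolve("over18", [issuer_ctx]))                # issuer's intended IRI
print(resolve("over18", [issuer_ctx, attacker_ctx]))  # silently redefined
```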

The second vulnerability exploits the fact that critical portions of a document signed with data integrity are unsigned, namely the `@context` element of the JSON-LD document. The fact that `@context` is unsigned, combined with the critical part it plays in the proof generation and proof verification procedures, is a critical flaw that leaves data integrity documents open to many forms of manipulation that cannot be detected by validating the issuer's signature.
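As a rough illustration of why an unsigned `@context` matters, the following toy sketch (again, not a real JSON-LD processor; the property names and IRIs are made up) shows two (context, document) pairs that expand to the same IRI-keyed form. A signature computed over the expanded/canonicalized data cannot distinguish them, yet a consumer reading the compact JSON keys sees swapped values:

```python
# Toy expansion: replace JSON keys with the IRIs the context maps them to.
# Real data integrity signs RDF derived via this kind of transformation,
# so two documents with identical expansions share a valid signature.

def expand(doc):
    ctx = doc["@context"]
    return {ctx[k]: v for k, v in doc.items() if k != "@context"}

signed = {
    "@context": {"firstName": "ex:first", "middleName": "ex:middle"},
    "firstName": "Allison",
    "middleName": "Birch",
}
# The attacker swaps both the context mappings and the values:
tampered = {
    "@context": {"firstName": "ex:middle", "middleName": "ex:first"},
    "firstName": "Birch",
    "middleName": "Allison",
}

assert expand(signed) == expand(tampered)            # signature still verifies
assert signed["firstName"] != tampered["firstName"]  # but the JSON view differs
```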

Please see the attached presentation for resolutions to this issue we have explored.

In my opinion, the only solution that provides adequate protection against these forms of attack is to fundamentally change the design of data integrity so that the `@context` element is integrity protected. I recognise this would be a significant design change; however, I do not see an alternative that would prevent variants of this attack from continuing to appear over time.

I'm also happy to present this analysis to the WG if required.

(edited to put backticks around @words)

aniltj commented 1 month ago

+1 for render methods as a means to brand a credential. BC Gov has successfully demonstrated this by applying OCA bundles to AnonCreds VCs as a branding overlay. We are now working on an OverlayCaptureBundle renderMethod in partnership with the Swiss government for JSON-LD VCs.

Nice! Agreed on the use of renderMethod as a credential-focused, protocol-independent mechanism for this purpose. Hopefully your work will show up as part of https://w3c-ccg.github.io/vc-render-method/

David-Chadwick commented 1 month ago

I question why we need to map from the property's URI to a compact string to a render string. Why do we need two mappings, especially when JSON-LD caters for languages?

msporny commented 1 month ago

I question why we need to map from the property's URI to a compact string to a render string. Why do we need two mappings, especially when JSON-LD caters for languages?

There continues to be confusion in this thread on the layers and what each layer is for. Let's take an example:

"title": "Manager"

If you don't know the context of the message, the expression above, on its own, might as well be:

"p28dv4ja91": "Manager"

The mistake that keeps being made in some interpretations in this thread is assuming that you can tell what the property means just by reading it as an English word (the security disclosure makes this same mistake). IOW, people are presuming the context just because it looks like an English word, but that's an insecure way to do processing in any information system. Two-party systems establish context implicitly (depending on the two parties communicating). Three-party systems establish context explicitly (because you can't scale past point-to-point, two-party integrations without that happening).

Now, you can add a context like this:

"https://vocab.example/title": "Manager"

and you start getting to something that's less ambiguous, there's a global identifier that identifies the term, and if you use JSON-LD, you can get it into a developer-friendly form:

"@context": "https://vocab.example/v1",
"title": "Manager"

So, the developer can start plugging title into their code to retrieve the value, but that DOES NOT mean they can directly expose that text to a UX (without some extensive vetting of the context). Again, this is the mistake made throughout a number of comments in this thread... presuming that just because you know what the English word "title" means, that you know what the value expresses. Is it a Book Title? Or a Job Title? Or maybe it's a Website Title?

So, you have to understand the context in order to process the message. You cannot just expose "title" to a UI and presume that the person looking at the screen is going to come away with the proper understanding. The English word and the semantics probably align 95% of the time, but there will be cases where you'll be wrong and those cases might matter because they go into making a critical decision.

IOW, the mapping from a property URI (https://vocab.example/title) to a compact string (title) is a tool that is useful for developers when writing code, as long as they understand the context.

How that property URI is rendered to a non-developer in a UI is an entirely different thing. It could leverage the vocabulary, that might express one or more human-readable strings in different languages, to render the property name to the screen. Or it could leverage a render method that does the same sort of thing, but in a more textually, aurally, or visually compelling way.

Why do we need two mappings, especially when JSON-LD caters for languages?

As highlighted above, we can leverage JSON-LD's language features for classes, properties, and values, but we need to make sure everyone understands the distinction between URL->term compaction, and who that benefits (the developer), vs. URL->display value translation and who that benefits (non-developers).
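To sketch the distinction in code (pure Python standing in for the JSON-LD APIs; the vocabulary URL and the label table are illustrative, not a real published vocabulary):

```python
# Two distinct mappings from the same property URL:
#   URL -> compact term     (benefits the developer writing code)
#   URL -> display label    (benefits the non-developer looking at a UI)

expanded = {"https://vocab.example/title": "Manager"}

# Developer-facing compaction, driven by the context:
context = {"title": "https://vocab.example/title"}
compacted = {term: expanded[iri] for term, iri in context.items() if iri in expanded}
print(compacted)  # {'title': 'Manager'}

# User-facing rendering, driven by human-readable labels the vocabulary
# (or a render method) could express in multiple languages:
labels = {"https://vocab.example/title": {"en": "Job title", "de": "Berufsbezeichnung"}}
display = {labels[iri]["en"]: value for iri, value in expanded.items()}
print(display)  # {'Job title': 'Manager'}
```

Note that the display label ("Job title") disambiguates what the compact term ("title") alone cannot: it is not a book or website title.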

PatStLouis commented 1 month ago

@aniltj we are planning for it!

David-Chadwick commented 1 month ago

The point I am trying to make is that the property is "https://vocab.example/title" (but it could be "https://a.c.c.com/xyz"; it does not really matter). Computers can process this URL as easily as ABC or XYZ; they don't need a compact form. So why not render directly from "https://vocab.example/title" to whatever you want to display to the user? There is no point in having the compact encoding in between. Or to put it another way, why not have a context that maps directly to the rendering, so that the VC becomes whatever you want to display to the user, such as "ABC Company Title": "manager"

tplooker commented 1 month ago

TL;DR: The ideal and most scalable solution to displaying VCs is "render methods". I recommend working with the community to help develop these. Anything else is a stop-gap measure and should not drive the core design of VC primitives. A variety of these stop-gap measures are available today, some better than others depending on the situation.

It appears my specific use case has become a bit of a distraction from the point I was trying to make, which wasn't about how one could solve credential branding in different ways; to be clear, I don't believe any of the options presented in response would have actually solved our issue in this instance. The simple reality is that there was no standard at the time, and all we wanted was to be able to extend the VC data model in a way that didn't cause the pain it caused to downstream applications that simply wanted to ignore the branding information.

What I was trying to illustrate is that DI-based VCs make all `@context` entries critical for verifier applications to know and trust, even when some `@context` entries are not critical for all verifiers to understand. This is what creates the scaling issue: forcing systems to load and manage redundant information creates friction for no value and only acts as an impediment.

I do have many other use cases that demonstrate this problem; however, I am reluctant to share them in this thread, as they will just get buried in what is already a difficult conversation to follow. Suffice it to say, the problem is not about credential branding; it is a generic problem that affects many use cases.

I'll try to respond below to some of the suggested solutions to this problem.

@dlongley wrote

This only leaves verifiers that are not willing to load contexts from the Web, an intentionally more inflexible subset of verifiers.

This appears as a direct contradiction with the guidance in the data integrity specification, which is as follows:

Implementations MAY load application-specific JSON-LD context files from the network during development, but SHOULD permanently cache JSON-LD context files used in conforming documents in production settings to increase their security and privacy characteristics

Either production systems SHOULD or SHOULD NOT resolve context files from the network; you appear to be calling systems that fall into the latter category "inflexible", rather than just following the specification guidance.

@dlongley wrote

But don't use a catch-all "issuer-dependent" @vocab, because it doesn't play nicely with others, creates confusion, and incentivizes behavior that leads to more trouble in the ecosystem.

This is at odds with permission-less innovation; in my opinion it's a roundabout way of saying "all extensions to the VC data model must be blessed by this standards community in order to work well".

@dlongley wrote

Verifiers such as this can announce the contexts that they are willing to accept when requesting VCs. Providers can run the JSON-LD compaction API to recompact a VC to their accepted context. You might even be able to use selective disclosure to remove some terms, if the verifier allows it. This can even help get interoperability with verifiers that simply won't accept any extra terms at all (e.g., JSON schema "additionalProperties": false).

I'm surprised to hear that a verifier sending a list of all the contexts it supports to a wallet is considered a solution to this problem. Aside from the fact that this would significantly bloat the size of the average credential request, the wallet would then need to interpret this list and perform checks like

@dlongley wrote

Express the VC using a context you believe they will accept before sending it to them. Again, you might even be able to use selective disclosure to remove some terms, if the verifier allows it.

Again, my use case was specifically for when this isn't the case: the context isn't known or trusted by the relying party, nor is it required for them to trust it. Your solution in the second sentence appears to require some advanced tree-shaking capability to remove `@context` entries that aren't used in the selective disclosure case; how one achieves this with JSON-LD architecturally seems quite far-fetched.

@dlongley wrote

Compact to your special context when using your special terms (that are for your software only anyway), keeping your vendor-specific terms expressed as full URLs otherwise.

So not actually using the compact JSON-LD form in this case (at least for these terms)? Again, this assumes the wallet knows which contexts the verifier knows about ahead of time, which seems very difficult to establish.

(edited to put backticks around @words)

dlongley commented 1 month ago

@tplooker,

@dlongley wrote

This only leaves verifiers that are not willing to load contexts from the Web, an intentionally more inflexible subset of verifiers.

This appears as a direct contradiction with the guidance in the data integrity specification, which is as follows:

Implementations MAY load application-specific JSON-LD context files from the network during development, but SHOULD permanently cache JSON-LD context files used in conforming documents in production settings to increase their security and privacy characteristics

Either production systems SHOULD or SHOULD NOT resolve context files from the network; you appear to be calling systems that fall into the latter category "inflexible", rather than just following the specification guidance.

Ah, I can see where some confusion is coming from. Your application must only consume terms from contexts it understands and those are the contexts that should not, in general, be loaded from the Web. If a document arrives with some other context on it, then you need to recompact it to one you do understand (or reject it). The former can be loaded from the Web, the latter should not be.

So, the advice here is for contexts from which terms are actually being consumed by the application. This does not apply to contexts that will be replaced during transformation via the JSON-LD compaction API. We could probably tweak this text a little bit to help implementers understand the difference.

@dlongley wrote

But don't use a catch-all "issuer-dependent" @vocab, because it doesn't play nicely with others, creates confusion, and incentivizes behavior that leads to more trouble in the ecosystem.

This is at odds with permission-less innovation; in my opinion it's a roundabout way of saying "all extensions to the VC data model must be blessed by this standards community in order to work well".

No, this is just saying that if you want to interoperate with others, you should use globally unambiguous terms rather than relying on `@vocab` in the above way. You do not need permission to use globally unambiguous terms. I'll note that this is similar to advice given to users of JWTs via RFC 7519: "private claims" should be used with caution as they are subject to collision. I also mentioned a number of other problems with using them in my comments.

@dlongley wrote

Verifiers such as this can announce the contexts that they are willing to accept when requesting VCs. Providers can run the JSON-LD compaction API to recompact a VC to their accepted context. You might even be able to use selective disclosure to remove some terms, if the verifier allows it. This can even help get interoperability with verifiers that simply won't accept any extra terms at all (e.g., JSON schema "additionalProperties": false).

I'm surprised to hear that a verifier sending a list of all the contexts it supports to a wallet is considered a solution to this problem. Aside from the fact that this would significantly bloat the size of the average credential request, the wallet would then need to interpret this list and perform checks like

Sorry for any confusion here, I did not mean to suggest that every context a verifier could possibly support would be sent in every request. Instead, only the contexts that are specific to the query being asked would be included. This is already expected anyway -- and how some software works today. I've included a short example below in response to your other comments that I hope helps.

@dlongley wrote

Express the VC using a context you believe they will accept before sending it to them. Again, you might even be able to use selective disclosure to remove some terms, if the verifier allows it.

Again, my use case was specifically for when this isn't the case: the context isn't known or trusted by the relying party, nor is it required for them to trust it. Your solution in the second sentence appears to require some advanced tree-shaking capability to remove `@context` entries that aren't used in the selective disclosure case; how one achieves this with JSON-LD architecturally seems quite far-fetched.

No, advanced tree shaking capability should not be required, rather, just call the JSON-LD compaction API.

As a quick example, suppose there is a common context for international driver's licenses and an uncommon context for localized terms, like for US-only driver's licenses. As far as I can tell, I'm constructing this according to your use case -- as the verifier (aka relying party) does not "know or trust" the US-only context.

So, in this case, the user has a US driver's license with the core VC v2 context (core), the international context (A), and the US context (B). Let's say it expresses that the user has a driver's license with an internationally recognized term like "document_number" and some US-only "aamva_*" terms, like so:

{
  "@context": [core, A, B],
  ...,
  "credentialSubject": {
    "driversLicense": {
      "document_number": "123456789",
      "aamva_foo": "some-US-only-field-value",
      ...
    }
  },
  ...
}

The verifier only trusts the contexts core and A and it wants the user's "document_number" for their driver's license. So they ask for it like this (notice they include the contexts they understand):

{
  "query": {
    "type": "QueryByExample",
    "credentialQuery": {
      "example": {
        "@context": [core, A],
        "credentialSubject": {
          "driversLicense": {
            "document_number": ""
          }
        }
      }
    }
  },
  ...
}

Now the wallet performs matching (implementation specific) to get a matching VC, performs selective disclosure to produce a VC with just the desired field, and then passes that to the JSON-LD compaction API with {"@context": [core, A]} (from the verifier's query) as the context to recompact to. The output is now acceptable to the verifier (it will not contain context B, because the wallet called standard W3C APIs).
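The recompaction step can be sketched as follows. This is a toy Python stand-in for the JSON-LD expansion/compaction APIs; the context contents and the aamva IRI are made up for illustration, not the real published contexts:

```python
# Toy recompaction: terms defined by a context the verifier did not
# announce fall back to full IRIs instead of compact terms, so no
# "tree shaking" is needed -- just expand and compact.

CONTEXTS = {  # illustrative stand-ins for contexts A and B
    "A": {"document_number": "https://w3id.org/vdl#documentNumber"},
    "B": {"aamva_foo": "https://w3id.org/vdl/aamva#foo"},
}

def expand(doc, ctx_names):
    mapping = {}
    for name in ctx_names:
        mapping.update(CONTEXTS[name])
    return {mapping.get(k, k): v for k, v in doc.items()}

def compact(expanded, ctx_names):
    reverse = {}
    for name in ctx_names:
        reverse.update({iri: term for term, iri in CONTEXTS[name].items()})
    return {reverse.get(iri, iri): v for iri, v in expanded.items()}

vc = {"document_number": "123456789", "aamva_foo": "some-US-only-field-value"}
expanded = expand(vc, ["A", "B"])        # holder's full context list
recompacted = compact(expanded, ["A"])   # only the contexts the verifier announced
print(recompacted)
# {'document_number': '123456789',
#  'https://w3id.org/vdl/aamva#foo': 'some-US-only-field-value'}
```

The verifier's software can consume "document_number" and ignore the unrecognized full-IRI key.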

@dlongley wrote

Compact to your special context when using your special terms (that are for your software only anyway), keeping your vendor-specific terms expressed as full URLs otherwise.

So not actually using the compact JSON-LD form in this case (at least for these terms)? Again, this assumes the wallet knows which contexts the verifier knows about ahead of time, which seems very difficult to establish.

JSON-LD compacted form is still being used here; it's just that the only context is the VC v2 core context, and it doesn't have your extra context when you're sharing it with others. There might have been more confusion on this point; hopefully this explanation helps. The wallet can always assume the verifier knows at least the VC v2 core context, since knowing this context is a hard requirement in the spec. Other contexts could be communicated in a number of ways, including like in the example above.

(edited to put backticks around @words)

awoie commented 1 month ago

@dlongley wrote

Verifiers such as this can announce the contexts that they are willing to accept when requesting VCs. Providers can run the JSON-LD compaction API to recompact a VC to their accepted context. You might even be able to use selective disclosure to remove some terms, if the verifier allows it. This can even help get interoperability with verifiers that simply won't accept any extra terms at all (e.g., JSON schema "additionalProperties": false).

IMO, not "might": I rather think you must, since without some sort of SD, this approach does not work for verifiers that don't want to fail if the secured document has terms from additional contexts they don't know or care about.

It means you would need to use either BBS or ECDSA-SD. Otherwise, the verifier would not be able to verify the signature.

David-Chadwick commented 1 month ago

@dlongley wrote

Compact to your special context when using your special terms (that are for your software only anyway), keeping your vendor-specific terms expressed as full URLs otherwise.

What JSON-LD has effectively done is to replace a central IANA registration authority for compact terms with thousands of local registration authorities, which no implementation can feasibly trust. A few will be trusted, like the one published in the VC DM, or those in other ISO or IETF standards or at the national or international level. The rest will only be trusted by a small subset of entities. This means that all implementations in general will need to discard all the non-trusted contexts and display the full URIs to the user (because they won't trust any render method for a context they don't trust). This solution will remove the bugs identified by @tplooker and should close this issue. I suggest that we add text to the DI spec to describe this, and move on.

awoie commented 1 month ago

@dlongley wrote

Compact to your special context when using your special terms (that are for your software only anyway), keeping your vendor-specific terms expressed as full URLs otherwise.

What JSON-LD has effectively done is to replace an IANA central registration authority for compact terms with thousands of local registration authorities, which no implementation can feasibly trust. A few will be trusted, like the one published in the VC DM, or in another ISO or IETF standard or at national or international level. The rest will only be trusted by a small subset of entities.

This is essentially the same situation as with regular JSON. IANA is one option, but nobody prevents anybody from using distributed registries that define collision-resistant JSON property names, such as reverse domain names or full URIs. JSON-LD in the VCDM uses a similar approach, with the difference that term names are compacted via the context, so they look nicer; this can cause some trouble and (EDIT:) has implementation considerations/limitations and operational assumptions, as discussed in this thread. Especially problematic is requiring all contexts to be known, understood, and trusted (EDIT: even for those terms a verifier does not care about, as mentioned above) in order to verify DI, for the reasons @David-Chadwick just pointed out.

awoie commented 1 month ago

Not sure if this is helpful, but I tried to go through the VCDM/DI specs to identify problematic or unhelpful language.

  1. As per VCDM 2.0 6.3 Type-Specific Credential Processing:

Ensure that all values associated with a @context property are in the expected order, the contents of the context files match known good cryptographic hashes for each file, and domain experts have deemed that the contents are appropriate for the intended use case.

This section is non-normative, which might be problematic.

  2. As per VCDM 2.0 - 5.12 Securing Mechanism Specifications:

Securing mechanism specifications MUST provide a verification mechanism that returns only the information in the conforming document that has been secured, without any securing mechanism information included, such as proof or JOSE/COSE header parameters and signatures.

For any LD-DI cryptosuite, the verifiedDocument is the unsecuredDocument (including `@context`) if the signature was verified. It seems odd to me that LD-DI still returns `@context` in case the attack described in this issue was performed. Wouldn't it be more appropriate to return no `@context`?

  3. As per VCDM 2.0 - 5.12 Securing Mechanism Specifications:

    Securing mechanism specifications MUST provide [...] A document that only contains information that was successfully secured.

Since the `@context` is not secured, shouldn't a DI proof exclude that information (see above)?

Also, to @David-Chadwick's point, perhaps it would be more appropriate to return the expanded document in that case?

  4. As per VCDM 2.0 - 5.12 Securing Mechanism Specifications:

    Securing mechanism specifications SHOULD provide integrity protection for any information referenced by a URL that is critical to validation.

I know it is a SHOULD and not a MUST, but this means Data Integrity Proofs don't adhere to this requirement, since DI proofs don't protect the integrity of `@context` entries.

I'm also generally missing language in DI proofs that says anything about validation of SRI-protected URLs.

Why is this requirement even included in the spec if none of the LD-DI proofs makes use of it?

awoie commented 1 month ago

Another thing I noticed is that people look at VCDM 2.0 + DI mostly as one unit, and disregard that DI can be used with any other document as well. It seems to me that a lot of the language that makes DI securely usable is in VCDM 2.0, not in DI (e.g., SRI, `@context` considerations). From a security and layering point of view, I don't like this approach; DI should be a standalone spec that works securely with any JSON-LD document.

iherman commented 1 month ago

Another thing I noticed is that people look at VCDM 2.0 + DI mostly as one unit, and disregard that DI can be used with any other document as well. It seems to me that a lot of the language that makes DI securely usable is in VCDM 2.0, not in DI (e.g., SRI, `@context` considerations). From a security and layering point of view, I don't like this approach; DI should be a standalone spec that works securely with any JSON-LD document.

That might become a little more complicated insofar as the DI spec should be usable, in theory, for RDF datasets at large, not only when they are serialized in JSON-LD... But I take your point.

filip26 commented 1 month ago

The solution sketched in https://github.com/w3c/vc-data-integrity/issues/272#issuecomment-2184187798 is only one of many, and the simplest: prevent consuming data you don't trust by simply rejecting an unknown context set. There are other solutions, even at the RDF level, that could be applied to prevent this issue, each with its own pros/cons (robustness, complexity, etc.).

The outcome of this discussion should be something like: Do not ever consume data you don't trust.

dlongley commented 1 month ago

@awoie,

IMO, not "might", but I rather think you must, since without some sort of SD, this approach does not work for verifiers that don't want to fail if the secured document has terms from additional context they don't know and care about.

It means, you would need to use either BBS, or ECDSA-SD. Otherwise, the verifier would not be able to verify the signature.

While it is generally better to use SD for data minimization, it isn't true that you have to use it here. It would work just as well to compact a non-SD VC to the context the verifier supplies. The additional terms would just appear as full URLs (and be ignored) when recompacting to their context, as expected.

msporny commented 1 month ago

@David-Chadwick wrote:

There is no point in having the compact encoding in between.

Which completely ignores the point about the compact encoding being useful for developers. :)

This solution will remove the bugs identified ... I suggest that we add text to the DI spec to describe this, and move on.

There are 11 solutions under consideration at the moment; which one of those do you mean by "This solution"... and what text do you propose to "describe this", given where the discussion currently stands?

@awoie wrote:

since without some sort of SD, this approach does not work for verifiers

There were multiple solutions proposed that don't depend on some sort of selective disclosure, which shows that there are multiple ways the current approach scales.

jogu commented 1 month ago

@David-Chadwick wrote:

There is no point in having the compact encoding in between.

Which completely ignores the point about the compact encoding being useful for developers. :)

I'm not sure it's actually clear that this is the case, at least not from what's written there.

If we compare the form that I believe is being asserted to be developer friendly:

"@context": "https://vocab.example/v1",
"title": "Manager"

and another form mentioned:

"https://vocab.example/title": "Manager"

As a developer, to me, the latter looks simpler to parse and infer meaning from. Did I misunderstand something?

filip26 commented 1 month ago

@jogu

There is no point in having the compact encoding in between.

A compacted version could be useful to some, e.g. to help with adoption/integration.

Iron VC (an open-source library I work on) uses fully expanded IRIs when parsing a VC internally, but gives an option to get a compacted version of the same VC, and it's left up to the developer to choose what fits best.

Actually, mapping an expanded VC to a class instance can be seen as a compaction where the context is defined by the class name.

David-Chadwick commented 1 month ago

@msporny Here is some straw man text for consideration in the DI text

DI signature verifiers MUST trust all of the `@context` definitions in the verifiable credential. Globally, there are expected to be thousands of `@context` definitions, and no implementation can feasibly trust all of them. A few definitions will be widely trusted, such as the one published in the VC DM, or those in other ISO or IETF standards or from national or international authorities. The remainder are likely to be trusted by only a small subset of implementations. Consequently, all implementations MUST discard all non-trusted `@context` definitions and display the full URIs to the human (developer, implementer, or end user).

msporny commented 1 month ago

@jogu wrote:

As a developer, to me, the latter looks simpler to parse and infer meaning from. Did I misunderstand something?

Perhaps? You seem to be arguing against the way that the vast majority of HTTP APIs that return JSON objects work today. I'm struggling to think of a popular HTTP API, or developer API for that matter, that specifies all of its objects using URLs for all properties and type names.

To put a finer point on it, you seem to be arguing that VCs use these sorts of objects:

VC Using Full URLs
{
  "@type": [
    "https://www.w3.org/2018/credentials#VerifiableCredential",
    "https://w3id.org/vdl#Iso18013DriversLicenseCredential"
  ],
  "https://schema.org/description": "A license granting driving privileges in Utopia.",
  "https://schema.org/image": {
    "@id": "data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUg...kSuQmCC"
  },
  "https://schema.org/name": "Utopia Driver's License",
  "https://www.w3.org/2018/credentials#credentialSubject": {
    "@id": "did:example:12347abcd",
    "@type": "https://w3id.org/vdl#LicensedDriver",
    "https://w3id.org/vdl#license": {
      "@type": "https://w3id.org/vdl#Iso18013DriversLicense",
      "https://w3id.org/vdl#birthDate": {
        "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
        "@value": "1998-08-28"
      },
      "https://w3id.org/vdl#documentNumber": "542426814",
      "https://w3id.org/vdl#drivingPrivileges": {
        "@type": "@json",
        "@value": [
          {
            "codes": [
              {
                "code": "D"
              }
            ],
            "vehicle_category_code": "D",
            "issue_date": "2019-01-01",
            "expiry_date": "2027-01-01"
          },
          {
            "codes": [
              {
                "code": "C"
              }
            ],
            "vehicle_category_code": "C",
            "issue_date": "2019-01-01",
            "expiry_date": "2017-01-01"
          }
        ]
      },
      "https://w3id.org/vdl#expiryDate": {
        "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
        "@value": "2028-08-27T12:00:00-06:00"
      },
      "https://w3id.org/vdl#familyName": "TURNER",
      "https://w3id.org/vdl#givenName": "SUSAN",
      "https://w3id.org/vdl#issueDate": {
        "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
        "@value": "2023-01-15T10:00:00-07:00"
      },
      "https://w3id.org/vdl#issuingAuthority": "UADMV",
      "https://w3id.org/vdl#issuingCountry": "UA",
      "https://w3id.org/vdl#portrait": {
        "@id": "data:image/jpeg;base64,/9j/4AAQSkZJR...RSClooooP/2Q=="
      },
      "https://w3id.org/vdl#sex": {
        "@type": "http://www.w3.org/2001/XMLSchema#unsignedInt",
        "@value": 2
      },
      "https://w3id.org/vdl#unDistinguishingSign": "UTA",
      "https://w3id.org/vdl/aamva#akaSuffix": "1ST",
      "https://w3id.org/vdl/aamva#familyNameTruncation": "N",
      "https://w3id.org/vdl/aamva#givenNameTruncation": "N"
    }
  },
  "https://www.w3.org/2018/credentials#expirationDate": {
    "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
    "@value": "2028-11-15T12:00:00-06:00"
  },
  "https://www.w3.org/2018/credentials#issuanceDate": {
    "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
    "@value": "2023-11-15T10:00:00-07:00"
  },
  "https://www.w3.org/2018/credentials#issuer": {
    "@id": "did:key:z6MkjxvA4FNrQUhr8f7xhdQuP1VPzErkcnfxsRaU5oFgy2E5",
    "https://schema.org/image": {
      "@id": "https://dmv.utopia.example/logo.png"
    },
    "https://schema.org/name": "Utopia Department of Motor Vehicles",
    "https://schema.org/url": {
      "@id": "https://dmv.utopia.example/"
    }
  }
}

Instead of these sorts of objects:

VC using Terms
{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    "https://w3id.org/vdl/v2",
    "https://w3id.org/vdl/aamva/v1"
  ],
  "type": [
    "VerifiableCredential",
    "Iso18013DriversLicenseCredential"
  ],
  "issuer": {
    "id": "did:key:z6MkjxvA4FNrQUhr8f7xhdQuP1VPzErkcnfxsRaU5oFgy2E5",
    "name": "Utopia Department of Motor Vehicles",
    "url": "https://dmv.utopia.example/",
    "image": "https://dmv.utopia.example/logo.png"
  },
  "issuanceDate": "2023-11-15T10:00:00-07:00",
  "expirationDate": "2028-11-15T12:00:00-06:00",
  "name": "Utopia Driver's License",
  "image": "data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUg...kSuQmCC",
  "description": "A license granting driving privileges in Utopia.",
  "credentialSubject": {
    "id": "did:example:12347abcd",
    "type": "LicensedDriver",
    "driversLicense": {
      "type": "Iso18013DriversLicense",
      "document_number": "542426814",
      "family_name": "TURNER",
      "given_name": "SUSAN",
      "portrait": "data:image/jpeg;base64,/9j/4AAQSkZJR...RSClooooP/2Q==",
      "birth_date": "1998-08-28",
      "issue_date": "2023-01-15T10:00:00-07:00",
      "expiry_date": "2028-08-27T12:00:00-06:00",
      "issuing_country": "UA",
      "issuing_authority": "UADMV",
      "driving_privileges": [{
        "codes": [{"code": "D"}],
        "vehicle_category_code": "D",
        "issue_date": "2019-01-01",
        "expiry_date": "2027-01-01"
      },
      {
        "codes": [{"code": "C"}],
        "vehicle_category_code": "C",
        "issue_date": "2019-01-01",
        "expiry_date": "2017-01-01"
      }],
      "un_distinguishing_sign": "UTA",
      "aamva_aka_suffix": "1ST",
      "sex": 2,
      "aamva_family_name_truncation": "N",
      "aamva_given_name_truncation": "N"
    }
  }
}

If you use full URLs, this is how developers access variables:

const familyName = vc["https://www.w3.org/2018/credentials#credentialSubject"]["https://w3id.org/vdl#license"]["https://w3id.org/vdl#familyName"];

If you use terms, this is how developers access variables:

const familyName = vc.credentialSubject.driversLicense.family_name;

Note that all of your database keys (for indexing and lookup) also become full URLs if you use full URLs in the VC... which leads to considerable database bloat, because you're now repeating long URLs across your entire database and all payloads. I don't know many developers who would prefer full URLs over terms, and this is borne out by the many developer APIs in existence today.

dlongley commented 1 month ago

@David-Chadwick,

I appreciate the concrete text suggestion. I do think it would still be over-reaching a bit, though.

We're producing data model and securing mechanism specs to provide some core primitives to people so that they can figure out how to best put them together in the ecosystem. I think our job is only to document how the primitives work and then get out of the way for innovators to experiment in the ecosystem. For example, we should not be telling other people how they must go about designing their UIs. This latter bit is how I read your text on what to display to humans.

There are a number of things that are given if you read the JSON-LD spec (on which our specs rely) that I'm happy for us to point out again explicitly in our own specifications. I think that's helpful to implementers. For example, I do think it's good for us to say that context is used to map JSON keys to globally unambiguous URLs and that different contexts can map different JSON keys to the same or different URLs, enabling flexibility in a permissionless, decentralized ecosystem. I think it's a good idea to repeat that you must not consume terms defined by untrusted contexts and that you have the option to transform VCs to trusted contexts or to expand terms to full URLs.

But we shouldn't tell implementers what and how they must display things to users. This responsibility, and most others, fall to them. In my view, assuming that we know what is best for every circumstance not only exceeds our remit, but strips people of the freedom they need to employ their own intelligence and ingenuity.

On that point, I'll repeat that people are incubating interesting render methods that show neither URLs nor JSON keys to users, but provide more natural UI, such as skeuomorphic displays, within general purpose digital wallets. Reusable, pluggable software for these render methods can be written such that it will compact an incoming VC to a context for consumption, perform a JSON schema check, and display its contents via a visual template. Again, this is just one example -- and something that we shouldn't get in the way of here.

decentralgabe commented 1 month ago

I finally had the chance to read this long thread. The problem statement is clear, though it seems there isn't yet consensus on the best way to move forward despite @msporny's enumeration 2 weeks ago.

Here is what I suggest as a concrete proposal:

  1. Adjustments to `@vocab`

Items 9 and 6 relate to `@vocab` and are already in progress (see https://github.com/w3c/vc-data-model/pull/1520, https://github.com/w3c/vc-data-model/pull/1524).

  2. Enhance Context Validation (aligns with 4; this may already be present?)

Add normative language to the DI specification requiring implementers to rigorously validate `@context` values. This should include: a) checking against a list of trusted contexts, b) verifying the integrity of context contents using cryptographic hashes, and c) rejecting or transforming documents with untrusted contexts before processing.

  3. Introduce Context Integrity Protection (aligns with 4; this is not present, and there are noted tradeoffs)

Develop an extension to DI proofs that includes a cryptographic commitment to the `@context`. This allows verifiers to detect any tampering with the context after signing.

  4. Clarify the Processing Model

Update DI spec language to provide a transparent, step-by-step processing model for verifiers that emphasizes secure handling of contexts and transformation of documents to trusted forms before interpretation. We can highlight the potential attacks/vulnerabilities at each step and recommend mitigations.

  5. Improve Test Suites

Ensure the DI test suites include specific checks for these vulnerabilities, ensuring conformant implementations handle them correctly. This will be tough if we do not make strong normative statements about (1) and (2) above, but we can at least note conformance for implementers who have chosen to follow the guidance.


Given the many comments above, these proposals carry nuance and tradeoffs: the privacy risks of being unable to swap out identical contexts, limits on decentralization/scalability, and reduced flexibility for test/playground instances. The language we add should capture these nuances and make the guidance crystal clear, to minimize the security risks this issue set out to address. It's up to the group to determine the strength of the language (SHOULD vs. MUST) given the aforementioned nuance/tradeoffs. Still, we should continue that discussion on the PRs that are open to address 1-3.

If you disagree with this proposal, please let me know a workable alternative.

msporny commented 1 month ago

@decentralgabe wrote:

Here is what I suggest as a concrete proposal:

  1. Adjustments to `@vocab`

+1

  2. Enhance Context Validation (aligns with 4; this may already be present?)

+1

  3. Introduce Context Integrity Protection (aligns with 4; this is not present, and there are noted tradeoffs)

Strong -1, for all the reasons documented in this thread (it, debatably, doesn't properly address the issue, it makes certain use cases some of us are counting on impossible to achieve, it creates privacy harms, we'd be doing it at the last minute without much implementation feedback, etc.)

  4. Clarify the Processing Model

+1

  5. Improve Test Suites

+1 (noting that if we do this, we'll be wandering into "validation", "application layer", and "business rules" stuff, and that could kick off a whole new debate)

(edited to put backticks around @words)

awoie commented 1 month ago

If you disagree with this proposal, please let me know a workable alternative.

@decentralgabe Additionally, I suggest adding some statements about how to process and validate DI-secured documents when the verifier does not know, understand, or trust all context entries, but also does not care about the terms they define. This ties back to the example with the extended types and their respective context entries, where the verifier knows, understands, and trusts only the terms inferred by the general type defined by the general context entries, and is interested only in those. (cc @msporny @dlongley ).

PatStLouis commented 1 month ago

Test suite design will be strongly coupled to what is defined in 1 and 3 (context validation + processing model).

The VCDM has a validation section; I would suggest adding a context validation element to that list.

I would recommend using the problem details feature and adding directives in the algorithm section instructing how to check against a known list of contexts, configurable by the verifier. I would see something along the lines of: "Verification software MUST enable developers to configure a list of trusted context URLs. Such a list SHOULD contain well-known URLs, such as the ones depicted in the [data integrity specification](https://www.w3.org/TR/vc-data-integrity/#contexts-and-vocabularies), in addition to other contexts known to the verifier."

To test this, the verification/validation process must return enough information for a test client to make assertions, which is where problem details come into play. Currently, the vc-api is most likely the best tool for exposing this verification response to a test client in the form of an HTTPS response body. I believe problems relating to unknown contexts should be treated as warnings: they shouldn't invalidate the signature, but should prompt the controller/coordinator software to take action. The signature remains valid, but the context is unknown to the verifier, so the data has to be handled with caution, if not rejected entirely, even though the signature is valid.
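As a sketch of that warning-not-error behaviour (the problem-details shape below is illustrative, loosely modeled on RFC 9457; the `type` URL and function name are hypothetical):

```javascript
// Separate context validation from cryptographic verification: unknown
// contexts produce warnings for the controller/coordinator software to act
// on, rather than invalidating an otherwise valid signature.
function contextWarnings(vcDocument, trustedContextUrls) {
  const warnings = [];
  for (const ctx of [].concat(vcDocument["@context"] ?? [])) {
    if (typeof ctx !== "string" || !trustedContextUrls.has(ctx)) {
      warnings.push({
        type: "https://example.org/problems#unknown-context", // hypothetical
        title: "Unknown @context entry",
        detail: `Context is not in the verifier's trusted list: ${JSON.stringify(ctx)}`,
      });
    }
  }
  return warnings;
}
```

A test client could then assert on the returned warnings while still expecting the signature check itself to pass.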

PatStLouis commented 1 month ago

@awoie I believe this would apply to all validation of JSON-LD-based Verifiable Credentials, irrespective of which securing mechanism is used. The securing mechanism doesn't change the fact that the verifier should trust the context, as this is part of validating the credential, not verifying it.

awoie commented 1 month ago

@awoie I believe this would apply to all validation of JSON-LD-based Verifiable Credentials, irrespective of which securing mechanism is used. The securing mechanism doesn't change the fact that the verifier should trust the context, as this is part of validating the credential, not verifying it.

I believe that is true, but it would be good to document what needs to be done to achieve this. It is also important for type-specific processing without general JSON-LD processing.

PatStLouis commented 1 month ago

+1 for the need of improved documentation.

The only concern I have with type specific distinctions is the polyglot conundrum.

msporny commented 1 month ago

@PatStLouis wrote:

The only concern I have with type specific distinctions is the polyglot conundrum.

The basis of that post doesn't make sense... every JSON object is a polyglot based on what that post argues. JSON.parse() does one thing, and when an application developer uses their application to do anything more than JSON.parse(), they're operating as a polyglot. The second an application ignores a field because they don't understand it in a JSON document, they're operating as a polyglot. Even web browsers operate as polyglot systems -- take any CSS feature that's implemented in one browser but not another, or any feature that sits behind a flag, and you have a polyglot system. The VCDM has ONE data model with ONE syntax, that is JSON-LD... either you do type-specific processing on that JSON-LD document, or you do generalized JSON-LD processing on that document.

I will also warn folks that this thread is long enough without bringing other philosophical debates into this issue. Let's focus on @decentralgabe's proposal and work this issue to closure.

PatStLouis commented 1 month ago

Maybe I misunderstood that post, or misunderstand what type-specific means, but my point was this: if someone issues and/or verifies a VC without treating it as JSON-LD (which it is), and we then need to define a different rule set for treating a VC as not-JSON-LD, that adds unwarranted complexity, since VCs were designed around JSON-LD. The VC spec shouldn't have to define a separate process to account for entities using VCs as not-JSON-LD.

msporny commented 1 month ago

The VC spec shouldn't have to define a separate process to account for entities using VC as not json-ld.

We had traditionally called this "JSON-only processing", but that term confused people and made them think that we were giving developers permission to interpret the document in a significantly different way from the result of a JSON-LD interpretation. That same confusion is where that post you linked to above was coming from (but also simultaneously says that VCDM v2.0 is no longer polyglot, and complains about that as well).

JSON-LD is designed so that you do not need to use a JSON-LD library to safely consume a JSON-LD document. For example, there are features (such as `@context` and `@protected`) and rules, such as those provided in Type-Specific Credential Processing, that instruct developers how to do that.

So, the VC spec isn't defining a separate process that gets separate results... it's defining a separate process that, as long as a few additional rules are followed, gets you to the same result as someone using a JSON-LD library.

PatStLouis commented 1 month ago

Thank you for this information, I will do some reading. I retract my comments until I get a better understanding of this.

tplooker commented 1 month ago

@decentralgabe wrote:

  3. Introduce Context Integrity Protection (aligns with 4; this is not present, and there are noted tradeoffs)

Just want to clearly express my support for this one; I believe it is the most important of the proposals to implement, as it is the one that actually mitigates the security vulnerabilities raised by this issue.

@msporny wrote:

Strong -1, for all the reasons documented in this thread (it, debatably, doesn't properly address the issue, it makes certain use cases some of us are counting on impossible to achieve, it creates privacy harms, we'd be doing it at the last minute without much implementation feedback, etc.)

Perhaps I'm missing something, but I've not seen evidence that this fails to address the issue. With regard to use cases, if you mean the ability to re-compact a document against a new context, this can still be done when the `@context` is integrity protected; it just happens after signature validation, which is a much safer time to do it, rather than blending it into the signature validation procedure. I also don't see what possible privacy harms come of this.

TallTed commented 1 month ago

All —

Please review your comments above, and edit them to codefence all instances of @keyword, including those in content within your comment which was quoted from someone else.

All you need to do is wrap those @keyword instances in single backticks, like `@keyword`.

Failure to do this has looped several GitHub users (e.g., `@context`, `@vocab`, `@protected`, `@base`, all of whom were tagged in the initial comment that opened this issue; I haven't reviewed the whole thread to make a list of all such) into this conversation without their permission or request.

This is often shrugged off, as, what's the big deal about a few irrelevant notifications? Well, in this thread, it's more like 133 (and counting) irrelevant notifications. In most spheres where I travel, that would be classed as spam and/or harassment.

Please, let's be good gitizens.

iherman commented 1 month ago

The issue was discussed in a meeting on 2024-07-10

View the transcript

#### 2.1. Multiple significant security vulnerabilities in the design of data integrity (issue vc-data-integrity#272)

_See github issue [vc-data-integrity#272](https://github.com/w3c/vc-data-integrity/issues/272)._

> *Brent Zundel:* See [Gabe's comment within the thread](https://github.com/w3c/vc-data-integrity/issues/272#issuecomment-2212258255).

**Brent Zundel:** concerns about security implications of data integrity signatures. … extensive conversation on the issue. … Original poster of the issue has signaled acceptance of the proposal to address the issue.

> *Phillip Long:* +1 for Gabe to walk us through it.

**Brent Zundel:** pending a PR to address the issue. … gabe could you walk us through the proposal.

**Gabe Cohen:** spent a long time trying to understand this issue. … First proposal is adjustment to `@vocab`. This seems uncontroversial. PR already open. … Second thing is we need some text about `@context` validation, we need to be very clear about how to handle untrusted contexts. … There is some discussion around a proposal to add signatures to contexts to make them tamper evident.

> *Dave Longley:* +1 to changes to `@vocab`, +1 to some clarifying text around validation requirements, +1 to some tests if we can make them make sense -- validation is application-level and maybe really for a "best practices doc" but fine if we can make it work.

> *Dave Longley:* -1 to locking contexts for all the reasons listed in the issue already.

**Gabe Cohen:** The outcome of the proposal would be 2 or 3 issues to address these points and associated PRs.

**Manu Sporny:** Agree that 0,1,3,4 parts of the proposal are good ideas.

> *Dave Longley:* +1 to Manu.

**Manu Sporny:** 0 is easy, 1 will take some work. … need to be careful not to pass into application space.

> *Anil John:* +1 to 0,1,3,4 ... Have a question regarding 2. Will put myself on the queue to ask.

> *Dave Longley:* +1 to Manu, important not to go into the application layer.

**Manu Sporny:** will come back to 2, that one is controversial. … 3 is a variation of 1, should be fine. … For test suites, we can improve them to check this. Need to tell implementers to check for bad contexts.

> *Dave Longley:* +1 that we're telling implementers to essentially create a special application-level rule, but it could work.

**Manu Sporny:** Telling issuers and verifiers to implement business logic to pass the test suites. … +1 to almost all of the things. … we should not mix context integrity with signatures. … it limits some use cases the people are relying on. … Forces disclosure of all of the contexts. … Forced to leak data if mix cryptographic hash with the context. … There are other ways to prevent this attack, would be a strong -1 for mandated context integrity protection.

> *Dave Longley:* +1 to not locking contexts for all those reasons and because it doesn't fix the problem :).

**Brent Zundel:** For number 2, the integrity of contexts. Is it sufficient to recommend people to use the related resource property. … use related resource for context integrity for people who want it.

**Joe Andrieu:** When talking about context integrity, we are talking about some hash. Not that the context is signed over.

> *Gabe Cohen:* for 2 I intended to mean a signature, for 1 I mean a hash.

**Manu Sporny:** I think they are the same. We aren't talking about signing over the property and the value. We are talking about including the hash of every context in the `@context` property.

**Joe Andrieu:** We do not currently sign over the `@context` property.

**Anil John:** Is option 2 not a secure metadata distribution problem. … What is the problem we are solving here. … Seems to be multiple options regarding trust registries, having pointers from DID documents.

**Dave Longley:** Let someone else speak to aniltj. … Believe option 2 does not solve the problem raised in that issue. That issue is about a validation problem. … where terms are read from a context that is not known.

> *Anil John:* Is not (2) a secure metadata (`@context` files being such a thing) distribution problem?

**Dave Longley:** even if contexts never changed and were integrity protected, you can still make these mistakes if you don't know the context.

**Gabe Cohen:** responding to aniltj, yes secure metadata is part of it. … folks advocating for 2 are concerned that the issuer signed over the context and its properties so they remain untampered for any verifier. … building on brent, wondering if we could have a new data integrity suite that includes signing over the context. … still a discussion to be had about whether it addresses the concerns.

**David Chadwick:** commenting on the idea of using the related resource property, that is signed over. Nothing to stop an issuer adding a relatedResource property for any URI.

> *Dave Longley:* +1 David is correct.

> *Manu Sporny:* DavidC is correct.

**David Chadwick:** Cannot stop an issuer from using this.

**Brent Zundel:** question is do we want to encourage people to use this as a means of addressing this issue.

> *Dave Longley:* -1 it does not address the issue for the reasons I stated, this is a validation issue, not a security issue.

**Greg Bernstein:** part of this issue from a security point of view comes down to understanding who controls what inputs. … When we talk about this context value, we don't secure the value of the context, we secure all the quads. … do we want to secure the value of the context?

> *Manu Sporny:* yes, +1 to Greg -- it's the verifier that controls the input wrt. `@context`.

> *Dave Longley:* +1 to GregB, with the rdfc cryptosuites, the underlying information is secured, not the specific expression, allowing alternative expressions.

**Greg Bernstein:** or do we accept that the context is under the control of the verifier. E.g. the verifier indicates the context values they accept. … Verifier says they won't take a context that they don't understand. … Don't protect the context value, because that isn't part of the inputs that need to be protected. The verifier controls the context.

> *Anil John:* +1 to Greg regarding the Verifier doing due diligence on the `@context`'s `@did`'s it finds acceptable. In fact, that is what we are doing in practice.

**Manu Sporny:** brent said could we just use relatedResource. We could, but how normative would we want to get with that. … is it mandated that if you have a relatedResource with a context, the verifier must throw an error if the hashes don't match. … We would end up getting to a lot of the same issues. … GregB is right, the verifier determines the contexts and the hashes of contexts that they accept. … security model is different from the way JOSE/COSE do things. … confusion in this thread. JCS does sign over the context values. … but that in itself is not enough, verifier has to choose the contexts they accept. This is a validation layer. An application issue. … if we try to solve this at the crypto layer, you make use cases not possible and it doesn't solve the problem.

> *Dave Longley:* you can't know how to interpret JSON from just reading its key names (a JSON key of "firstName" may as well say "asdf").

**Manu Sporny:** Things to consider. This can be viewed as a metadata distribution problem. … We tell people do not ever load contexts over the network. … once a spec is published, verifiers and issuers can lock to very specific contexts if they want to do that. This is expected.

> *Dave Longley:* you need to know context to properly interpret JSON (whether it's explicitly given like in JSON-LD via `@context` or you get it out of band somehow).

**Manu Sporny:** never have to go to the internet. You should not be doing that. Should have a local copy. … When you get data in, you should check every context and know that you understand these. … This addresses the attack.

**Brent Zundel:** Need to start transition to what the next steps are. … if `relatedResource` as a property doesn't fully address this, we did come up with a mechanism for providing tamper evident information. A VC. … What if you are concerned about this, issuers can provide a VC of the context file. … Next, is concern about the context, the same as concern about the public key. Is this key related to this issuer.

**Joe Andrieu:** conflating issues between the integrity of the property. Not convinced data integrity does not secure the value of the context. … If I can manipulate the context and still have the VC validate, that feels like a huge problem with data integrity.

> *Gabe Cohen:* Joe - please review the presentation which has examples to prove that if you manipulate the context verification can still succeed [https://docs.google.com/presentation/d/1MxLMIjubCVykDux8fBWcisXLu9WeE0LxZzU_xlANUMc/edit](https://docs.google.com/presentation/d/1MxLMIjubCVykDux8fBWcisXLu9WeE0LxZzU_xlANUMc/edit).

> *Dmitri Zagidulin:* and integrity of the file is addressed by the digest mechanism.

**Anil John:** not a tech provider or tool user. I am a user of this technology. … We rely on the recommendations of this group. … As a verifier, what we are doing is manually verifying and reviewing the context to ensure it is acceptable. … Ensure it is coming from an entity whose attestations we want to use in our business processes. … Just signing the context itself doesn't solve this problem for me. Just gives confidence that the context has not changed and is coming from a specific entity.

> *Dave Longley:* +1 a signature on a context tells you nothing about what it is/what it means, context must be processed (at runtime or understood before that and coded against).

**Anil John:** Struggling to reconcile what is important and relevant to what we are trying to do.

> *Greg Bernstein:* The VC-DI-EdDSA and VC-DI-ECDSA specs allow two different canonicalization approaches. One (JCS) signs over the "JSON", the other (RDFC) signs over "quads" which include complete "term definitions" (URIs) expanded from context.

**Ted Thibodeau Jr.:** First, data integrity signs over quads, which is the result of applying the context to the json being signed over.

> *Manu Sporny:* yes, exactly, TallTed is exactly right.

> *Ivan Herman:* +1 to ted.

> *Benjamin Young:* +1 good summary, Ted.

> *Joe Andrieu:* thanks, Ted.

> *Dmitri Zagidulin:* @decentralgabe - I missed the previous discussion of -- doesn't `relatedResources` securing the context address this concern?

**Ted Thibodeau Jr.:** This is open world, VCs can go anywhere at any time, I've never verified a VC from issuer X... I need to get the context file that's relevant for their VCs -- this is not the /default/ context, this is the Verifier X context, in addition -- I have to retrieve it and make use of it and have assurance that the context file that I retrieve is the same as the context file that was used when generating the VC.

**Dave Longley:** to respond to TallTed, I think it is the question around secure metadata delivery. … Able to get some context from some source and have confidence it is the context you intended. … on the queue to discuss whether the context value is secured. … the original string of the context is not secured. The context property is fully processed to generate the nquads from jsonld.

> *Joe Andrieu:* so the property is secured. It probably shouldn't be dropped.

**Dave Longley:** These quads are secured. … If you were to manipulate the context in any meaningful way the sig would fail.

**Manu Sporny:** happy to raise PRs for 0,1,3,4. Everything except context integrity protection. … Think I have enough from the group for this. … will take a week or two. … On the queue to respond to brent, we provide an algorithm for how to retrieve the public key, but someone could still say you didn't sign over the public key. … This would be the same security disclosure we are discussing here. … You have to as a verifier vet certain things. The context. The public key. Other types of metadata. … This is a good analogy.

**Ivan Herman:** related to manu around the PRs to come. … manu said, you have to publish the hash of the contexts they create. … Are we sure we really make it clear that authors of new contexts have to publicly disclose the hash. … if this isn't there then it should be.

**Gabe Cohen:** question about the use of JCS. … This could be a way out if JCS is signing over the context value. … How is this communicated.

> *Greg Bernstein:* It's in the cryptosuite name...

**Gabe Cohen:** What about nested contexts. Contexts that reference other contexts. How do we secure those. … For 2 it may be worth continuing this in a new issue. … Some people not present who might still need convincing.

> *Ted Thibodeau Jr.:* it seems that along with disclosure of context hashes, the context hash *should* (maybe *must*) be included in the VC, somehow.

**Dmitri Zagidulin:** Main question is doesn't the relatedResource mechanism address a lot of these concerns.

> *Greg Bernstein:* See for example: ecdsa-jcs-2019 and ecdsa-rdfc-2019 in the VC-DI-ECDSA spec.

**Brent Zundel:** relatedResource exists inside of the data model. Would work only when data integrity is signing VC data models that include relatedResource.

> *Gabe Cohen:* thanks, makes sense.

**Manu Sporny:** ivan asked if we should add language around providing hashes of finalized contexts. … Think it is fine to say they should. Concerns around using MUST. … There are other reasons why you may not need to publish a hash. … schema.org is well known. They don't need to publish hashes for their contexts. … I might as a verifier, retrieve something and lock to a specific hash that I retrieved.

> *Joe Andrieu:* schema.org is a canonical example of semantic drift.

**Manu Sporny:** We need to think about the use cases deeply before saying MUST around providing context hashes. … The text we have in the spec is enough to address the attack raised in the issue.

> *Dave Longley:* note: if a context changes in a way that de-syncs it from the originally signed information, the signature will fail -- because the original context contents are "baked into" the originally signed information.

**Manu Sporny:** A known bad context was included in the verifier as a good context. The end. … it was misconfigured software. … We already have that language in there.

> *Dave Longley:* so if a dynamic context is used -- signatures on existing VCs will start to fail -- and that would need to be ok with people taking that approach.

**Manu Sporny:** We can clean it up and clarify. … many of these solutions are problematic. The relatedResource and including the hashes in the signature is not solving the problem.

> *Dave Longley:* fundamentally: you can't read a term defined by a context without understanding that context.

**Joe Andrieu:** I don't agree, I think today people defining contexts can create a URL for a context that has a hash in it. … We don't have to change the VCDM to support that.

> *Phillip Long:* +1 to Joe's use hashes in `@context` files if you want them secured.

**Brent Zundel:** we have a path forward. Thanks for that. look forward to seeing those PRs.

> *Dave Longley:* +1 to Joe's suggestion as a layered way to do things people may want.

---
dmitrizagidulin commented 1 month ago

As an implementer of wallets, verifiers and issuers, and both DI and JWS libraries, I want to add my thoughts to this crucial discussion.

  1. @contexts SHOULD be digest hash protected, for both DI and JWS protected VCs. (I understand there may be some exotic use cases where the issuer might not want to sign over the context URLs and hashes, but I've also experienced firsthand the needless confusion and wasted effort that results from NOT locking down contexts with hashes.) But I think those of us who write tutorials and implementation guides should STRONGLY recommend that the relatedResource + digest mechanism be used in VC design.
    • Incidentally, digest hash protection of any resource, including @contexts, should definitely be a general purpose mechanism (not even a Data Integrity specific one, just a general JSON mechanism). But I understand the practicalities of WG deadlines; fitting in to the VC DM is better than nothing (even though it would be even better to have it in a standalone mini-spec).
    • And yes, the relatedResource mechanism does not protect contexts that are included in other contexts (like, for example, the v1 examples context, https://www.w3.org/2018/credentials/examples/v1 -- it included the ODRL context inside it). And so again, tutorials and VC implementation guides should STRONGLY recommend not embedding contexts like that.
    • Btw, the digestMultibase mechanism is strictly better than digestSRI, because it has two critical features: 1) it allows optional JCS canonicalization of JSON resources, whereas digestSRI does not; and 2) it can be base64url encoded, and thus included in URLs, whereas digestSRI is locked to (non-URL) base64 encoding.
  2. The core VC DM 2.0 context should still be marked as @protected: true, BUT the @vocab training wheels should be moved to the https://www.w3.org/ns/credentials/examples/v2 context (or rather, it's already there in the example context, so it should be removed from the core VC DM 2.0 context).
    • Asking developers to use https://www.w3.org/ns/credentials/examples/v2 (and its vocab) strikes the right balance of newbie-friendly, but also serves as a reminder that when it comes time for prod, the examples context should be removed (and undefined terms dealt with).
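Dmitri's encoding point is easy to demonstrate. Below is a minimal Python sketch (not a conformant multiformats implementation) contrasting the two digest styles: an SRI-style `sha256-<base64>` string versus a multibase/multihash-style string, where the `0x12 0x20` prefix bytes (assumed here from the multiformats conventions) mark a 32-byte sha2-256 multihash and the leading `u` marks base64url-no-pad:

```python
import base64
import hashlib

def digest_sri(data: bytes) -> str:
    # SRI-style digest: "sha256-" prefix plus standard base64,
    # which may contain '+', '/', and '=' (not URL-safe).
    return "sha256-" + base64.b64encode(hashlib.sha256(data).digest()).decode()

def digest_multibase(data: bytes) -> str:
    # Multibase/multihash-style digest (simplified sketch):
    # 0x12 = sha2-256 multihash code, 0x20 = 32-byte digest length,
    # 'u' = base64url-no-pad multibase prefix.
    multihash = bytes([0x12, 0x20]) + hashlib.sha256(data).digest()
    return "u" + base64.urlsafe_b64encode(multihash).decode().rstrip("=")

context = b'{"@context":{"@version":1.1}}'  # placeholder bytes, not a real context
print(digest_sri(context))
print(digest_multibase(context))
```

The multibase form contains no `+`, `/`, or `=` characters, which is why it can be embedded directly in a URL, while the SRI form generally cannot.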
iherman commented 1 month ago

Incidentally, digest hash protection of any resource, including @contexts, should definitely be a general purpose mechanism (not even a Data Integrity specific one, just a general JSON mechanism). But I understand the practicalities of WG deadlines; fitting in to the VC DM is better than nothing (even though it would be even better to have it in a standalone mini-spec).

FWIW, the issue of adding SRI to a @context was on the agenda of the JSON-LD Working Group, but has been postponed for a future release, see https://github.com/w3c/json-ld-syntax/issues/108. I do not know what the plans are on actively reopening this issue in the JSON-LD WG (which is in maintenance mode now), but a future version of VC may well rely on a more general mechanism of a new version of JSON-LD. Which suggests that we should not define a new feature for VC but rely on, for now, the usage of relatedResource.

Cc @BigBlueHat

tplooker commented 1 month ago

@contexts SHOULD be digest hash protected, for both DI and JWS protected VCs.

@dmitrizagidulin could you clarify what you mean by "hash protected" here and how this protection would work? For example do you mean protected by the issuer's signature, like what has been suggested by @decentralgabe, myself and others? Or instead leaving the protection reliant on an out-of-band step similar to the current specification guidance, and hence optional in practice when it comes to validating a DI proof?

dmitrizagidulin commented 1 month ago

@tplooker

@contexts SHOULD be digest hash protected, for both DI and JWS protected VCs.

@dmitrizagidulin could you clarify what you mean by "hash protected" here and how this protection would work? For example do you mean protected by the issuer's signature, like what has been suggested by @decentralgabe, myself and others? Or instead leaving the protection reliant on an out-of-band step similar to the current specification guidance, and hence optional in practice when it comes to validating a DI proof?

At present, I think we should go with (and recommend to devs) the 'digest hashes signed over by issuer' approach like you and Gabe suggested. Specifically, the relatedResource + digest mechanism that's in the current VC DM 2.0, since those properties do get signed over (by both Data Integrity and JWS mechanisms).
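As a rough illustration of that shape, here is a Python sketch of a credential body carrying a `relatedResource` entry with a `digestMultibase` (the property names are from the VC DM 2.0; the digest computation is a simplified sha2-256 multihash stand-in, and the context bytes are a placeholder rather than the real context document):

```python
import base64
import hashlib
import json

def digest_multibase(data: bytes) -> str:
    # Simplified digestMultibase stand-in: sha2-256 multihash
    # (0x12, 0x20 prefix), base64url-no-pad multibase ('u' prefix).
    mh = bytes([0x12, 0x20]) + hashlib.sha256(data).digest()
    return "u" + base64.urlsafe_b64encode(mh).decode().rstrip("=")

# Assume the issuer fetched the context document once at issuance time.
context_url = "https://www.w3.org/ns/credentials/v2"
context_bytes = b"...the retrieved context document..."  # placeholder

credential = {
    "@context": [context_url],
    "type": ["VerifiableCredential"],
    # relatedResource is part of the credential body, so both Data Integrity
    # and JWS proofs sign over it -- locking the context to this digest.
    "relatedResource": [{
        "id": context_url,
        "digestMultibase": digest_multibase(context_bytes),
    }],
}
print(json.dumps(credential, indent=2))
```

A verifier that honors the entry would re-fetch (or look up) the context, recompute the digest, and refuse to proceed on a mismatch.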

Would I prefer adding the digest hashes to the URLs, inline, so that links would look like this:

```jsonc
{
  "@context": [ "https://www.w3.org/ns/credentials/v2#digest=12345"],
  // ...
  "evidence": [{
    "id": "https://example.com/files/my-dissertation.pdf#digest=4567"
  }]
}
```

Yes, I'd prefer that mechanism, in the ideal world (if we had time before CR to publish it as a standalone spec, AND if there was political will in the community to add support for this to JSON-LD libraries (or at least on the HTTP client level that those libraries use) in various programming languages). It's a somewhat better mechanism because a) It could be used in embedded contexts (where you add a link to another context inside a context, like examples/v1 did). and b) It's a useful general purpose JSON mechanism.
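To show what verification against such an inline digest might look like, here is a hedged Python sketch of the hypothetical `#digest=` fragment convention from the example above (using a hex-encoded SHA-256 for simplicity rather than the placeholder values; nothing here is standardized):

```python
import hashlib
from urllib.parse import urldefrag

def check_fragment_digest(url: str, fetched: bytes) -> bool:
    # Hypothetical "#digest=<hex-sha256>" convention: strip the fragment,
    # hash the bytes that were actually fetched, compare to the pinned digest.
    bare_url, fragment = urldefrag(url)
    if not fragment.startswith("digest="):
        return True  # no digest pinned; nothing to check
    expected = fragment[len("digest="):]
    return hashlib.sha256(fetched).hexdigest() == expected

doc = b'{"@context": {}}'
pinned = "https://example.com/ctx.jsonld#digest=" + hashlib.sha256(doc).hexdigest()
print(check_fragment_digest(pinned, doc))          # matching content -> True
print(check_fragment_digest(pinned, b"tampered"))  # altered content -> False
```

The appeal Dmitri notes is that the pin travels with the URL itself, so it also works for contexts referenced from inside other contexts.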

BUT, given the current landscape, it's more realistic to use relatedResource & sign over it as usual, so that's fine. (and by "that's fine" I mean -- doing that is STILL drastically better than not using any sort of digest).

dmitrizagidulin commented 1 month ago

FWIW, the issue of adding SRI to a @context was on the agenda of the JSON-LD Working Group, but has been postponed for a future release, see w3c/json-ld-syntax#108. I do not know what the plans are on actively reopening this issue in the JSON-LD WG (which is in maintenance mode now), but a future version of VC may well rely on a more general mechanism of a new version of JSON-LD. Which suggests that we should not define a new feature for VC but rely on, for now, the usage of relatedResource.

@iherman Oh, that's really cool! Thanks for the link. I really hope the future json-ld WG does get around to standardizing that mechanism!

kimdhamilton commented 1 month ago

I think I like what Dmitri's suggesting, but to clarify:

Having to use relatedResource would be redundant for someone achieving this another way (content-addressable storage, DID Linked Resources, etc.), and would cause unnecessary cruft.

msporny commented 1 month ago

Noting the actions to raise PRs and issues that were taken on the last telecon.

  1. Adjustments to @vocab

PRs have been raised for this item: https://github.com/w3c/vc-data-model/pull/1520 and https://github.com/w3c/vc-data-model/pull/1524

  2. Enhance Context Validation (aligns with 4; this may already be present?)

Now being tracked in https://github.com/w3c/vc-data-model/issues/1529

A PR has been raised for this item: https://github.com/w3c/vc-data-model/pull/1535

  3. Introduce Context Integrity Protection (aligns with 4; this is not present, and there are noted tradeoffs)

A PR has been raised for this item: https://github.com/w3c/vc-data-model/pull/1537

  4. Clarify the Processing Model

Now being tracked in https://github.com/w3c/vc-data-integrity/issues/288

A PR has been raised for this item: https://github.com/w3c/vc-data-integrity/pull/291

  5. Improve Test Suites

Now being tracked in https://github.com/w3c/vc-data-model-2.0-test-suite/issues/88 with the expectation of PRs for discussion within a few weeks.

David-Chadwick commented 1 month ago

@dlongley The bit about display which you picked up on is peripheral to the main text. We can easily omit it so that the proposed text becomes

"DI signature verifiers MUST trust all the @context definitions in the verifiable credential. Globally there are expected to be thousands of@context definitions, and no implementation can feasibly trust all of these definitions. A few definitions will be widely trusted, such as the one published in the VC DM, or in another ISO or IETF standard or by a national or international authority. The remainder are likely to be only be trusted by a small subset of implementations. Consequently all implementations MUST discard all the non-trusted@context definitions before processing the verifiable credential"

iherman commented 3 weeks ago

The issue was discussed in a meeting on 2024-07-31

View the transcript #### 2.1. Multiple significant security vulnerabilities in the design of data integrity (issue vc-data-integrity#272) _See github issue [vc-data-integrity#272](https://github.com/w3c/vc-data-integrity/issues/272)._ **Brent Zundel:** Data Integrity: we have one aspect of one key issue left. … an issue raised citing possible vulnerabilities, a solution has been presented, all aspects except one of that solution have been agreed upon. … the folks that raised the concerns believe that it would be most appropriate to add a mechanism for demonstrating context integrity. … other folks are concerned about the overhead of that and that it may not solve the problems. … that is the essence of what we want to talk about right now - how would folks feel about this? what should we do/not do? > *Manu Sporny:* [https://github.com/w3c/vc-data-integrity/issues/272#issuecomment-2230883816](https://github.com/w3c/vc-data-integrity/issues/272#issuecomment-2230883816). **Manu Sporny:** 272 has 143 comments now, been discussed extensively, Gabe made a proposal many weeks ago that had broad support for a number of the items, can follow your nose to what Gabe suggested and how those things went into PRs. … made adjustments to `@vocab` mechanism, took it out of core context, put in warnings about using it and how it might work in your benefit or against you. … proposal 0 that Gabe did, fully implemented. … the other thing that we ended up implementing was enhancing context validation. … didn't spell out how to do it before. … arguably the main issue with the security issue raise, that they weren't checking their context values. … we now have a normative context validation algorithm in the Data Integrity specification. … eventually that algorithm should be pushed to the JSON-LD spec. … and the VCDM says that you should understand context values before doing any business logic. … we can write tests for this algorithm in the test suite to test against implementations. 
… that was the second proposal, the other thing was clarifying the processing model (what's the order of steps etc.), that is in as a PR. … the only thing that we now need to focus on is: do we need to put mandatory context integrity protection into the digital signature in Data Integrity. … I don't think it addresses the issue at hand, the problem with the security disclosure is that an important check was not being done. … we have a mechanism to do context integrity protection w/ related resource. … tying this any deeper into the crypto layer is problematic as it prevents certain use cases. … not possible to go to CBOR-LD in certain cases, not possible for entities receiving a message to use their own context. … in RDFC use case it doesn't make sense b.c. you are already signing the quads. … there is an argument in JCS case, but could use resource integrity there. > *Dave Longley:* +1 to allowing issuers (if they desire) to use related resource, +1 to never forcing a particular context on holders and verifiers, they should be free to translate/recompact. +1 that the forcing wouldn't fix the issue anyway. **Manu Sporny:** all doing this tells you is that you are using the same content hash that the issuer used when they created it, this is insufficient to protect against the kind of attack in 272. … what's being proposed 1. doesn't address the core issue 2. prevents important use cases 3. is redundant w.r.t an existing feature. > *Dave Longley:* -1 to a recommendation that would preference it. **Brent Zundel:** would it be possible to add a recommendation that folks do resource integrity? would that be a means of satisfying the folks who have raised this issue? … curious what the drawbacks of that would be. … it's something that we have defined, is straightforward, and maybe would do no harm. > *Dave Longley:* +1 to DavidC. 
**David Chadwick:** my suggestion would be weaker, just to add a note saying that if the issuer wishes they can integrity protect the context with related resource. > *Manu Sporny:* +1 to DavidC's proposal. **Gabe Cohen:** +1 to DavidC's proposal. **Brent Zundel:** would anyone be opposed to adding this note? anyone want to argue that the context integrity stuff absolutely must happen? … we have a path forward, do we have a volunteer to raise this PR. **Manu Sporny:** I can raise this PR. **Brent Zundel:** we raise the PR, we go into the issue and list the PRs that have been raised, we as a working group believe this addresses the issue, then close the issue - is that right? **Manu Sporny:** sounds good, one thing we can always do is strengthen requirements, if you are working in an ecosystem that needs this, in their spec they can mandate that verifiers must check this, at the base layer we are saying that you can use this feature, but in another spec using this you can make it a must. > *Dave Longley:* +1 to profiles being allowed that can require this or anything else (not encouraging anyone to profile in any particular way though). **Manu Sporny:** … it's mostly at the application layer, in a particular ecosystem verifiers must check this etc.