Open tplooker opened 3 weeks ago
Data integrity changes the proof generation and verification procedures to include a hash of the `@context` entries in the document, ensuring no manipulation of the `@context` entry can be done without detection.
This creates a direct link between an issuer and a context set, and it locks a holder into asking for a new credential every time a different context is needed for some reason, even in cases where it can be translated one-to-one. Please note, many verification use-cases require only a few claims, especially in the context of SD.
There might be a risk of clustering holders based on a context set (requested at issuance/presented at verification), which would be hardwired. At this very early stage of VC adoption we can expect many custom contexts to be around.
some use cases:
Making this change without a deep analysis could potentially end up with a similar discussion to this one a few months later ...
Thank you @filip26. If I understand correctly, this is about how best to securely distribute `@context` files. If so, I agree that a deeper analysis would be helpful to understand both the options on the table and the associated trade-offs an implementer needs to consider before making a particular choice.
Checking context is something that needs to happen at the application level and, if it is not checked properly, adding content-integrity checks will not help solve that problem, but it will harm use cases and decentralization.
Sticking with the "human name swapping" scenarios we've been using, take for example an application that will accept a "Name VC" from either an issuer from Japan or an issuer from the US. These VCs are protected by some JWT-based mechanism that ensures the context cannot be changed without losing protection over the documents.
Now, suppose that the issuer from Japan issues their VC using a "Japanese context" that expresses the first and last name term names in the exact reverse way from the "US context". The issuer from the US issues their VC using the "US context".
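To make the scenario concrete, the two contexts might look something like this. This is a purely hypothetical sketch; the URLs and IRIs are invented for illustration and are not taken from any real, published vocabulary:

```js
// Hypothetical "US context": term names map to the IRIs you'd expect.
const usContext = {
  "@context": {
    "firstName": "https://example.us/vocab#givenName",  // invented IRI
    "lastName": "https://example.us/vocab#familyName"   // invented IRI
  }
};

// Hypothetical "Japanese context": the *same* term names map to the
// opposite IRIs, reflecting family-name-first ordering.
const japaneseContext = {
  "@context": {
    "firstName": "https://example.us/vocab#familyName",
    "lastName": "https://example.us/vocab#givenName"
  }
};
```

The same JSON document means two different things depending on which of these contexts accompanies it, which is exactly why the consuming code below must branch correctly.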
The application sees this and is written using pseudo code like this to consume the VCs after verification checks are performed (that would weed out any unacceptable issuers and ensure no changes to the expression of the documents):
if(issuer === 'US') {
// run consumption code of US document, ignoring context
} else if(issuer === 'Japan') {
// run consumption code of Japan document, ignoring context
}
All is well here, for the time being, but it is actually only by chance that this is true in an open world setting. Because then, asynchronously, the issuer from Japan sees that a number of customers in Japan want to be able to use their "Name VC" at US-context-only consuming applications. So, seeing as they weren't using Data-Integrity-protected VCs, they decide they have to also start issuing duplicate "Name VCs" to every customer that wants one, using the "US context".
But now our application has a problem. You see, the application will happily accept these new "US context"-based VCs signed by the issuer in Japan, but the wrong code block will run! Depending on the scenario, this could crash the application or actually swap the data and perhaps produce a worse problem, like the concern here in this thread.
Remember, this is true even though JWT-based protections are used that force a particular context to be used by the holder.
The problem is, fundamentally, that checking the context is an application-level protection that must be performed by the consumer of the information. No basic JWT-verifier is going to check your custom claims or acceptable context combinations, just like no basic data integrity middleware would either. This is a validation responsibility of the application.
We can see that if the application had used this code instead:
// check the context!
if(context === 'US context') {
// run consumption code of US document, after checking context
} else if(context === 'Japanese context') {
// run consumption code of Japan document, after checking context
}
Now, the application would have continued to function just fine after the issuer from Japan made their asynchronous and decentralized decision to enable some of their customers to use the "US context".
But, we can take this a step further. If, instead, the issuer from Japan uses Data Integrity to protect their VCs, they don't even need to issue new VCs to allow their customers to use the "US context". Any party can change the context of the VC without losing the protection. And note that if the application continues to use the second block, which they need to use anyway to properly consume JSON-LD, everything will work properly, no matter whether the context was set the way it was by the issuer or by the holder (or by the verifier themselves). This enhances decentralization, scalability, and open world participation.
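As a rough illustration of that last point, here is a minimal sketch of how any party could re-express a document from one context to another via the standard JSON-LD API (shown with the jsonld.js library; error handling, proof-set handling, and document-loader configuration are omitted):

```js
import jsonld from 'jsonld';

// Re-express a document in a different (but equivalent) context.
// Expansion replaces every short term with its full IRI; compaction then
// re-abbreviates those IRIs using the target context. The underlying RDF
// statements, which an RDFC-based Data Integrity proof actually signs,
// are unchanged by this round trip.
async function recompact(document, targetContext) {
  const expanded = await jsonld.expand(document);
  return jsonld.compact(expanded, targetContext);
}
```

So a holder (or verifier) could take the Japan-issued VC and re-compact it to the "US context" without touching the issuer's proof.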
@tplooker wrote:
One must ask: if whitelisting contexts alone is such a simple and effective measure to mitigate this issue, why doesn't the software highlighted follow this recommendation?
The software you provided specifically allowed the problematic contexts to be used, explicitly bypassing the protections you are criticizing other software in the ecosystem for not supporting. I know we (@awoie, @tplooker, and @msporny) keep talking past each other on this, so I'll keep asserting this in different ways until one of us sees the other's point. :P
The VC Playground software highlighted is playground software that specifically does not implement validation rules.
That is, we specifically do not enforce semantics in the VC Playground because one of its features is to allow developers to add arbitrary VCs and move them through the entire issue/hold/verify process. We did consider adding a "validation" feature to some of the examples, but even if we did that, your complaint would remain. That is, if a developer came along and used their own VC to do a full issue/hold/verify process, there is no way we could know what the validation rules are for their VC... should we reject all 3rd party VCs used in the VC Playground (limiting its use greatly)? Or should we require developers to provide validation rules for each VC (creating a higher burden to add arbitrary VCs to the playground)? In the end, we decided to focus on enabling the issue/hold/verify cycle and to come back to validation later. IOW, validation is out of scope for the playground, but we might add it in the future.
The digital wallet software highlighted does not attempt to validate VCs because that is (arguably) not its primary purpose in the ecosystem; that's the verifier's job. We could build validation into the digital wallet, but we're hesitant to do so because of the broad range of VCs people can put into a wallet, and the likelihood of us getting validation wrong for arbitrary VCs is high. What do we display if we don't know of a particular VC type? A warning? An error? Both seem wrong, and the UX would make issuers annoyed at the wallet software for marking their VC as "questionable" when it's not.
Enforcing application-specific `@context` values is the verifier application's job, and in the case of the VC Playground, we chose not to implement that for the reasons outlined above.
In my opinion it is because you lose the open-world extensibility that the VC data model promises in the process, and that is why it is an inadequate mitigation strategy and why I've suggested alternative solutions.
Hmm, disagree, but I see that this particular point hasn't been responded to yet (or I missed it). Will try to specifically respond to this point when I get some cycles later this week.
In the meantime, I suggest we open new issues for each of the 9 proposals above and focus on each proposal separately. I know that is asking A LOT of those participating, but I'm also concerned that trying to evaluate 9 proposals in a single thread is going to result in a conversational flow that's hard for everyone to follow. Would anyone object to splitting this issue into 9 different issues/proposals and focusing on each proposal in a separate issue?
Some closed ecosystem wallets might have specific validation rules, others might not. Regardless, a verifier should always have validation rules (unless it's a public utility tool made available for experimenting/discovering, such as the vc playground, uniresolver, etc.; having validation in these environments would simply ruin their purpose). If I set up an agent that will simply verify the proof on a VC, I still need to have some controller to apply business logic. I don't want my barebones library to come with rigid validations; this is the developer's job to implement. If I want to check VDLs, I will cache the VDL context and verify its integrity.
@tplooker If this isn't a misconfiguration error, how come proper software configuration will prevent this from being exploited? The myth that one single unconfigured verifier software must be able to verify and process every imaginable VC issued is a fallacy. The COVID passport verifier will verify covid passports; the age verification software will verify age according to its jurisdiction's regulations. And these verifications will not happen with some arbitrary unknown/unheard-of context/vc as input. If it does, then you can claim a vulnerability in the software, since it was poorly implemented. There have been many vulnerabilities in software, even some leveraging JWT, believe it or not. Here's a list of known attacks.
This being said, I enjoyed these demonstrations, and they should be documented in a lab somewhere, maybe even classified in the specification. They highlight risks associated with not properly reading/implementing the specification. Kudos to the MATTR team for putting these together.
My suggestion as action items:
- Remove `@vocab` from the base vcdm 2.0 context, breaking native interoperability with private jwt claims.
- Discourage `@vocab` for production use cases, unless the implementer is well aware of what this entails. Banning it seems slightly extreme.
Hello all, as an organization who will be supporting DI signatures in our product as we look to engage with a wide audience in the credential landscape, I would support the following recommendations (with some suggestions for consideration, given what I have grok'd from the above...)
Data integrity changes the proof generation and verification procedures to include a hash of the `@context` entries in the document, ensuring no manipulation of the `@context` entry can be done without detection. (Tobias)
- recommended for highly secure applications; and describing a normative way of generating and including these hashes in the signature-protected document, so it is clear when the creator of the document intends for this level of protection (and non-extensibility) to apply.
We replace all terms with full URLs in all VCs (DavidC)
- recommended for 'production' systems or secure applications (develop/demo/poc with `@vocab`, lock it down when it matters)
We more strongly discourage the use of `@vocab` or `@base`, possibly banning its usage. (DaveL)
- again, for 'production' or secure applications (so not banning its usage outright, but identifying the security risks involved if the door is left open). I'm not sure if this can be achieved given the base VCDM model...
We strengthen the language around ensuring that the values in `@context` MUST be understood by verifiers during verification, potentially by modifying the verification algorithm to perform normative checks. (Manu)
- also, guidance on what "understanding" means, please? I understand a hard-coded, static set of context values, but I get lost when dynamic loading of contexts becomes a feature... like the German language example above... Do I have to "understand" German or not?
Remove `@vocab` from the base context. (DaveL/Kim)
- recommended for secure applications - but again - can we do this for certain scenarios, or is it inherent in the "base context" VCDM?
Document these 2 attack vectors somewhere so they can be included in a pen-testers auditing toolbox. (Patrick)
- +1 ; allowing security teams to detect, and businesses to evaluate their own risk appetite (to a degree)
Improve test-suites to include asserting bad contexts in the conformance stage. (Patrick)
- +1 ; allowing security teams to detect, and businesses to evaluate their own risk appetite (to a degree)
Although I fully support the fully-qualified names approach for ensuring there is no ambiguity in a secured document, I am concerned about the development overhead and lack of flexibility if this is required in all scenarios - but I am happy to learn more about the cost/benefit.
In general I focused on the above because they seem to properly address the described vulnerability when securing DI protected documents, and not focus on alternatives. Business and engineering teams are free to examine alternative methods for securing data and their cost/benefit analysis. But if a choice is made and a solution calls for DI -- how do we protect it as best we can? No solution is perfect, but clearly acknowledging the risks and providing clear guidance to mitigate these risks will help organizations make the right decisions for their needs. (If the mitigations are still insufficient for the use case, consider an alternate solution/technology).
@mavarley,
As explained in the example in my comment above, locking down the context does not solve the problem, but it does create new ones. The fundamental problem is that an application is not performing validation on `@context` prior to consuming the document. You MUST do this, no matter the protection mechanism.
also guidance on what "understanding" means please? I understand a hard-coded, static set of context values, but I do get lost when dynamic loaded of contexts becomes a feature... like the German Lange example above... Do I have to "understand" German or not?
Your application must only run against the context(s) it has been coded against. So if there is some context that uses German terms (or Japanese terms, or Frank McRandom's terms) and your application code wasn't natively written against that context, then your application MUST NOT try to consume the document.
When you see the property "foo" in a JSON-LD document, it should be understood as a localized name -- and its real name is the combination of "the context + foo". If you ignore "the context", that is not ok. That is the source of the problem here.
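A minimal sketch of that application-level check follows. Context lists are order-significant in JSON-LD, so the comparison is against full, exact arrays; the context URLs here are placeholders, not real published contexts:

```js
// Only consume documents whose complete @context value exactly matches
// one of the context arrays this application was coded against.
const ACCEPTED_CONTEXTS = [
  ['https://www.w3.org/ns/credentials/v2',
   'https://example.us/contexts/name/v1'] // placeholder URL
];

function assertKnownContext(doc) {
  const ctx = Array.isArray(doc['@context']) ? doc['@context'] : [doc['@context']];
  const known = ACCEPTED_CONTEXTS.some(accepted =>
    accepted.length === ctx.length && accepted.every((url, i) => url === ctx[i]));
  if (!known) {
    throw new Error('Unrecognized @context; refusing to consume this document.');
  }
}
```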
Notably, this actually isn't different from reading so-called "plain JSON" either; it's just that JSON-LD documents are self-describing, so "the context" is announced via `@context`. For so-called "plain JSON", you guess "the context" based on the interaction you're having, e.g., which service you think you're talking to, who you think authored the data, the purpose you think they authored it for, things of this sort. Whenever those guesses need to change, you call up / email / text the appropriate parties and figure out how the rollout will work. This is the closed-world, two-party model. In the open world, three-party model, many decisions are made independently, without consultation of all parties, asynchronously, and not everyone knows each other nor can assume "the context".
So, what are your options when your application, written entirely in, let's say, English, receives a document that uses a context with German terms? You can either:
Note that this is very similar to "content negotiation". Some servers will accept `application/json` (compare to context A) and others will accept `application/xml` (compare to context B). Some will accept both.
Using the JSON-LD API, anyone can translate from context A to context B. Using Data Integrity, this can be done without losing protection on the information in the document.
Checking context is something that needs to happen at the application level and, if it is not checked properly, adding content-integrity checks will not help solve that problem, but it will harm use cases and decentralization.
@dlongley In your mental model, what `@context` is used in the DI verification process, and is it the same `@context` that is provided to the business logic of JSON-LD-enabled (processing) verifiers? I thought verifiers have to know and trust `@context`, at least for DI verification, but it appears that you are also saying that there might be other `@context` values that can be applied.
It sounds to me that, in your mental model, the issuer/holder-provided `@context` is primarily used for DI verification purposes, but you also have the requirement to apply additional `@context`, e.g., for translation. These other `@context` entries seem to be unrelated to the DI verification logic, and it appears they are primarily used for JSON-LD processing in the business logic. Where does the verifier get these other `@context` values from? Is your assumption that these can be any trusted third parties, or are they provided by the issuer? Wouldn't you still be able to inject these new `@context` entries for the business logic after DI verification, with and without integrity-protecting the `@context` in DI?
Perhaps I'm not following correctly, but in your mental model, who determines what `@context` to apply at what stage (verifying proof, data model processing), i.e., issuer, verifier, holder, any (trusted) party; and for which layer (DI verification vs. JSON-LD business logic processing)?
It would also really help if we could always keep an eye on a holistic solution when evaluating the proposals made in this thread, i.e.,
Is there any combination that is not valid? E.g., DI verifier + JSON processor seems odd, although this is probably what most people are doing, i.e., using the compact form.
I guess JSON-LD processors can rely on the expanded terms (IRIs), but I haven't seen many implementations that do. It was probably not helpful to have a polyglot approach to the VCDM, with all the different combinations of JSON-LD/JSON across the data model and securing mechanism layers, which is why we ended up here.
Irrespective of the solution we land on, I'd hope to be as explicit as possible in the spec and explain how this relates to options 1-4 above, and probably also 5-6.
@tplooker If this isn't a misconfiguration error, how come proper software configuration will prevent this from being exploited? The myth that one single unconfigured verifier software must be able to verify and process every imaginable VC issued is a fallacy. The COVID passport verifier will verify covid passports; the age verification software will verify age according to its jurisdiction's regulations. And these verifications will not happen with some arbitrary unknown/unheard-of context/vc as input. If it does, then you can claim a vulnerability in the software, since it was poorly implemented. There have been many vulnerabilities in software, even some leveraging JWT, believe it or not. Here's a list of known attacks.
If this were just a misconfiguration issue, then why are the vcplayground, the three connected wallet applications, and the ~12 VC API backends connected to the vcplayground all "misconfigured"? Surely if this is an obvious misconfiguration issue with no tradeoff, like you suggest, then these software packages should have no issue being configured correctly?
Of course, in reality, it's not because these aren't valid "applications" as has been previously argued by @dlongley (they are); it's because adding in this configuration means they can't easily scale to new credential types without painful, careful reconfiguration. That is why the VC playground and all connected software today doesn't follow this advice, and why it isn't a practical solution to this problem.
The software you provided specifically allowed the problematic contexts to be used, explicitly bypassing the protections you are criticizing other software in the ecosystem for not supporting. I know we (@awoie, @tplooker, and @msporny) keep talking past each other on this, so I'll keep asserting this in different ways until one of us sees the other's point. :P
Understood, assert away :P and I will continue to make my point, which I don't believe is being understood. As I've said before, the evidence in this community speaks for itself: we have plenty of examples of software "misconfigured", to use your terminology, and little evidence of software that actually follows this recommendation, and that's because this isn't a configuration issue.
Your application must only run against the context(s) it has been coded against. So if there is some context that uses German terms (or Japanese terms, or Frank McRandom's terms) and your application code wasn't natively written against that context, then your application MUST NOT try to consume the document.
This mitigation, which is to perform hard validation against every context in a presented credential (effectively whitelisting every context), simply doesn't scale, and below I outline a use case that demonstrates exactly why.
Note, this isn't a theoretical use case either, we have lived this through real deployments of LDP and DI.
At MATTR several years ago, we decided to extend the VC 1.0 data model to include our own early attempt at credential branding. This involved us defining our own company-based `@context` value to extend the base data model with these terms. Every credential we issued then had to include this context value so that the branding terms we defined were defined. Because we wanted to limit our document loader from resolving contexts over the network, we had several significant deployment issues where some downstream applications of ours didn't have the required `@context` entries resolved, meaning credentials failed verification until we did a redeploy. The pain was, these `@context` values defined terms that weren't even being processed by the wallet and verification software, as the software didn't understand the branding by design! It simply wanted to ignore this portion of the VCs and couldn't without being redeployed with a redundant `@context` value. Then, when we updated this context multiple times over the deployment, we had to pre-distribute the new `@context` values into the wallet and verification apps and wait for them to propagate before we could safely issue new VCs using the new context values. This required heavy coordination that was only possible because we were the issuer, wallet, and verifier software; it simply wouldn't have been possible in a scaled and open ecosystem.
So in short @dlongley @filip26 @msporny and others, we have lived experience with your proposed solution here, and it just does not work. It assumes all context values in an issued credential are critical to process, when there are many cases (like the above) where some `@context` entries are totally irrelevant to a downstream wallet or verifier, and forcing these applications to explicitly trust these redundant `@context` entries is a brittle, error-prone, unscalable solution.
I had to update the permutations in my previous post because I figured there is also DI + JCS + JSON (but it contains JSON-LD), so there might be JSON-LD and JSON processors. So, here are the updated permutations a solution should cater for:
@mavarley
Data integrity changes the proof generation and verification procedures to include a hash of the `@context` entries in the document, ensuring no manipulation of the `@context` entry can be done without detection. (Tobias) - recommended for highly secure applications; and describing a normative way of generating and including these hashes in the signature-protected document, so it is clear when the creator of the document intends for this level of protection (and non-extensibility) to apply.
There is nothing like "less secure apps" (perhaps you meant a profile or something like that?). Now it looks like a euphemism for making it mandatory.
@tplooker
So in short @dlongley @filip26 @msporny and others, we have lived experience with your proposed solution here, and it just does not work. It assumes all context values in an issued credential are critical to process, when there are many cases (like the above) where some `@context` entries are totally irrelevant to a downstream wallet or verifier, and forcing these applications to explicitly trust these redundant `@context` entries is a brittle, error-prone, unscalable solution.
I'm sorry, I don't believe the "lived experience with the solution" (e.g. sketched here: https://github.com/w3c/vc-data-integrity/issues/272#issuecomment-2184187798), because you would not have security issues like the one reported and demonstrated.
Regarding unused `@context` entries, etc.: would not locking a context into a signature make it much worse?
The issues have been explained here several times already. Locking `@context` into a signature threatens decentralization, scalability, and even privacy.
If this were just a misconfiguration issue, then why are the vcplayground, the three connected wallet applications, and the ~12 VC API backends connected to the vcplayground all "misconfigured"? Surely if this is an obvious misconfiguration issue with no tradeoff, like you suggest, then these software packages should have no issue being configured correctly?
If you are willing to die on this hill that the vcplayground is representative of production software deployed to verify sensitive information and should be configured the same so be it. I can't take an exploit demonstrated in a public demo environment as empirical evidence that every software deployed is vulnerable in the same way.
If you are willing to die on this hill that the vcplayground is representative of production software deployed to verify sensitive information and should be configured the same so be it.
I guess another way to put it, @tplooker, is: if we implemented strict checking of `@context` in the VC Playground, and removed the ability for a developer to test their own VC in the Playground, would you agree that the issue you reported is a "configuration issue"?
To be clear, Digital Bazaar's production deployments do strict checking of `@context` by not loading `@context` values from the network and using pre-cached, vetted contexts. So, yes, there is production software out there that takes this approach, which is recommended in the VCDM specification. We reject contexts that we know nothing about by default, because that's the safest thing to do (again, we'll get to your "does not scale" argument, which it does, later).
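One common way to implement that (a sketch only, not Digital Bazaar's actual code) is a static document loader that serves only pre-vetted, pre-cached contexts and fails closed on everything else; jsonld.js accepts such a loader via its `documentLoader` option:

```js
import jsonld from 'jsonld';

// Hypothetical local cache mapping context URL -> vetted context document.
const VETTED = new Map([
  // ['https://www.w3.org/ns/credentials/v2', {...}],
]);

async function staticLoader(url) {
  if (VETTED.has(url)) {
    return {contextUrl: null, document: VETTED.get(url), documentUrl: url};
  }
  // Never hit the network; unknown contexts are an error.
  throw new Error(`Refusing to load unvetted context: ${url}`);
}

// e.g., await jsonld.expand(credential, {documentLoader: staticLoader});
```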
@tplooker wrote:
PROPOSAL: Data integrity changes the proof generation and verification procedures to include a hash of the `@context` entries in the document, ensuring no manipulation of the `@context` entry can be done without detection.
but then you say:
performing hard validation against every context in a presented credential (effectively whitelisting every context) simply doesn't scale
Those two statements seem logically contradictory; please help me understand them.
In order to accomplish "including a hash of the context entries in the document", you have to have a hash of each context entry when you issue AND the verifier needs to be able to independently verify the hashes of each context entry when they verify. IOW, the issuer needs to understand the contents of each context used in the VC and the verifier needs to understand the contents of each context used in the VC (or, at least, be provided with a list of trusted hashes for each context they are verifying).
You then go on to say that allow listing contexts in that way is not scalable.
The specification insists that a verifier needs to check to make sure that they recognize every context in a VC before they take any significant action.
What is the difference between the verifier knowing the hashes of every context and the verifier checking the URLs of every context (each vetted "by contents" or "by hash")? What am I missing?
(edited to put backticks around `@words`)
What is the difference between the verifier knowing the hashes of every context and the verifier checking the URLs of every context (each vetted by contents or by hash)? What am I missing?
@msporny why does the verifier need to know the hashes? Wouldn't it be possible to sign over the hashes and include the hashes in the `proof` object? Verifying the proofs would also include the verifier computing those hashes and checking them against the included hashes. I'm not saying this is my preferred solution but just asking whether I'm missing something here.
@awoie,
why does the verifier need to know the hashes? Wouldn't it be possible to sign over the hashes and include the hashes in the `proof` object? Verifying the proofs would also include the verifier computing those hashes and checking them against the included hashes. I'm not saying this is my preferred solution but just asking whether I'm missing something here.
All this would do is prove the document still expresses "something" in the same way it did when it was issued. But, as a verifier, you still don't know what that "something" is. You have to understand the context to actually consume the information. You don't have to understand that to confirm that the underlying information hasn't changed or to transform it from one expression to another (that you might understand).
So, the verifier will have to know the contexts (they can know them by hash or by content, as these are equivalent), such that they have written their applications against them, if they are to consume any terms that are defined by those contexts. This is why it does not matter whether the context is different from what the issuer used -- it doesn't help. Adding signed hashes doesn't help. In fact, if you lock the context down to a context that the verifier does not understand, it hurts.
If there's a context that a non-compacting verifier could use to consume the document, but the holder isn't free to compact to that context, then the verifier will not be able to accept the document. The holder would be forced to go back to the issuer and leak to them that they'd like to present to a verifier that only accepts documents in another context, asking for them to please issue a duplicate VC expressed in that other context.
If you have some special auxiliary terms that you want to consume in your own application, that you think many verifiers might reject based on a context they don't recognize: keep in mind that verifiers are free to reject documents for any reason, e.g., via JSON schemas that set `"additionalProperties": false`. Not everyone wants to accept something with a random `https://example.com#meow` property containing a huge list of favorite cat names, even if there's a unique and fantastic cat application that benefits from it. If your special terms have more utility than that, consider working with the community to provide a feature that everyone can benefit from (e.g., maybe a "render method"?), increasing the likelihood of acceptance by others.
@msporny why does the verifier need to know the hashes? Wouldn't it be possible to sign over the hashes and include the hashes in the `proof` object?
Ok, let's presume that's what we do... let's say we do something like this in the `proof` property:
"proof" : {
...
"contextDigest": [
["https://www.w3.org/ns/credentials/v2", "0xfb83...43ad"],
["https://www.w3id.org/vdl/v1", "0x83a1...bc7a"],
["https://dmv-vocab/ns/dl/v1", "0x9b3b...24d4"],
]
...
]
When DI generates the proof, that content is signed over (both in RDFC and JCS). Alright, now the issuer has explicitly committed to cryptographic hashes for all context URLs, and wallets and verifiers can check against those context hashes.
Verifying the proofs would also include the verifier computing those hashes and checking them against the included hashes.
Yes, and for the verifier to compute those hashes, they need to fetch and digest each context URL listed above (which means they now have the entire content for each context)... or they need to have a list that they, or someone they trust, has previously vetted that contains the context URL to hash mappings.
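For illustration, a verifier-side check against such a pre-vetted mapping might look like this. This is a sketch only: `contextDigest` is the hypothetical field from the example above, and a real scheme would have to pin down a canonical serialization of each context before hashing:

```js
import {createHash} from 'node:crypto';

// vettedContexts: Map of context URL -> the exact serialized context
// document the verifier (or someone they trust) has previously vetted.
function checkContextDigests(proof, vettedContexts) {
  for (const [url, signedHash] of proof.contextDigest) {
    const body = vettedContexts.get(url);
    if (body === undefined) {
      throw new Error(`Unknown context: ${url}`);
    }
    // Recompute the digest and compare it to the signed value.
    const digest = '0x' + createHash('sha256').update(body).digest('hex');
    if (digest !== signedHash) {
      throw new Error(`Context digest mismatch for ${url}`);
    }
  }
}
```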
Having that information, however, is only part of what they need to safely process that document (and I'm going to avoid going into the use cases that we make impossible if we take that approach just for the sake of brevity for now -- EDIT: Nevermind, turns out Dave and I were answering in parallel, see his post right above this one for some downsides of locking down the context hashes at the issuer). IF (for example) we continue to allow `@vocab`, that means that any context can come along and override it, which means that if the verifier wants to continue to be safe, they need to ensure that they are ok with what each context does, which means that they need to trust that each context used does things that they're ok with (like not overriding `@vocab`, or overriding `@vocab` in a way that they're ok with, or overriding unprotected terms in a way that is suitable for that use case, or not attempting to protect terms that are already protected, etc.).
The point is that the contents of each context need to be known by the issuer (in order to hash them and generate the proof) and by the verifier (in order to verify that the contexts have not changed from when the issuer used them)... and if each party knows that information, then they have to know about each context and its contents (either by value or by cryptographic hash)... and if you know that information, you can verify the signature (and it'll either work if nothing the verifier is depending on has changed, or it'll fail if the contexts don't line up for the information that has been protected, which is what matters).
Did that answer your question, @awoie?
PS: As a related aside, I'm pretty sure we're using the words "known (context)", "understands (the context)", "trusts (the context)" in different ways that are leading to some of the miscommunication in this thread. I don't know what to do about it yet (other than keep talking), but just noting that we probably don't mean the same things when we use those words.
Those two statements seem logically contradictory; please help me understand them.
They aren't contradictory, but happy to explain.
Fundamentally, including a hash of all the `@context` entries as part of the signed payload accomplishes the following:
It provides assurance to the issuer that, in order for a relying party to successfully verify their signature, they MUST have the exact same context as the issuer who produced the credential. This universally ensures context manipulation cannot happen after issuance without detection. And I might add, there are more ways to mess with the context beyond the vulnerabilities I described at the start of this issue, so this just solves all of that outright.
Because these `@context` values are integrity protected, a relying party could, in certain situations, safely download them over a network if they don't already have them, because if they get corrupted or tampered with in any way, they will fail the signature validation, and this is the key to solving the scalability challenge. The use-case I described above gets somewhat more bearable: if, as a verifier, I encounter a VC with a context I don't understand and that isn't actually critical for me to understand, I can safely resolve it over a network, cache it, and be confident it hasn't been messed with when I validate the signature. This isn't a perfect solution, but it is much better than the current state of play and likely the best we can do with data integrity without simply signing the whole document with JWS instead, which of course would be much easier.
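A sketch of what that network path could look like under this proposal (hypothetical helper; again, a real design would need an agreed canonical serialization of the fetched context before hashing):

```js
import {createHash} from 'node:crypto';

// Fetch a context the verifier has never seen, but only accept it if its
// digest matches the digest the issuer signed over. Tampering in transit
// (or at the hosting server) surfaces as a digest mismatch here, and any
// substituted context would also fail the signature check downstream.
async function fetchPinnedContext(url, signedHash, cache) {
  if (cache.has(url)) return cache.get(url);
  const body = await (await fetch(url)).text();
  const digest = '0x' + createHash('sha256').update(body).digest('hex');
  if (digest !== signedHash) {
    throw new Error(`Context at ${url} does not match its signed digest`);
  }
  const context = JSON.parse(body);
  cache.set(url, context);
  return context;
}
```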
The important difference between your proposal and mine, @msporny et al, is that your solution:
1) Relies on guidance that developers can ignore. The VC playground and all software connected to it, despite what is being insisted about what this means, is at a minimum clear evidence that implementations CAN and WILL ignore the advice to pin contexts currently in the spec, leaving them entirely open to these and other vulnerabilities.
2) Provides the issuer with no enforceable way of knowing that the only way their signature will verify is if the verifier has the contexts the issuer used.
3) Won't work at scale, because of the need to have a preprogrammed awareness of all possible `@context` values that issuers are using ahead of verification, even ones that aren't critical for an application to understand.
(edited to put backticks around `@words`)
IF (for example) we continue to allow `@vocab`, that means that any context can come along and override it, which means that if the verifier wants to continue to be safe, they need to ensure that they are ok with what each context does, which means that they need to trust that each context used does things that they're ok with (like not overriding `@vocab`, or overriding `@vocab` in a way that they're ok with, or overriding unprotected terms in a way that is suitable for that use case, or not attempting to protect terms that are already protected, etc.).
Only if `@vocab` isn't fixed like it should be in JSON-LD to respect the `@protected` keyword.
(edited to put backticks around `@words`)
@tplooker wrote:
Only if `@vocab` isn't fixed like it should be in JSON-LD to respect the `@protected` keyword.
@dlongley explains in this comment why what you are requesting is a logical impossibility.
To summarize:
- The ask is for `@protected` to be applied to the `@vocab` declaration in the VCDM v2 context, `https://www.w3.org/ns/credentials/v2`.
- If `@protected` is applied to `@vocab` in the VCDM v2 context, then that would mean that all contexts other than the VCDM v2 context would throw an error. This would happen because those other contexts would be trying to re-define undefined terms that have already been expanded by `@vocab` in the VCDM v2 context.
That might, understandably, seem counter-intuitive to some, but it does make logical sense once you think about it. So, let's walk through an example:
In year 1, I use just the VCDM v2 context, which I'm going to ship to production (the reason doesn't matter, I'm just going to do that). In that VC, I use `MySpecialCredential` for the `type` and it has a `website` property in `credentialSubject` to express my website. The base context has `@vocab` and so those terms are "issuer-defined" and are therefore expanded to `https://www.w3.org/ns/credentials/issuer-dependent#MySpecialCredential` and `https://www.w3.org/ns/credentials/issuer-dependent#website`.
In year 2, I decide that I want to define those more formally, so I create a new context that I'll append after the VCDM v2 context, and in that context I define `MySpecialCredential` and the `website` property. When I run a "fixed" JSON-LD processor that protects `@vocab` on my document, which includes the new context, it throws an error. But why did it throw an error?
It throws an error because `@vocab` was protected in year 1, which catches ALL undefined properties in the VCDM v2 context. `MySpecialCredential` and `website` are not in the base context; they're undefined, so they're caught by the VCDM v2 protected `@vocab` statement. Now, in year 2, the second context comes into play and tries to re-define `MySpecialCredential`... but it can't do that, because `MySpecialCredential` is already mapped via the base VCDM v2 `@vocab` statement... which is protected, so the processor throws an error because I'm trying to re-define something that is already defined by the base VCDM v2 context. If we added the ability to protect a `@vocab` assertion in a context, it would necessarily cause ALL subsequent contexts to throw an error.
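To make that concrete, here is the shape of the two contexts in this walkthrough. This is illustrative only: the year-2 IRIs are invented, and no shipping JSON-LD processor actually supports protecting `@vocab`; that is the hypothetical "fix" under discussion:

```js
// Year 1: imagine the base VCDM v2 context's catch-all were protected.
const year1Context = {
  "@context": {
    "@protected": true,
    "@vocab": "https://www.w3.org/ns/credentials/issuer-dependent#"
  }
};

// Year 2: a follow-on context that formally defines the same terms.
const year2Context = {
  "@context": {
    "MySpecialCredential": "https://example.org/vocab#MySpecialCredential", // invented IRI
    "website": "https://example.org/vocab#website"                          // invented IRI
  }
};

// A processor honoring @protected on @vocab would throw when processing
// ["<year1>", "<year2>"]: year2 re-defines terms the protected catch-all
// already captured, and the same is true of ANY subsequent context.
```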
Again, I know it sounds like a "nice to have" when said out loud, but when you think through the logical implementation of it, it doesn't work. I hope it's clear at this point that proposal 2 is unworkable.
If there was some other way you were expecting it to be implemented, please let us know; perhaps we don't see what you see.
(edited to put backticks around `@words`)
@tplooker wrote:
This isn't a perfect solution, but it is much better than the current state of play and likely the best we can do with data integrity without simply signing the whole document with JWS instead, which of course would be much easier.
Just "simply signing the whole document with JWS" does not:
It is a red herring; it is not a solution to the concerns that you have raised. A verifier still has to ensure that a VC secured with any enveloping signature contains the semantics that they expect. They cannot just blindly accept any list of contexts and start executing business rules, even if they trust the issuer.
@tplooker,
Only if `@vocab` isn't fixed like it should be in JSON-LD to respect the `@protected` keyword.
As mentioned above, if I understand your ask properly, I think it is a logical impossibility.
The purpose of `@protected` is to allow consumption of specific terms in JSON-LD documents when only the contexts that define those terms are known (and other contexts are not, but they can be present). So, for example, if you have an array of contexts: `[A, B]`, then an application can consume `@protected` terms defined in context `A` without knowing context `B` (and, importantly, not consuming any of the terms defined by context `B`). Again, note that a consumer MUST always understand the context that defines the terms it consumes -- and this holds true here.
Now, the way that `@vocab` is being used today in the VC v2 core context is as a "catch all" for any JSON keys not defined in that context. I believe you're asking that we apply "protection" to this value, with the aim of supporting consumption of terms in JSON-LD documents with contexts such as: `[<core vc v2 context>, <unknown>]`. However, `@vocab` "defines all terms" when it is used as a "catch all". By defining all terms in a protected way, it necessarily means that no further terms can be defined -- in any subsequent context.
It would not be possible to ever have the core VC v2 context be followed by any other meaningful contexts. Clearly this is not desirable and would prevent every other common use of VCs. If a consumer desires the definition of any other terms after a "catch all" `@vocab` to be prohibited, they can require that the context with this definition be the last context in documents they accept -- or they can use the JSON-LD compaction API.
@tplooker wrote:
They aren't contradictory, but happy to explain.
You didn't address the point of contention. The point of contention was that you (and @awoie, I presume) assert two things in your solution:
But then both of you state that distributing contexts in this way doesn't scale.
It sounds like you're saying that "even if we do 1 and 2, the solution won't work anyway, because there is no scalable way to distribute contexts".
It may be that you and @awoie think that the /only/ way for the verifier instance to operate is by having a completely fixed and static list of contexts they accept (and that that doesn't scale). It might be that you think that @filip26's example, which was just the simplest example that could be provided to demonstrate how easy it is to protect against the attack you describe (which is a minimum bar that the specification suggests), is the "one and only way" we're proposing? If that's the misunderstanding, then I can understand why you and @awoie are saying what you're saying. If it isn't, then I'm still seeing a contradiction.
Please clarify what you mean by "does not scale", because it's a misunderstanding we could focus on and clean up before continuing with analyzing solutions.
In an attempt to address the assertions you made above, which are separate from the point of contention above:
@tplooker wrote:
Relies on guidance that developers can ignore
I already covered this point above.
Developers can ignore any guidance in the specification. We call that "doing a bad job" or, at worst, a non-conforming implementation. We can write algorithm language and tests that make it far less likely for a conforming implementation to misimplement in the way that you are concerned about.
I think we will get consensus to "do something" here; we're just debating what that "something" needs to be. At present, there is contention over at least two approaches, one of which is removing `@vocab` from the base context (while still allowing `@vocab` to be used by those that want to do so).
The VC playground and all software connected to it, despite what is being insisted about what this means, is at a minimum clear evidence that implementations CAN and WILL ignore the advice to pin contexts currently in the spec, leaving them entirely open to these and other vulnerabilities.
The playground does not pin to context hashes because many of the contexts used are changing regularly. Data Integrity (using RDFC) gets its security from the signed statements, which cryptographically hash only the values in the context that are used. Verifiers must check context values that are used in messages sent to them in production.
Developer software and playgrounds are NOT to be confused with production software.
Provides the issuer with no enforceable way of knowing that the only way their signature will verify is if the verifier has the contexts the issuer used.
IF an issuer and a verifier follow the rules and guidance in the specification today, they are guaranteed (in an enforceable way) that the number of statements, the protected terms, and the information they expressed will not change when the verifier checks them.
If the issuer is sloppy in production and uses `@vocab`, and the verifier is equally sloppy and doesn't check incoming contexts (which are the "vulnerabilities" disclosed in this issue)... then that's where things can go wrong, and we should improve the specification text to make that a non-conforming implementation.
Won't work at scale
I cover this point in a previous comment.
I have raised w3c/vc-data-model#1514 to evaluate what to do about `@vocab` in the VCDM v2 specification and context. Please provide input over there, on that item specifically, while we process other parts of this issue here.
/cc @dlongley @kimdhamilton @aniltj @ottonomy @PatStLouis @mavarley @peacekeeper
@dlongley explains in this comment why what you are requesting is a logical impossibility.
It's not a logical impossibility; the simple reality is that the `@protected` feature is flawed because it doesn't protect all term definitions; rather, it only protects terms that are defined through mechanisms other than `@vocab` and `@base`.
This simply leaves the following options at a high level:
1) Fix how term protection works so all terms are protected from redefinition
2) Remove the protection feature, because it's dangerous to have a feature that doesn't work universally
With regard to option 1, I understand that if you are using `@vocab` in one of your contexts and you were to enforce protection, it would mean logically that subsequent contexts could not follow. But that actually just highlights a fundamental design issue with JSON-LD and the need to integrity protect the context entries under the issuer signature, because if they were, you would protect against term redefinitions universally.
So in short, if you integrity protected the `@context` entries, you would ensure any terms defined via `@vocab` could not be redefined, because any manipulation of the `@context` of a document would be detected.
You didn't address the point of contention.
I did, I'm just not sure you understood how, so let me try another way.
The difference is "how" the hash of a context is calculated, distributed, verified, and finally how its integrity is related to the integrity of a document being verified using that context.
In your model, this process is entirely out of band and an optional step to perform. Put plainly, in your model an implementation won't (and we have numerous pieces of evidence of this):
1) Check `@context` URL entries at all against trusted ones
2) Compute the respective hash of each of the `@context` entries
3) Check the computed hash of each `@context` entry against some expected value
Our proposal is to make this check required, meaning the signature can't verify if implementations have not performed this step, thus making it far more robust. In the process of requiring these additional steps to be performed, it will materially improve the scalability of DI solutions, because now one can actually load `@context` entries over the web, compute their respective hashes, and still be able to verify they haven't been tampered with.
The point of contention was that you (and @awoie, I presume) assert two things in your solution:
To be clear, yes to 1), and on point 2): the verifier must know the procedure required to compute the hash of a given context, so that it can input the obtained hash into the proof verification procedure and check that it matches the hash the issuer used for that context. Therefore, the verifier establishes "legitimacy" of the computed context hash through the proof verification process: if proof verification succeeds, then the verifier has the right context; if it fails, something is wrong.
@tplooker,
It's not a logical impossibility...
It is. You cannot apply the `@protected` feature to a "catch all" `@vocab` without prohibiting further meaningful contexts. Therefore, you cannot have both "further meaningful contexts" and a "catch all" `@vocab` that is `@protected`.
With regard to option 1, I understand that if you are using `@vocab` in one of your contexts and you were to enforce protection, it would mean logically that subsequent contexts could not follow. But that actually just highlights a fundamental design issue with JSON-LD and the need to integrity protect the context entries under the issuer signature, because if they were, you would protect against term redefinitions universally.
No, this would not solve the problem; any issuer acting non-maliciously could still redefine terms using subsequent contexts, as they should be able to. In fact, with `@vocab` in the core context, this is exactly what already happens with every subsequent context today. I think this may be highlighting a fundamental misunderstanding of what term protection is.
Term protection is not to prevent "term redefinition via maliciously injected contexts". It is to prevent "term redefinition in subsequent contexts". It is not about malicious behavior at all. The `@protected` feature allows consumers to know that a term defined by an early context can be consumed without having to understand the contexts that follow it. A document author that uses such a context simply makes a commitment to the term definitions therein. A document author that uses a context without `@protected` terms makes no such commitment -- this includes any terms defined by `@vocab`.
An issuer is totally free to use `@vocab` in one context and then define another term in a subsequent context. A consumer that does not understand the subsequent contexts is not permitted to consume terms defined by `@vocab` in this case.
Again, context matters; you can't consume the terms from a context without understanding it. A signature over contexts does not change this. You cannot rely on issuer authentication in a closed world, two-party model to approximately limit semantics here because that model does not apply. This thread even contains examples demonstrating how it doesn't. Instead, application-level validation will protect an application from consuming terms from unknown contexts, including malicious ones. This solves the problem.
It seems to me that the whole context model is wrong. The key fact about the subject's property/PII/attribute is that the URI is the absolute id of the property. Any alias context term is just for ease of human understanding and reference. Computers are happy to work with URIs. Thus the context alias term should be local and not global. It is not much use to me if a Japanese issuer presents a context term in Kanji. It's probably less understandable to me than a URI. But I can define a local English term for the URI which my users (holders) can utilise. Similarly a French verifier can utilise a French name for its local context term. However, the issuer, holder and verifier will ultimately use the URI as the id of the property, and there will be no confusion between any one of them about what the property is. So the signature should be over the URI for the property and not over any local context property names. Then it won't matter what the contexts are, as they won't figure in the signature computation.
@David-Chadwick wrote:
Thus the context alias term should be local and not global.
That is how it works today. That is the whole concept behind `@context`.
So the signature should be over the URI for the property and not over any local context property names.
This is exactly how it works today.
It seems to me that the whole context model is wrong.
Except for the part about sending out expanded form instead of compact form -- which would be painful for developers to deal with -- the rest of it works in the way that you say it should.
@msporny, as a consequence of what you say, then when the issuer sets up its own local context in the VC, this should be bound into the signature so that the holder and verifier can obtain the correct URIs from the issuer provided context and short form names. It must not be possible for the issuer's local context to be subsequently modified, added to or otherwise changed by anyone without invalidating the original signature. This guarantees the correct semantics for the signed VC. Data Integrity should specify how this is achieved. Holders and Verifiers may however modify the context locally (after signature verification) to display different short form names to their users that are more easily understandable to them. But this is a local display issue.
@David-Chadwick
as a consequence of what you say, then when the issuer sets up its own local context in the VC, this should be bound into the signature so that the holder and verifier can obtain the correct URIs from the issuer provided context and short form names.
There is no need to lock the `@context` set into a signature; it's actually redundant information for verifying the data integrity. If `@context` provides different expanded URIs, then verification fails. The other disadvantages of locking `@context` into a signature have been mentioned here several times. I'm afraid we are spinning around.
It must not be possible for the issuer's local context to be subsequently modified, added to or otherwise changed by anyone without invalidating the original signature.
Yes, it must be possible for any party to change the `@context`, for many reasons, e.g. tracking prevention, which is only one of them. And it is possible because of the semantic model, which protects data, not shape. If you lock `@context` into a signature, then you protect data and shape. The shape is not important, but the data is.
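One way to see this concretely: an RDFC-based proof is computed over the canonicalized RDF statements, not over the compact JSON shape, so two documents that expand to the same N-Quads verify identically regardless of which context expressed them. A sketch using jsonld.js:

```js
import jsonld from 'jsonld';

// Canonical data for signing/verification comes from the expanded
// statements. If two context sets yield the same N-Quads, a proof over
// that data verifies for both expressions of the document; if a swapped
// context changes any expanded IRI, the quads change and verification fails.
async function canonicalQuads(document) {
  // URDNA2015 is the algorithm standardized as RDFC-1.0.
  return jsonld.canonize(document, {
    algorithm: 'URDNA2015',
    format: 'application/n-quads'
  });
}
```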
Holders and Verifiers may however modify the context locally (after signature verification) to display different short form names to their users that are more easily understandable to them. But this is a local display issue.
It's not just a display issue. Now, at the very beginning of VC adoption, we can expect many different `@context` sets to be around, a boom of them; then it will slowly start converging to some generic subsets accepted more widely. Those custom contexts, uniquely identified by URLs, could tell a lot about an intent. The other major plus is that a consumer does not need to know the model but the data (it is loosely coupled), etc.
@msporny
There is no need to lock the `@context` set into a signature; it's actually redundant information for verifying the data integrity.
I do not see how it is redundant, since it is only through the context that you obtain the URIs from the short form property names. So it is essential. If you allow the context to be squishy and modifiable, then the entity that verified the signature (wallet or verifier) really does not know what short form property names to display to the end user. Consequently it should throw them all away and only use the context mappings that it already knows about and trusts i.e. a local context, which may or may not actually be present in the VC. And if it does not have a local mapping, then it should either display the URI to the end user, or try to look up the URI and see if its definition contains a mapping it can use (assuming property definitions are machine processable).
(edited to put backticks around `@words`)
I do not see how it is redundant, since it is only through the context that you obtain the URIs from the short form property names. So it is essential. If you allow the context to be squishy and modifiable, then the entity that verified the signature (wallet or verifier) really does not know what short form property names to display to the end user.
100% agree with this.
Something I've noticed in this thread is that much of the conversation appears to be dominated by wanting to give the relying party / verifier the flexibility to do exotic JSON-LD things, like reframing documents with contexts other than the ones they were signed with. This is, quite frankly, a recipe for disaster, and as the original contents of this issue demonstrate, it creates very real vulnerabilities. In short, I personally don't buy the utility of any of the outlined use cases for leaving `@context` open to manipulation, or that the utility obtained justifies the security issues created.
Consider instead what an issuer typically wants from a credential securing format. One of the critical features they want is to ensure their credentials are tamper evident. The problem with leaving the `@context` of a VC unsecured is that it directly contradicts this desire, or at the very least requires issuers to place far more trust in relying parties and verifiers to suitably validate their credentials.
The tamper-evident property is critical. That's the whole point of digital signatures.
The point is that the contents of each context need to be known by the issuer (in order to hash them and generate the proof) and by the verifier (in order to verify that the contexts have not changed from when the issuer used them)... and if each party knows that information, then they have to know about each context and its contents (either by value or by cryptographic hash)... and if you know that information, you can verify the signature (and it'll either work if nothing the verifier is depending on has changed, or it'll fail if the contexts don't line up for the information that has been protected, which is what matters).
Did that answer your question, @awoie?
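To make the "known by value or by cryptographic hash" idea concrete, here is a minimal sketch of what digest-pinning a context could look like. This is illustrative only, not code from any specification; the digest table is a placeholder that a verifier would populate with values it has vetted in advance.

```js
// Minimal sketch (not from the spec): a verifier pins each context URL to a
// digest it vetted in advance. The table entries here are placeholders.
const crypto = require('crypto');

const VETTED_DIGESTS = new Map([
  // ['https://www.w3.org/ns/credentials/v2', '<hex digest vetted in advance>']
]);

function checkContextDigest(url, contextBytes) {
  const actual = crypto.createHash('sha256').update(contextBytes).digest('hex');
  const expected = VETTED_DIGESTS.get(url);
  if (expected !== actual) {
    throw new Error(`Context ${url} does not match its vetted digest`);
  }
}
```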
@msporny My assumption about the trust model is that the verifier trusts the issuer. Verifiers shouldn't be required to vet the contents of context files themselves. If they are not required to do that, you could include these context digests in the proof, and the distribution of context files and URIs becomes much simpler. I don't think the verifier should be forced to understand the specifics of the context definitions the issuer included in a VC to compute the DI proof (except for the compacted result).
By trusting the issuer, I mean trusting that the integrity of the document secured using a DI proof is guaranteed, i.e., that a VC, for example, is tamper-proof. EDIT: and this holds regardless of which permutation from my previous post was used, from a verifier's perspective.
Something I've noticed in this thread is that much of the conversation appears to be dominated by wanting to give the relying party / verifier flexibility to do exotic JSON-LD things
JSON-LD expansion and compaction are not exotic things; they are core JSON-LD algorithms.
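For readers less familiar with those algorithms, here is a minimal sketch using the jsonld.js library (an assumption on my part; any conformant processor behaves the same way) of how a consumer can re-express a document in a context it understands:

```js
const jsonld = require('jsonld');

// Expansion replaces short-form terms with their globally unambiguous IRIs;
// compaction then maps those IRIs back to the terms defined by a context
// the application natively understands.
async function recompact(doc, trustedContext) {
  const expanded = await jsonld.expand(doc);
  return jsonld.compact(expanded, trustedContext);
}
```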
In short, I personally don't buy the utility of any of the outlined use cases for leaving `@context` open to manipulation, or that the utility obtained justifies the security issues created.
There are no security issues, only developers who do not understand how it works, or don't want to.
As an independent open-source developer and an implementer of JSON-LD and VC processors, I'm happy to offer paid consultation services to anyone who needs to get on board with these new technologies and concepts.
Please don't hesitate to contact me.
@filip26 I think it's very telling that, after all that has been said in this thread, you think the issue here lies with the understanding of those contributing to this conversation.
As a member of this community for several years who has contributed in various ways, I believe I have a pretty good understanding of DI/LDP, JSON-LD and related technologies. I have also spent years developing and deploying products that leverage this technology, so I've also seen what the layman developer understands. The same goes for everyone you tagged in your comment, they are all respected experts in this space so please don't attempt to patronise them/us by implying you have superior knowledge.
The purpose of standards isn't just to cater for those authoring the standard or those directly involved in the working group; it's to cater for the community yet to be formed around the technology. That is what this issue is about: making sure this technology is safely usable and implementable by others who didn't have the advantage of participating in the WG.
@tplooker,
I have mentioned a number of times in this thread that an application must be natively coded against a particular context (it must "understand" a context) to be able to consume the terms it defines. I have written about how, even if the context does not change, this is still true -- and how applications will fail if they do not. Can you explain how your applications are going to function properly without doing this, without any assumption of control over the decisions others make in the ecosystem (IOW, using only the self-describing documents / data themselves)?
I have also come to a similar conclusion (without any detailed knowledge of JSON-LD), i.e. that an application must be configured with the contexts that it trusts, and must not use any other ones - otherwise it will display unknown/fake property names to the user.
@David-Chadwick,
I have also come to a similar conclusion (without any detailed knowledge of JSON-LD), i.e. that an application must be configured with the contexts that it trusts, and must not use any other ones - otherwise it will display unknown/fake property names to the user.
Yes, it must do this, no matter what happens with any securing mechanism. Showing JSON keys to users is terrible UX anyway. A much better way forward is "render methods".
We should not design any core primitives around having a UX that involves showing users JSON keys. Instead, community work should continue on render methods, making it easier for applications to express information in helpful ways to users. In the meantime, production applications should not show information to users (without clear warning signs) when those applications do not know what they are showing to the users.
@David-Chadwick wrote:
I have also come to a similar conclusion (without any detailed knowledge of JSON-LD), i.e. that an application must be configured with the contexts that it trusts, and must not use any other ones
I'll note that this is a basic secure information modelling and consumption requirement. The details are specific to JSON-LD's `@context` value, but the general requirement applies to ANY secure information ecosystem. IOW, this isn't just about JSON-LD; it applies to any 3-party digital credential system.
Whether it is applying hard-coded checks, applying a JSON Schema, or ensuring that the `@context` values in incoming data are what one expects, you cannot (as the code in the initial disclosure does) just take input and process it without knowing what it means.
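As one illustration of that last option, here is a minimal sketch of an `@context` allow-list check; the listed URLs are hypothetical placeholders, and a real application would list exactly the contexts it was natively coded against:

```js
// Reject any document whose @context is not exactly what this application
// was natively coded against. The URLs below are illustrative placeholders.
const ALLOWED_CONTEXTS = new Set([
  'https://www.w3.org/ns/credentials/v2',
  'https://example.org/my-app/v1'
]);

function assertKnownContexts(doc) {
  const contexts = Array.isArray(doc['@context'])
    ? doc['@context'] : [doc['@context']];
  for (const ctx of contexts) {
    // Inline (object-valued) contexts are rejected here too, since they
    // cannot be matched against a vetted URL.
    if (typeof ctx !== 'string' || !ALLOWED_CONTEXTS.has(ctx)) {
      throw new Error('Unexpected @context value; rejecting input');
    }
  }
}
```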
@awoie wrote:
By trusting the issuer, I mean trusting that the integrity of the document secured using a DI proof is guaranteed, i.e., that a VC, for example, is tamper-proof. EDIT: and this holds regardless of which permutation from my previous post was used, from a verifier's perspective.
The *integrity of the information* (the signed N-Quads statements) is guaranteed, but that doesn't mean the verifier can just accept any input and process the document without knowing the context. This applies for every variation that you provided, @awoie, so the answer is the same for all of the variations, AFAICT. As @David-Chadwick says above, "an application must be configured with the contexts that it trusts".
I know that it might feel like we're starting to go around in circles on this thread, and that is understandable. There is quite a bit of talking past each other that's going on, and that's usually a sign that there are probably fundamental philosophical differences on which layer of the architecture is responsible for what and how all the layers come together into a secure solution.
In an attempt to summarize (I'm sure I'll fail and/or miss important points), these sound like the positions so far:
On the one hand, if we cryptographically hash the contexts and mix them in with the signature, then we lock in what the issuer used to create the signature and the verifier has to use the same information to verify. At that point, the verifier knows that they're dealing with information that the issuer intended. This check happens at the verification layer.
On the other hand, if a verifier ensures that all incoming contexts have been vetted and are trusted then that stops the attack that's demonstrated in this issue. This check can happen before verification (during input processing), or it happens at the validation layer.
Both approaches mitigate the specific attack presented, but in different ways, and with different downstream consequences to developers or the ecosystem (or both). That is not to say that they are equivalent, and that's where a lot of the discussion seems to be going now, which is to discuss the ramifications of each approach.
Checking that the particular expression of information hasn't changed is insufficient to build a robust application. The application MUST validate that the expression of any data matches expectations before consuming the information it holds. This check alone will solve the problem of consuming terms defined by an untrusted context, because it will not be allowed. It is never allowed -- whether the document is secured or not and regardless of any particular securing mechanism.
Adding a securing mechanism that prevents changing from one unknown expression to a different unknown expression does not change this calculus. If you don't know what the expression means, you can't consume the data as expressed. Your options are to reject the data as expressed or to transform it to an expression your application does natively understand.
To reiterate this point, we can imagine a scenario where we're passing payloads around using envelope-based security, such as JWTs. The payloads in this example are expressed as JSON documents, specifically not JSON-LD. Once we have ensured that the envelope hasn't changed in transit and we obtain its (unchanged) payload, it would be a mistake not to perform validation checks (e.g., run JSON schema) on that payload before trying to consume the information it contains. This is an application-level validation check, it is not something that a JWT library can do for you.
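A minimal sketch of that application-level check, assuming the Ajv JSON Schema validator and a hypothetical payload shape (the claim names are placeholders, not from any spec):

```js
const Ajv = require('ajv');
const ajv = new Ajv();

// Hypothetical schema for the claims this application consumes.
const validatePayload = ajv.compile({
  type: 'object',
  required: ['givenName', 'familyName'],
  properties: {
    givenName: { type: 'string' },
    familyName: { type: 'string' }
  },
  additionalProperties: false
});

function consume(verifiedPayload) {
  // The JWT library only proved the envelope was untampered;
  // checking the payload's shape is the application's job.
  if (!validatePayload(verifiedPayload)) {
    throw new Error('Payload does not match the expected shape');
  }
  // ...now it is safe to consume verifiedPayload.givenName, etc.
}
```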
I can't emphasize enough that these checks need to be performed, no matter what the formats or securing mechanisms in play are. Even if you have a closed-world, two-party system with a tightly coupled single verifier and single issuer, a robust application should still perform these checks. Despite this being true, developers may skip these checks and instead assume them away based on the notion that they checked who the issuer was and that "the issuer will only do what I was expecting".
But this is just an assumption, not a direct fix to the underlying problem. Yes, relying on it as a half-measure may allow you to escape bumping into the problem in systems where you have some measure of control over everything that is happening. But in the three-party model, every actor is independent and makes decentralized, asynchronous decisions without your permission. The way to avoid semantic ambiguity here is with self-describing data that carries globally unambiguous context -- and to check that the data is expressed in a way your application understands so that you can consume its information with confidence.
Putting gatekeepers in the way of transforming protected information to other expressions just to enable a half-measure that doesn't work in the open world, three-party model is a mistake.
I have mentioned a number of times in this thread that an application must be natively coded against a particular context (it must "understand" a context) to be able to consume the terms it defines. I have written about how, even if the context does not change, this is still true -- and how applications will fail if they do not. Can you explain how your applications are going to function properly without doing this, without any assumption of control over the decisions others make in the ecosystem (IOW, using only the self-describing documents / data themselves)?
For many simple VCs this might hold, @dlongley; however, it isn't always the case, and designing a system that assumes it means the system simply won't scale, because in effect you have made all information in all VCs critical for verifiers to process, or at least the contexts, which is a painful constraint to put upon them.
I have provided a concrete use case here to highlight this (of which, I might add, there are many variants), which doesn't appear to have been responded to.
At this point in the thread I feel we are just talking in circles. What you perceive as a feature, @msporny @dlongley et al, I see as a major security flaw and a design issue that limits the ability of DI systems to scale. I've done my best to articulate the vulnerabilities, provide evidence of them in deployed systems, describe use cases that suffer under the current design, and propose solutions I think fix this. I'm not sure what else I can add at this point, and like many others in this thread, I'm struggling to find the bandwidth to continue to consume long-form responses and provide replies.
In an attempt to summarize (I'm sure I'll fail and/or miss important points), these sound like the positions so far:
On the one hand, if we cryptographically hash the contexts and mix them in with the signature, then we lock in what the issuer used to create the signature and the verifier has to use the same information to verify. At that point, the verifier knows that they're dealing with information that the issuer intended. This check happens at the verification layer.
On the other hand, if a verifier ensures that all incoming contexts have been vetted and are trusted then that stops the attack that's demonstrated in this issue. This check can happen before verification (during input processing), or it happens at the validation layer.
Maybe this is not the right place, but I'll try to ground this a bit. Which one is the current, correct method?
Is the one at https://www.w3.org/TR/vc-data-integrity/#algorithms the correct one? How do the hashes at https://www.w3.org/TR/vc-data-integrity/#contexts-and-vocabularies factor in?
I was actually trying to find clear examples and came across https://grotto-networking.com/blog/posts/jsonldProofs.html today, and it looks to be what I implemented in .NET. Would this be correct?
Maybe this highlights some issues that a relative outsider implementer, other than me, could also run into. In my case I went through some documents, took an example document, and tried to implement some examples as a "first pass taster". The danger is that if I think I understood and get the examples to pass, and some time later even the conformance tests, I may still not have understood it well enough. It might be that some things become clear in a test suite (especially if some "negative" examples are added), or not.
I'll be using https://github.com/dotnetrdf/dotnetrdf and need to map the procedure, errors and all, to what it does and how it works, so that in the library one could just pass key material, the document, and a callback for other verifications. I had actually already implemented such application-level pieces some time ago, with regulated industries and other verification methods in mind. But that comes after I've assured myself I understand, e.g., the sequence for verifying a proof like in that Grotto blog post -- and for creating one.
The issue was discussed in a meeting on 2024-07-03
@tplooker,
Regarding your question around the "branding" feature you mentioned having trouble with:
TL;DR: The ideal and most scalable solution to displaying VCs is "render methods". I recommend working with the community to help develop these. Anything else is a stopgap measure and should not drive the core design of VC primitives. A variety of these stopgap measures are available today, some better than others depending on the situation. But don't use a catch-all, issuer-dependent `@vocab`, because it doesn't play nicely with others, creates confusion, and incentivizes behavior that leads to more trouble in the ecosystem.
At MATTR several years ago, we decided to extend the VC 1.0 data model to include our own early attempt at credential branding. This involved us defining our own company-based `@context` value to extend the base data model with these terms. Every credential we issued then had to include this context value so that the branding terms we defined became defined. As a consequence, because we wanted to limit our document loader from resolving contexts over the network, we had several significant deployment issues where some downstream applications of ours didn't have the required `@context` entries resolved, meaning credentials failed verification until we did a redeploy. The pain was that these `@context` values defined terms that weren't even being processed by the wallet and verification software, as the software didn't understand the branding by design! It simply wanted to ignore this portion of the VCs and couldn't without being redeployed with a redundant `@context` value. Then, when we updated this context multiple times over the deployment, we had to pre-distribute the new `@context` values into the wallet and verification apps and wait for them to propagate before we could safely issue new VCs using the new context values. This required heavy coordination that was only possible because we were the issuer, wallet, and verifier software; it simply wouldn't have been possible in a scaled and open ecosystem.
My understanding of the requirements you had for the above was this:
Hopefully, I've understood these requirements correctly.
If requirement number 1 isn't actually a requirement for you and never will be, then I'd quickly say you may be better off using another technology, because VCs are designed for that requirement.
Many of the extra things you might not need to do with some point-to-point integration or closed world system come from a three-party model design that respects and recognizes limited control in a decentralized ecosystem with players that may not be known by reputation. A significant portion of this design makes use of self-describing data that is globally unambiguous. Expecting others to accept your own private expressions of data or relying on "who sent the data" to make determinations about its format are not good ideas in this space. Again, that does mean you might have to do some more work upfront; there is no free lunch.
For a number of applications in this space, particularly general purpose digital wallets, whoever sent (or authored) some data is not always known by reputation. In this sense, general purpose digital wallets have similarities to browsers, which also do not generally know Web origins by reputation. The Web could not scale if every origin had a different data format to express its webpages -- and if browsers were required to understand each of these. A likely outcome would be a handful of large origins that worked with everyone else being left out. Instead, general purpose mechanisms have been collaboratively created over time to allow browsers to display any website that uses them. This approach is clearly scalable, but again, requires more work upfront.
Similarly, requiring each general purpose digital wallet to understand a wholly unique and different format for every VC does not scale. I think we agree on this point. However, I think that not only does this scaling problem remain when changing this to (approximately) a unique format for every issuer, but this approach adds back the power dynamics and centralizing market forces VCs are designed to help avoid. To be clear, my perspective is that this is what using issuer-dependent, potentially conflicting "branding" terms (via `@vocab`) would do.
A major design aim in the three-party model is to decouple vocabularies from issuers so that they are independent from and widely reusable across issuers, avoiding recreating situations where only a small number of issuers can reasonably exist to produce VCs that work with general purpose digital wallets.
So, to address the rendering problem in a scalable way, that avoids these problems, "render methods" are being worked on in the community. These are globally unambiguous, issuer independent, and reusable mechanisms for rendering VCs. They are being designed such that issuers can include some number of render methods in their VCs so that general purpose digital wallets or other verifier software can render the information in the VCs with confidence -- in a way that the issuer recommends.
Of course, while not everything happens as quickly as we'd like, we should not have a suboptimal current situation drive our core design decisions. So what can we do while render methods are still being incubated?
First, we need to start with understanding what the constraints are in the VC ecosystem. These are the things we do not control.
In a global, three-party-model setting, there is no expectation that verifiers will accept your VCs. They can reject them for any reason. Therefore, if maximal acceptance is a goal, focus needs to be on minimizing information in VCs that is not useful to verifiers and providing information that is of value. This includes maximizing a verifier's understanding of the information that is present, minimizing ambiguity.
Every verifier MUST understand the contexts that define the terms that they consume. This is a hard requirement for consuming any document as explained in other comments in this issue. In general, the more terms there are that are not of interest to a verifier or that cannot be globally disambiguated, the more likely a verifier is to reject a VC.
Now, some verifiers will be willing to load contexts from the Web and others will not. This dividing line immediately speaks to how open and interoperable a verifier is. Verifiers that are unwilling to load contexts from the Web will necessarily be less interoperable than those that are. This is ok, because both subsets can exist and sometimes intermediaries can bridge the gap between them -- if there are technological solutions that enable it to happen, such as JSON-LD and Data Integrity proofs.
Looking at verifiers that are willing to load contexts from the Web, these verifiers can read `@protected` terms from the contexts their applications are natively coded against without running JSON-LD compaction. If `@protected` terms are not used, they also have the option to perform JSON-LD compaction on VCs to ensure the entire expression is in a context their application natively understands. All of these verifiers might be willing to accept VCs that carry a MATTR-specific (or any vendor-specific) context.
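A minimal sketch of that compaction option with a restricted document loader follows; the cached file path and URLs are illustrative placeholders, and jsonld.js is assumed as the processor:

```js
const jsonld = require('jsonld');

// Serve only pre-vetted, locally bundled copies of contexts; never hit the network.
const CACHED_CONTEXTS = new Map([
  ['https://www.w3.org/ns/credentials/v2',
    require('./contexts/credentials-v2.json')] // illustrative local copy
]);

async function vettedLoader(url) {
  if (!CACHED_CONTEXTS.has(url)) {
    throw new Error(`Refusing to load unvetted context: ${url}`);
  }
  return { contextUrl: null, documentUrl: url, document: CACHED_CONTEXTS.get(url) };
}

// Usage sketch: re-express the VC in the application's native context.
// const compacted = await jsonld.compact(vc, nativeContext, { documentLoader: vettedLoader });
```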
This only leaves verifiers that are not willing to load contexts from the Web, an intentionally more inflexible subset of verifiers. This includes some of your own applications, as stated.
However, there are still several ways to interoperate with verifiers in this category, when one has a document containing a context they do not understand:
"additionalProperties": false
).There should be considerable scalability in a number of directions given all of the above. But, I do think that "render methods" will provide the most ideal solution in the future.
What I would strongly recommend against doing is using a catch-all, issuer-dependent `@vocab` to express "branding" terms that you will interpret in a vendor-specific way (you do not get to decide how others will interpret these). Doing this can lead to:
+1 for render methods as a means to brand a credential. BC Gov has successfully demonstrated this by applying OCA bundles to AnonCreds VCs as a branding overlay. We are now working on an `OverlayCaptureBundle` renderMethod, in partnership with the Swiss government, for JSON-LD VCs.
Here's an implementation in the Open Wallet Foundation bifold project. Here's an explorer to see it in action.
The following issue outlines two significant security vulnerabilities in data integrity.
For convenience in reviewing the content below, here is a Google Slides version outlining the same information.
At a high level, both vulnerabilities exploit the "Transform Data" phase in data integrity in different ways, a process that is unique to cryptographic representation formats involving procedures such as canonicalisation/normalisation.
In effect, both vulnerabilities allow a malicious party to swap the key and value of arbitrary attributes in a credential without the signature being invalidated. For example, as the attached presentation shows with worked examples, an attacker could swap their first and middle names, or their employment and over-18 status, without invalidating the issuer's signature.
The first vulnerability is called the unprotected term redefinition vulnerability. In general, this vulnerability exploits a design issue with JSON-LD where the term protection feature offered by the `@protected` keyword doesn't cover terms that are defined using the `@vocab` and `@base` keywords. This means any terms defined using `@vocab` and `@base` are vulnerable to term redefinition.
The second vulnerability exploits the fact that a document signed with data integrity has critical portions which are unsigned, namely the `@context` element of the JSON-LD document. The fact that the `@context` element is unsigned in data integrity, combined with the fact that it plays a critical part in the proof generation and proof verification procedures, is a critical flaw, leaving data integrity documents open to many forms of manipulation that are not detectable by validating the issuer's signature.
Please see the attached presentation for the resolutions to this issue we have explored.
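To illustrate with hypothetical data (a simplification of the worked examples in the presentation), both documents below canonicalize to the same N-Quads statements, so a signature over those statements verifies for both, even though a JSON consumer reads swapped values:

```js
// As issued: the name terms are defined only via @vocab, which @protected
// does not cover. (Other required VC properties are omitted for brevity.)
const asIssued = {
  '@context': ['https://www.w3.org/ns/credentials/v2',
               { '@vocab': 'https://example.com/#' }],
  firstName: 'John',
  middleName: 'Theodore'
};

// As tampered: attacker-supplied mappings invert the terms' IRIs, so the
// swapped JSON keys still expand to the original, signed statements.
const asTampered = {
  '@context': ['https://www.w3.org/ns/credentials/v2', {
    firstName: 'https://example.com/#middleName',
    middleName: 'https://example.com/#firstName'
  }],
  firstName: 'Theodore',
  middleName: 'John'
};
```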
In my opinion, the only solution that will provide adequate protection against these forms of attack is to fundamentally change the design of data integrity to integrity-protect the `@context` element. I recognise this would be a significant change in design; however, I do not see an alternative that would prevent variants of this attack from continuing to appear over time.
I'm also happy to present this analysis to the WG if required.