If one controller document creates a "verification relationship" to "https://actor1.example/key1", can a hostile actor include a verification method in their controller document with "id": "https://actor1.example/key1" and cause their key to be trusted? https://www.w3.org/TR/2024/WD-controller-document-20240817/#retrieve-verification-method does say to fetch every verification method URL with no caching at all, but it seems unlikely that implementations will actually omit all caching.
No. The verification methods are always resolved using their IDs, not by happening to know a verification method is (supposedly) controlled by a particular controller and going to its controller document (perhaps in a cache) to find the verification method. This seems convoluted; one starts with the verification method ID (e.g., as expressed in a `proof.verificationMethod` or `kid` field). So a cache-based attack doesn't seem to make any sense here.
Notably, the text already says: "The following algorithm specifies how to safely retrieve a verification method..." -- and one would hope that anyone building a cache would only do so by safely retrieving a verification method first, not by doing it in an unsafe way and creating cache entries that could then allow avoiding safe retrieval.
I suppose we could add a note that says (something like): "Don't build a cache by fetching controller documents, walking each one, and adding reverse entries from any verification method IDs found therein back to the controller document, as this is not a secure way to retrieve verification methods. Caches have to be built based on originally safe verification method retrieval processes or else they could allow unsafe retrieval."
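For concreteness, here's a rough sketch of what I mean by a cache that is only populated through safe retrieval; the function and type names below are placeholders I made up, not anything from the spec:

```ts
// Sketch only: a cache keyed on the *inputs* to the safe retrieval algorithm,
// so the only way an entry gets created is by running that algorithm first.
type VerificationMethod = { id: string; type: string; controller: string };

// Placeholder for the spec's "retrieve verification method" algorithm: it
// starts from the VM ID, fetches the controller document at the VM ID's base
// URL, and checks the id/controller/relationship constraints before returning.
declare function safeRetrieveVerificationMethod(
  vmId: string, expectedRelationship: string): Promise<VerificationMethod>;

const vmCache = new Map<string, VerificationMethod>();

async function getVerificationMethod(
  vmId: string,
  expectedRelationship = "verificationMethod"
): Promise<VerificationMethod> {
  const key = `${expectedRelationship} ${vmId}`;
  const cached = vmCache.get(key);
  if (cached) return cached;
  const vm = await safeRetrieveVerificationMethod(vmId, expectedRelationship);
  vmCache.set(key, vm); // only safely retrieved results ever enter the cache
  return vm;
}
```

The anti-pattern the note above warns about would instead walk controller documents and insert reverse entries without ever running that algorithm.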
Some of us are concerned about the inclusion of multihash and multibase. We all think it's best to mandate that all implementations of this specification align on a single cryptographic digest algorithm and a single base encoding, to improve interoperability. We're split on whether it's a good idea to use the multihash and multibase formats to make those strings self-describing.
What choice do we have here? Certainly not to start designing something new from scratch. A well known and well established tech must be picked. The only question is, should the tech allow extensibility or not?
If yes, then there is no better choice than Multibase and Multihash; they are well established and well supported by all major and minor languages, and I'm happy to learn about better alternatives.

If not, then you have to mandate one digest algorithm and one base encoding. But in the long term this choice, a choice made on behalf of end users, won't be respected and will be seen as limiting, so implementers will start inventing their own solutions to meet customer requirements, and there won't be any interoperability at all.

Generally, on many W3C forums I see a lot of concerns that implementers do not respect this or that, with the implication that such a thing is an issue. The truth is that as an implementer I must deliver what a customer requires, not the other way around. This concern is void; one could easily be sued for delivering a verifier that does not work as expected.

Prohibiting extensibility is a sure way to make implementers stop respecting a spec.
(This is me discussing the issues; I'll go back and check for TAG consensus once things settle down.)
@dlongley Re https://github.com/w3c/controller-document/issues/94#issuecomment-2332584962, why is it the right design for the verification methods to name themselves using absolute URLs instead of something that's explicitly scoped to the containing document? I see why it happened in a design that started as RDF -- RDF has no local names, just blank nodes and universal names -- but it seems vulnerable to bugs.
@filip26 Re https://github.com/w3c/controller-document/issues/94#issuecomment-2334588324, the advice I've gotten from security experts is to "have one joint and keep it well oiled". The most obvious candidate for that joint in controller documents is the verification method's type, which means that type should determine everything else about the cryptographic system, from the signature algorithm to the hash to the binary->ascii encoding. If customer requirements change, you standardize a new value for that type field. You don't try to switch from base64 to base58 in place. This does imply that JWKs are also a mistake, but that's a lost battle while multihash is not. I think it's acceptable for a type to say "you must use base64, and encode it with multibase's initial 'u'" or "you must use sha2-256, and encode it with multihash's initial 0x12", while others on the TAG would prefer to omit those initial bytes.
The most obvious candidate for that joint in controller documents is the verification method's type, which means that type should determine everything else about the cryptographic system,
Please, can you explain how multihash, multibase, or any other self-describing, well-documented/adopted format prevents you from doing so?
I think it's acceptable for a type to say "you must use base64, and encode it with multibase's initial 'u'" or "you must use sha2-256, and encode it with multihash's initial 0x12", while others on the TAG would prefer to omit those initial bytes.
I say it's not, you say it is ... argumentation based on subjective feelings ends up like this. Please, let's go back to the first question I asked.
(We recognize that the DID WG sees registries as too-centralized, but we disagree.)
Please, can someone elaborate on why "we disagree"? Is it based on something?
I'm responding in my capacity as an Editor and not on behalf of the VCWG. We will try to review this response during W3C TPAC to see if the WG has consensus wrt. the suggestions below.
@jyasskin wrote:
Sorry that this took so long.
No problem, thank you for the thorough review. :)
We appreciate this effort to make the bag-of-keys functionality that Verifiable Credentials use more independent from the did: URL scheme. Beyond that, we're not confident that other systems will find much use in it, since the effort of profiling it is likely to be larger than the effort in defining a bespoke format.
The primary reasons the document exists are that 1) the VC JOSE COSE specification authors did not want to create normative references to DIDs or Data Integrity, and 2) a few implementers wanted to generalize DID Documents to allow for any URL in all places instead of just DID URLs. IOW, we are here because this was the compromise that achieved consensus in the VCWG. As an Editor, I agree that profiling is a non-trivial effort. That said, the DID WG has agreed to profile the Controller Document and build DID Core v1.1 on top of it, the VC JOSE COSE Editors have agreed to the same, and there is growing evidence that the ActivityPub community is doing things that look very close to what a Controller Document is (we're engaging with that community as well). We have three communities profiling so far, and that's better than three bespoke formats that do the same thing.
There is also a risk that defining a generic format will introduce security vulnerabilities into specific applications when libraries implement the generic format and fail to enforce the restrictions that those specific applications need. We've seen this in the past when generic JWT libraries allowed alg=none or symmetric keys in applications that were designed for asymmetric keys. While those specific flaws don't exist here, analogous ones might.
Yes, that is always a danger when you generalize a security format. That said, we know of no vulnerabilities now in the specifications that plan to profile the Controller Document and are monitoring how profiles are created in order to mitigate the concern raised above.
We were happy to see that this document doesn't try to define a format that can be interpreted as JSON and JSON-LD at the same time. Some of the discussion in issues has been worrying on that front — it sounds like some implementers might be intending to include `@context` properties, parse documents as JSON-LD using hash-pinned values for those `@context` URLs (which is better than not pinning them), and then interpret the result using an unspecified (though logical) mapping from URLs to the terms that this specification defines. We are concerned about such an implicit interoperability requirement that isn't captured in the format's specification, and we're concerned that attackers will find ways to exploit the complexity of JSON-LD context processing.
It sounds like the TAG would like more language in the specification on how to safely process a DID Document, but it's not clear what language would address the concern. Could you provide a few rough sentences on what the TAG would like the specification to say?
We're also skeptical that JSON-LD provides benefits for a format designed for grouping cryptographic keys: interoperable extensibility can be achieved through IANA registries at least as well as through individually-owned URL prefixes. (We recognize that the DID WG sees registries as too-centralized, but we disagree.)
Removing JSON-LD support and using a centralized registry would lead to objections within the Working Group. What we have right now is where we have achieved consensus.
Some of us are concerned about the inclusion of multihash and multibase. We all think it's best to mandate that all implementations of this specification align on a single cryptographic digest algorithm and a single base encoding, to improve interoperability.
Hmm, selecting a single digest algorithm and a single base encoding has been discussed over the years and rejected for a variety of reasons (lack of algorithm agility, the difficulty of picking the "right" algorithm, conflicting customer requirements across industries, legitimate needs for different base encodings, etc.).

For example, some implementers want to use SHA2-256 while others want to use SHA2-384 to perform cryptographic strength matching. Some government customers are pushing for an upgrade to SHA3; while some implementers don't see that as necessary, others claim their customers require it. Similarly, the base encodings used in the VCWG appear in a variety of scenarios where picking a single base-encoding format would create deployment issues. For example, base64url is an acceptable choice for base-encoding a value into a URL, but a poor choice when base-encoding a value into a QR Code (which is optimized for base45). Choosing one hash and one encoding mechanism has been demonstrated to not be workable for the diversity of use cases that the Working Group is addressing.

That said, where the WG can pick one base encoding mechanism, such as with Data Integrity cryptosuites, it does that. Just because Multibase allows any base encoding does not mean we maximize that flexibility. For example, the ECDSA and EdDSA cryptosuites use base58 encoding only. Similarly, we only specify four multihash values, which are the ones we've heard are required based on customer demand.
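As a rough illustration of that narrowing (the URL and key value below are made up placeholders): a Multikey verification method under the ECDSA/EdDSA cryptosuites always carries the base58btc multibase prefix 'z', so a consumer bound to those suites can reject anything else even though multibase itself allows other encodings.

```ts
// Sketch, not a normative example; the URL and key value are placeholders.
const verificationMethod = {
  id: "https://controller.example/doc#key-1",
  type: "Multikey",
  controller: "https://controller.example/doc",
  publicKeyMultibase: "z6MkPLACEHOLDERplaceholderplaceholderplaceholder"
};

// A profile that fixes the encoding to base58btc can enforce the 'z' prefix.
if (!verificationMethod.publicKeyMultibase.startsWith("z")) {
  throw new Error("this cryptosuite only accepts base58btc-encoded (z-prefixed) keys");
}
```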
We're split on whether it's a good idea to use the multihash and multibase formats to make those strings self-describing.
We have many implementations already of both formats in the wild, with multiple VCWG implementations having committed to the format.
- It seems risky, at least in some cases, to say "https://some.url/ defines the keys that can approve changes to this document" without pinning the content of https://some.url/ either by hash or signature, and we don't see any facility in this specification to do that pinning. Where would that be defined?
The design of the `controller` property allows for key rotation at https://some.url/ to ensure that if there is a potential key compromise there, a key rotation can fix the issue without the controller document having to be updated. So, always pinning doesn't work in some use cases. That said, pinning could happen via the use of the `digestMultibase` property defined in https://www.w3.org/TR/vc-data-integrity/#resource-integrity. We can discuss this item in the group and see if individuals feel strongly about such a mechanism.
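For discussion purposes, here is a sketch of what such pinning might look like. It assumes digestMultibase carries a multibase-encoded multihash of the referenced content (see the Data Integrity spec for the exact encoding rules), and the digest value shown is a placeholder, not computed from a real resource:

```ts
// Sketch only: verify a referenced resource against a pinned digestMultibase.
// Assumes sha2-256 multihash (0x12, 0x20) encoded as base64url with multibase prefix 'u'.
import { createHash } from "node:crypto";

const pinnedReference = {
  id: "https://some.url/",
  digestMultibase: "uEiPLACEHOLDERdigestvalue" // placeholder, not a real digest
};

async function fetchAndVerify(ref: { id: string; digestMultibase: string }): Promise<Buffer> {
  const bytes = Buffer.from(await (await fetch(ref.id)).arrayBuffer());
  const digest = createHash("sha256").update(bytes).digest();
  const multihash = Buffer.concat([Buffer.from([0x12, digest.length]), digest]);
  const actual = "u" + multihash.toString("base64url");
  if (actual !== ref.digestMultibase) {
    throw new Error(`digest mismatch for ${ref.id}`);
  }
  return bytes;
}
```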
- If one controller document creates a "verification relationship" to "https://actor1.example/key1", can a hostile actor include a verification method in their controller document with "id": "https://actor1.example/key1" and cause their key to be trusted? https://www.w3.org/TR/2024/WD-controller-document-20240817/#retrieve-verification-method does say to fetch every verification method URL with no caching at all, but it seems unlikely that implementations will actually omit all caching.
@dlongley provided a preliminary answer here: https://github.com/w3c/controller-document/issues/94#issuecomment-2332584962
We will discuss this item in the group in more detail at W3C TPAC 2024.
From @jyasskin
(We recognize that the DID WG sees registries as too-centralized, but we disagree.)
Respectfully, this problem cannot be solved with registries.
Not because we haven't figured out a way to do that yet, but because the point of the work is to solve identifier management without a registry.
To wit, the web (including IP addresses and DNS) already embodies a wonderful, operationally decentralized system. Unfortunately, that system is not decentralized from an authority perspective: interoperating with the public web relies on a centralized list of known root authorities. This work, decentralized identifiers, exists to solve that problem: how do you create a global identifier space that is NOT anchored to a centralized list of necessarily trusted authorities?

The goal of the web has always been to democratize access to information. Moving beyond a centralized registry that gets to--however indirectly--decide who gets to participate as a first-class peer in the network is the point of the work.
We have always seen DIDs as a continuation of the fundamental goals of TBL at the inception of the Web itself. Indeed, DIDs provide the most promising opportunity to connect the legacy web with Web3. Requiring that DIDs use a centralized registry would bring that opportunity for interoperability and integration to a halt.
The issue was discussed in a meeting on 2024-09-27
@jyasskin and @hadleybeeman, thank you for engaging on behalf of the W3C TAG at the recent W3C Technical Plenary meeting in Anaheim. The transcript of that meeting can be found in the Github comment above this one. This comment is meant to summarize the changes we intend to make to the specification based on our conversations during W3C TPAC. Please let us know if you would like us to do more than what we propose below:
- Add text explaining the `controller` that is authorized to change the contents of the document, and how not including a cryptographic digest for the remote resource is both good (allowing for key rotation) and risky (if a cryptographic digest is not used).
We'll link the PRs to each checkbox as they are raised.
Add normative text stating that expressing a key in a controller document that does not match the base URL for the document MUST throw an error.
I think there's some nuance here.
I agree that a controller document should not be able to set a verification method for an identifier other than the primary, singular "id" property--the document is canonical for just one identifier--but a controller SHOULD be able to set a verification method that uses an externally defined key because that is a reasonable policy decision for managing keys.
However, there doesn't seem to be a way in the controller document to actually specify a verification method for an identifier that isn't the "id" of the base document.
So, either I'm in disagreement (because external keys are a reasonable policy choice) or it seems to functionally be addressed (but could use better explanatory language).
However, there doesn't seem to be a way in the controller document to actually specify a verification method for an identifier that isn't the "id" of the base document.
I agree with @jandrieu that there is some nuance here. I think any change here would ideally not eliminate use cases where "external VMs" can be referenced from controller documents.
I think this can be done by requiring any externally referenced VM to be done only by reference and not by embedding, i.e., only the ID (a URL) of the verification method can be expressed in an external ("non-canonical" in Joe's parlance here) controller document. However, expressing an embedded VM that has an `id` with a base URL that does not match the controller document ID must result in an error (IMO, this is "the nuance"). This would always force retrieval of this verification method to be done through the usual safe VM retrieval algorithm.
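To illustrate the distinction (all identifiers and key values below are made up): in a controller document whose id is https://a.example/doc, an external VM may appear as a bare URL reference, but embedding a full VM whose id lives under a different base URL would be an error.

```ts
// Hypothetical examples illustrating the proposed rule (all URLs are made up).

// OK: an external VM is referenced only by its ID; retrieval of that VM still
// goes through the safe VM retrieval algorithm against https://b.example/doc.
const referenceOnly = {
  id: "https://a.example/doc",
  assertionMethod: ["https://b.example/doc#key-1"]
};

// NOT OK under the proposed rule: embedding a full VM whose id has a base URL
// that does not match this controller document's id must result in an error.
const embeddedForeignVm = {
  id: "https://a.example/doc",
  verificationMethod: [{
    id: "https://b.example/doc#key-1",   // base URL mismatch => error
    type: "Multikey",
    controller: "https://b.example/doc",
    publicKeyMultibase: "zPLACEHOLDER"   // placeholder value
  }]
};
```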
The spec should also more clearly state that any VM retrieval can only be safely performed through the VM retrieval algorithm (or an equivalent algorithm, as always).
With this in mind, safely retrieving verification methods always means the starting point is a verification method ID, which is necessarily rooted in its expected "canonical" controller document URL, and an expected verification relationship (which can default to `verificationMethod`).
If the algorithm looks like this (which is already very close to what is in the spec):
1. Start from the verification method ID, which is a fragment (`#` URL) of a primary resource, identified by its base URL, the expected controller document ID.
2. Retrieve the controller document from that base URL and validate it (e.g., confirm its `id` value matches the controller document ID URL, any other data model checks, etc.).
3. Find the verification method in that document under the expected verification relationship and validate it (e.g., confirm its `id` value matches the verification method ID, any other data model checks, etc.).
4. Confirm the verification method has a `controller` property that lists the controller document ID.

Then using this algorithm to retrieve VMs will always result in finding the "canonical" controller document for the verification method, which is also the only acceptable place to express it in full (not just by reference).
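A rough TypeScript sketch of those steps (the names and the plain fetch of the controller document are my own simplifications, not the spec's normative algorithm):

```ts
// Sketch of the four steps above; error handling and DID resolution omitted.
type VerificationMethod = { id: string; type: string; controller: string | string[] };
type ControllerDocument = {
  id: string;
  verificationMethod?: VerificationMethod[];
  [relationship: string]: unknown;
};

async function retrieveVerificationMethod(
  vmId: string,
  expectedRelationship = "verificationMethod"
): Promise<VerificationMethod> {
  // 1. The VM ID is a fragment of its controller document's URL.
  const controllerDocumentId = vmId.split("#")[0];

  // 2. Retrieve the controller document and validate it.
  const doc = (await (await fetch(controllerDocumentId)).json()) as ControllerDocument;
  if (doc.id !== controllerDocumentId) throw new Error("controller document id mismatch");

  // 3. Find the VM under the expected relationship (by reference or embedded)
  //    and validate that its id matches the requested VM ID.
  const entries = (doc[expectedRelationship] ?? []) as (string | VerificationMethod)[];
  const vm = entries
    .map(e => (typeof e === "string" ? doc.verificationMethod?.find(m => m.id === e) : e))
    .find(m => m?.id === vmId);
  if (!vm) throw new Error("verification method not found");

  // 4. The VM's controller property must list the controller document ID.
  const controllers = Array.isArray(vm.controller) ? vm.controller : [vm.controller];
  if (!controllers.includes(controllerDocumentId)) throw new Error("controller mismatch");

  return vm;
}
```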
Now, this does not preclude another controller document from referencing external VMs. But note that verification method retrieval algorithms won't start at that controller document -- they will start with a verification method ID as mentioned, eliminating any possible influence.
For a use case for this, consider a VC with an issuer with an ID value of `did:a`. This VC has a proof in it that is verifiable using a verification method with an ID of `did:b#vm`. To verify this proof, `did:b#vm` must be retrieved using the above algorithm, which will necessarily obtain (and validate) the VM information from the controller document `did:b`. This will allow the proof to be checked. Then business rules are run to check whether use of this verification method is acceptable for issuer `did:a`. This is a separate check -- it involves retrieving the DID document `did:a` and seeing if it references `did:b#vm` as one of its `assertionMethod` verification methods.
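A hedged sketch of those two separate checks, with resolveDocument and retrieveVerificationMethod as placeholders (a real implementation would use DID resolution for did: URLs and the safe VM retrieval algorithm sketched above):

```ts
// Sketch only: proof verification vs. the separate issuer business-rule check.
declare function retrieveVerificationMethod(vmId: string): Promise<unknown>;
declare function resolveDocument(
  id: string
): Promise<{ assertionMethod?: (string | { id: string })[] }>;

async function checkIssuerAuthorization(issuerId: string, vmId: string): Promise<void> {
  // 1. Retrieve and validate the VM from its own ("canonical") controller
  //    document -- did:b in the example -- and verify the proof against it.
  await retrieveVerificationMethod(vmId);

  // 2. Separate business rule: does the issuer's document (did:a) reference
  //    this VM as one of its assertionMethod verification methods?
  const issuerDoc = await resolveDocument(issuerId);
  const listed = (issuerDoc.assertionMethod ?? [])
    .some(e => (typeof e === "string" ? e : e.id) === vmId);
  if (!listed) {
    throw new Error(`${vmId} is not an assertionMethod for issuer ${issuerId}`);
  }
}
```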
The issue was discussed in a meeting on 2024-10-09
All PRs related to this issue have been created, reviewed, and merged. Closing.
Sorry that this took so long. I'm pasting the comment the TAG agreed on this week, which is also in https://github.com/w3ctag/design-reviews/issues/960#issuecomment-2330123526:
We appreciate this effort to make the bag-of-keys functionality that Verifiable Credentials use more independent from the did: URL scheme. Beyond that, we're not confident that other systems will find much use in it, since the effort of profiling it is likely to be larger than the effort in defining a bespoke format. There is also a risk that defining a generic format will introduce security vulnerabilities into specific applications when libraries implement the generic format and fail to enforce the restrictions that those specific applications need. We've seen this in the past when generic JWT libraries allowed alg=none or symmetric keys in applications that were designed for asymmetric keys. While those specific flaws don't exist here, analogous ones might.
We were happy to see that this document doesn't try to define a format that can be interpreted as JSON and JSON-LD at the same time. Some of the discussion in issues has been worrying on that front — it sounds like some implementers might be intending to include `@context` properties, parse documents as JSON-LD using hash-pinned values for those `@context` URLs (which is better than not pinning them), and then interpret the result using an unspecified (though logical) mapping from URLs to the terms that this specification defines. We are concerned about such an implicit interoperability requirement that isn't captured in the format's specification, and we're concerned that attackers will find ways to exploit the complexity of JSON-LD context processing. We're also skeptical that JSON-LD provides benefits for a format designed for grouping cryptographic keys: interoperable extensibility can be achieved through IANA registries at least as well as through individually-owned URL prefixes. (We recognize that the DID WG sees registries as too-centralized, but we disagree.)

Some of us are concerned about the inclusion of multihash and multibase. We all think it's best to mandate that all implementations of this specification align on a single cryptographic digest algorithm and a single base encoding, to improve interoperability. We're split on whether it's a good idea to use the multihash and multibase formats to make those strings self-describing.
We don't see some security considerations that we were expecting to see:
- It seems risky, at least in some cases, to say "https://some.url/ defines the keys that can approve changes to this document" without pinning the content of https://some.url/ either by hash or signature, and we don't see any facility in this specification to do that pinning. Where would that be defined?
- If one controller document creates a "verification relationship" to "https://actor1.example/key1", can a hostile actor include a verification method in their controller document with "id": "https://actor1.example/key1" and cause their key to be trusted? https://www.w3.org/TR/2024/WD-controller-document-20240817/#retrieve-verification-method does say to fetch every verification method URL with no caching at all, but it seems unlikely that implementations will actually omit all caching.