Can we nuance our mental model on DID control slightly?

w3c / did-core

W3C Decentralized Identifier Specification v1.0

https://www.w3.org/TR/did-core/

Can we nuance our mental model on DID control slightly? #233

Closed dhh1128 closed 4 years ago

dhh1128 commented 4 years ago

PR #213 has generated an interesting comment stream, and I think some useful clarity. I am happy to have multiple smart people agree in writing to the concept that a DID can identify anything, because this flexibility seemed to have been excluded by some verbiage I was hearing.

Now I'd like to explore a subtlety around the concept of control. I will frame this in terms of a use case that I'm familiar with in cybersecurity and malware research, but I think you'll quickly see how it might apply to use cases brought up by others.

Malware researchers typically identify malware (viruses, worms, infected or malicious files) by a sha256 hash. The first time a particular sample is seen in the wild, a researcher hashes the sample and goes to virustotal.com or some similar site to see if anybody else has seen it before. If not, the sample is uploaded to the site's DB for all the world to look at. If it is already known, then the researcher has just made a second (or a third, or a tenth) independent discovery.

Now, suppose I wrote a DID method that was all about identifying malware with DIDs. The logical identifier format would be did:mymethod:hash-of-sample. With me so far?

Okay, now what are the control semantics?

What I have heard so far is that DIDs are always created by a controller, who can then (even in the genesis DID doc) choose to retain control or give it away (e.g., by specifying no control after the creation transaction). This makes sense for many situations.

However, that doesn't quite fit this scenario, because A) the researcher who reports the malware is never, at any time, in a "control" relationship with the sample's identifier, and would not want to be considered so; B) the identifier cannot have control semantics, even at its genesis transaction, because its derivation mechanism disallows it; C) the identifier doesn't have a DID doc. What's being identified here is content that exists, that is explicitly uncontrollable to begin with. Anybody who discovers the content will discover the same identifier. Two researchers could register the same content on two different systems of record and both would be equally valid and not in conflict.
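A minimal sketch of the derivation described above, assuming the identifier is simply the hex sha256 digest of the sample appended to the method prefix. The method name `mymethod` comes from the example; the function name and the sample bytes are hypothetical.

```python
import hashlib

METHOD = "mymethod"  # method name from the example above; not a registered DID method


def malware_did(sample: bytes) -> str:
    """Derive the identifier purely from the sample's content (no keys involved)."""
    digest = hashlib.sha256(sample).hexdigest()
    return f"did:{METHOD}:{digest}"


# Two researchers who never coordinate hash the same bytes...
sample_seen_by_alice = b"\x4d\x5a\x90\x00 hypothetical sample bytes"
sample_seen_by_bob = bytes(sample_seen_by_alice)

did_alice = malware_did(sample_seen_by_alice)
did_bob = malware_did(sample_seen_by_bob)

# ...and arrive at the same identifier, so neither is "first" in any provable sense.
print(did_alice == did_bob)  # prints True
```

Nothing in the derivation involves a secret, which is why no party ends up in a position to update or rotate anything.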

So my question is this:

Would we be comfortable saying that DIDs can be used to identify such things, too? And if yes (which I hope is an easy answer), are we willing to not describe such a scenario as "the controller creates the DID" but rather "the DID identifies something inherently uncontrollable, so it never has a controller, even during creation; rather, it has a discoverer" (or something to that effect)?

dhh1128 commented 4 years ago

Tagging @ewelton and @mitfik and @swcurran

ewelton commented 4 years ago

This would impact issue #122 as well - in fact, it would force the selection of 1b and 2a.

jandrieu commented 4 years ago

There is always a controller. So 1b doesn't work.

At the bare minimum, whoever created the DID is the controller. It does not imply they are, in any way, a controller of the DID Subject. All it means is that they controlled the initial DID Document, and presumably--depending on the method--retain the ability to further modify the document.

In the malware use case, I believe a better way to model that is that the initial reporter generates a DID and issues a credential saying that malware with a given hash has been given such and such a DID, perhaps with other corroborating claims in that credential. No control relationship needs to be established between the DID Controller and the DID Subject. But no matter what the relationship between Subject and Controller is, there is ALWAYS a Controller, whether or not there is a controller property.

Alternatively, the malware discoverer could just issue a credential with that hash, without using DIDs. I'm not sure DIDs buy the use case much.

dhh1128 commented 4 years ago

@jandrieu : Let me challenge "there is always a controller" a slightly different way. If a DID doc is created (said differently, if any metadata is associated with a DID), then I agree that there is always a controller, at least at the outset. Whoever creates the DID doc (chooses the metadata) is the controller. They can make decisions about whether to continue being the controller, or to disclaim control over the content by removing control methods. Furthermore, I agree that all our thinking up until now has assumed that DIDs and DID docs are inseparable concepts, because <assumption>of course we want metadata</assumption>.

But what I'm suggesting is a scenario where an identifier needs to be created (perhaps better said, it gets discovered) without a DID doc--zero metadata--from the get-go. Nobody is allowed to define any metadata about the identifier -- even at the outset. The need here is a pure identifier that has the decentralization characteristic of DIDs, but not the resolution characteristic. It's almost like a hashlink, except it makes no claim about location (or any other properties), only about existence. What is known about malware (metadata) could vary in thousands of different ways, and be stored in databases all over the world, and nobody intends to be an authoritative source for any of it. They just want to agree that they're talking about the same thing. Full stop. And since the mechanism for generating the identifier derives it from the subject, there can never be any controller making decisions, by definition. Uncontrollable things exist, and we identify them. Since there is no resolution, there is no controller. That's what the controller controls--the DID doc/resolution, n'est-ce pas?

Now, you could say, "No. We must have a DID doc. 99.999% of DIDs are worthless without it. For the weird corner case you're bringing up, if it really exists at all, just keep the convention and live with the weirdness." That would mean that we analyze the first person to create the guaranteed-empty DID doc for malware X as its "controller" for the purposes of the DID ecosystem. But two researchers could discover it independently, with no way of proving who was first. So we have a theoretical controller role that is unavoidably ambiguous, just because we want to keep the concept of controller. If we instead say, "Yep, there's cases where something exists but its metadata is not controlled, and DIDs can point to them. In such cases, it becomes impossible to create a DID doc (because if you do that, by definition you're exercising control), but it's still a sort of DID because it's a decentralized identifier" then we get to broaden the conceptual tent of DIDs a bit.

ewelton commented 4 years ago

It is not clear to me yet how this interacts with methods - perhaps some methods are capable of representing passive, persistent objects, and other methods are not - because the owner of a set of keys may always be able to update "the DID document".

On the other hand - they can't update the did itself once it is minted - and that is what matters. So the controller (of the registration) might only be able to update information about the subject's registration record.

Another thing that I worry about is the did:<method>:<data> model: if the <data> is related to the genesis key pair, then that method might not be capable of representing the "virus hash" you described above. The virus hash could only be represented as an assertion (or a VC).

In other words, for methods where the <data> part of the DID is derived from the genesis key pair, then the document belongs to the discoverer and the ability to represent an arbitrary thing, in a self-certifying manner, is simply not possible in that method. Those methods are constrained to represent "loci of control", which is an undeniably critical group - and the one which has come to dominate our thinking and discussion of DIDs.

This is especially apparent in the context of "verification methods" (i.e. #190 ) when combined with the new abstract-data-model/registry approach. The ADM/Registry model forces a "union of all possible realities" model and results in very complicated modeling. For example, we will need to put this sort of information in the registry:

capabilityDelegation field, if present, means <x>, but it MUST NOT be present when the method supports non-key hash registration in the data component of the DID and the data component of the DID is not directly related to the control mechanics of the underlying method. If the method does not support arbitrary hash registration, then the capabilityDelegation field MAY be used, subject to the definition above.
assertionMethod field, if present, means <x>, but it MUST NOT be present when the method supports non-key hash registration in the data component of the DID and the data component of the DID is not directly related to the control mechanics of the underlying method. If the method does not support arbitrary hash registration, then the assertionMethod field MAY be used, subject to the definition above.

or the ADM/Registry needs structure like

for methods which allow arbitrary hash registration and for those DIDs which do not correspond to genesis keypairs (e.g. method1, method2, etc.), then
- capabilityDelegation field MUST NOT be used
- assertionMethod field MUST NOT be used
for methods which allow only DIDs derived from genesis key pairs
- capabilityDelegation field is OPTIONAL and means <x>
- assertionMethod field is OPTIONAL and means <x>
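The second structure can be sketched as a small check. This is only an illustration under stated assumptions: the function name, the `derived_from_genesis_key` flag, and the example documents are hypothetical, and how a processor would actually learn which kind of DID it is looking at is left open.

```python
from typing import List

# The two properties named in the rules above.
CONTROL_PROPERTIES = ("capabilityDelegation", "assertionMethod")


def violations(did_document: dict, derived_from_genesis_key: bool) -> List[str]:
    """Return the control properties that are present but not allowed."""
    if derived_from_genesis_key:
        # Key-derived DIDs: the properties are OPTIONAL, so nothing to flag.
        return []
    # Hash-registered DIDs: the properties MUST NOT be used.
    return [name for name in CONTROL_PROPERTIES if name in did_document]


key_doc = {"id": "did:example:key-derived", "assertionMethod": ["#key-1"]}
hash_doc = {"id": "did:example:hash-derived", "assertionMethod": ["#key-1"]}

print(violations(key_doc, derived_from_genesis_key=True))    # prints []
print(violations(hash_doc, derived_from_genesis_key=False))  # prints ['assertionMethod']
```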

Use of an @context field simplifies this substantially by allowing a DID-document to declare the semantic which applies - as in "This subject represents a locus of control" or "This subject was discovered and represents an external entity" - and, of course, depending upon the method - it may or may not be possible to render the DID document formally immutable - in which case, an actor capable of updating the DID document could "morph" the "sort of thing" the DID represents.

What this suggests to me - in answer to @dhh1128 's question

Would we be comfortable saying that DIDs can be used to identify such things, too? And if yes (which I hope is an easy answer), are we willing to not describe such a scenario as "the controller creates the DID" but rather "the DID identifies something inherently uncontrollable, so it never has a controller, even during creation; rather, it has a discoverer" (or something to that effect)?

Is that it is far from clear whether or not DIDs are suitable as generic identifiers for self-certified content. Perhaps DIDs are always and only statements by actors, about people, organizations, and things - which means

update/retract (#213)
disallow https://github.com/w3c/did-core/issues/199#issuecomment-589401616
allow https://github.com/w3c/did-core/issues/199#issuecomment-588686738

Alternatively we could move the semantics partially into methods. Perhaps we could have did-core define a set of "classes" of DIDs - each with its own ADM/Registry/@context and let methods subscribe to them somehow - perhaps with an @class attribute which names the appropriate did-core semantic model. If we did that we could possibly

keep (#213)
allow https://github.com/w3c/did-core/issues/199#issuecomment-589401616
allow https://github.com/w3c/did-core/issues/199#issuecomment-588686738

The most radical suggestion would be to step out of the battle altogether, and give DID-documents a sort of "sovereignty" and let them announce what they are and how to process them using some sort of attribute that identified and advertised feature and property sets. The proposed attribute would let the creator of the DID-document assert things like

this DID represents a locus of control
this DID is an actor/agent's description of a context
this DID represents a discovered digital artifact as not controlled

and on and on - at the discretion of the environment and suitable to the needs of adopters.

We could even say "if a DID-document says nothing, then it is assumed to follow the rules in did-core" and provide a fallback Abstract Data Model that clearly defines what it ought to be.
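A rough sketch of that fallback behaviour, assuming the self-description lives in the hypothetical `@class` attribute floated earlier. The attribute name, the label strings, and the function name are all made up for illustration; none of them are defined by did-core.

```python
DEFAULT_MODEL = "did-core"  # fallback when a document declares nothing


def semantic_model(did_document: dict) -> str:
    """Read the hypothetical "@class" attribute, falling back to did-core."""
    return did_document.get("@class", DEFAULT_MODEL)


silent_doc = {"id": "did:example:123"}
artifact_doc = {"id": "did:example:456", "@class": "discovered-artifact"}

print(semantic_model(silent_doc))    # prints did-core
print(semantic_model(artifact_doc))  # prints discovered-artifact
```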

ewelton commented 4 years ago

@jandrieu re: 1b - I believe 1b is specific to the controller attribute in the DID-Doc, not the qualitative ability to control the DID-doc, simply the explicit representation of it in the DID-doc.

If there is always a controller, then the hypothesis that starts this is not possible. DIDs can not represent immutable content, they can only represent loci of control - and as such they can not really refer to things - they can only refer to the controller's name for things.

In other words - "The Moon" can not be the subject of a DID I create, "What Eric Thinks of as The Moon" is a proper scope, but "The Moon" is not.

dhh1128 commented 4 years ago

Quoting Joe from issue #122 :

DID Controller is a functional definition. Any entity that can actually control the DID Document is a controller.

So if my theoretical DID method exists, in which it's impossible to place any metadata in the DID Document, or to create one at all, that would imply that there cannot be a controller, because nobody can perform the function that satisfies the definition.

This raises the question, of course: is the DID method that I posited allowed to exist? I can think of many use cases for it. It's highly decentralized (would score great on many rubrics), but by its lack of resolution support, it is definitely an odd duck.

jandrieu commented 4 years ago

What you are talking about is not a DID. It's just an identifier.

Obviously, there is still a discussion going on about what constitutes meta-data. And, to my mind, I want ALL meta-data out of the DID Document. What needs to be in the DID Document is the cryptographic material for secure interaction (everything else is meta). In some cases, that material can be deterministically derived from the DID itself, like with did:key, in which case resolving the DID is how you transform the raw DID into the DID Document.
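To make "resolving is a transformation of the DID" concrete, here is a minimal sketch using a made-up method, `did:examplekey:<hex public key>`. It is not the real did:key encoding (which uses a multibase/multicodec value rather than plain hex), and the `type` and `publicKeyHex` values are placeholders chosen for illustration.

```python
def resolve(did: str) -> dict:
    """Expand a hypothetical did:examplekey:<hex public key> into a DID document."""
    scheme, method, key_hex = did.split(":", 2)
    if scheme != "did" or method != "examplekey":
        raise ValueError("not a did:examplekey identifier")
    bytes.fromhex(key_hex)  # raises ValueError if the suffix is not valid hex
    key_id = f"{did}#key-1"
    return {
        "id": did,
        "verificationMethod": [
            {
                "id": key_id,
                "type": "ExampleVerificationKey",  # placeholder type name
                "controller": did,
                "publicKeyHex": key_hex,  # illustrative field carrying the key from the DID
            }
        ],
        "authentication": [key_id],
    }


doc = resolve("did:examplekey:00ff10ab")
print(doc["verificationMethod"][0]["publicKeyHex"])  # prints 00ff10ab
```

No lookup happens anywhere: the same DID always expands to the same document, and the document still carries cryptographic material, which is the difference from a bare content hash.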

I think a big part of what's happening right now is people wanting to do EVERYTHING with DIDs, and I agree DIDs can refer to ANY subject. But that doesn't mean they are the right tool for every single identifier use case nor is it appropriate to pollute the core spec to support convenience features. They can be addressed in DID-AWESOME instead of DID-CORE.

If your identifier is most appropriately generated by hashing the object, GREAT. Just use that as an identifier. No DID required.

The fundamentally topological shift in DIDs over other forms of identifiers, including cryptographically verifiable ones like public keys, is the level of indirection between the DID and the cryptographic material, allowing for appropriate maintenance like rotation without invalidating the DID and auditing of transitions in material over the lifetime of the DID. Without that level of indirection, which is the fundamental link between DIDs and DID Documents, then you don't have DIDs, you just have an identifier.
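A toy model of that indirection, assuming an in-memory record in place of whatever registry a real method would use. The class name, field names, and key labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DidRecord:
    did: str
    key_history: List[str] = field(default_factory=list)  # audit trail, oldest first

    @property
    def current_key(self) -> str:
        return self.key_history[-1]

    def rotate(self, new_key: str) -> None:
        """Swap the active key; the DID string itself never changes."""
        self.key_history.append(new_key)


record = DidRecord("did:example:123", ["key-A"])
record.rotate("key-B")

print(record.did)          # prints did:example:123
print(record.current_key)  # prints key-B
print(record.key_history)  # prints ['key-A', 'key-B']
```

If the identifier were the public key itself, the rotation step would necessarily produce a new identifier; here it only appends to the history.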

jandrieu commented 4 years ago

@ewelton wrote

In other words - "The Moon" can not be the subject of a DID I create, "What Eric Thinks of as The Moon" is a proper scope, but "The Moon" is not.

That's all it ever could be.

The singular notion of "The Moon" doesn't exist. That is just what English speaking people, aka Eric, sometimes use to refer to the Earth's natural satellite. Other people use other terms.

This is the fundamental shift that VCs guarantee. All you can ever say are statements that "some issuer asserts some 'fact'", which is exactly the structure above. This is epistemologically rigorous. Imagining that "The Moon" is, in absolute knowable truth, the subject of a given DID is not. In order for such a statement to exist, we would first have to rigorously understand what "The Moon" really means to you. Then what it really means to me. Then we might be able to convince ourselves that we are talking about the same thing.

It's the same with DIDs. The only way to know if the subject is what you think it is (unless you are the controller) is to gather enough assertions about that DID to convince you of what the Subject is. And EVEN then, all you have done is convince yourself.

Reality is fundamentally unknowable. All we can do is invest resources convincing ourselves of enough shared agreement to interact reasonably.

So, this isn't about a search for Truth with a capital "T". That's a fool's errand. Rather, DIDs are a rigorous mechanism to establish cryptographically secured interactions with an arbitrary Subject. Figuring out what that Subject is or is not happens at another layer, including the mechanisms that embody what it means to "interact" with the Subject.

ewelton commented 4 years ago

@jandrieu I believe there is more to it than 'just an identifier' - it is more than a UUID, because it is linked to the thing itself. It is suitable only to 'hashable' objects, and not physical objects. You can't hash a tree, you can't hash the moon - and, you can argue that you can not refer directly to "the moon" - there is a huge tradition in philosophical semantics about exactly this - and DIDs, in a sense are taking a deep philosophical stance.

So far - what seems like it works is this:
1 - DIDs can not be used to identify digital content in a shared namespace
2 - DIDs can not refer to things
3 - DIDs can be a specific actor/agent's name for a thing

In a sense, it does not matter where this falls - just as long as it falls somewhere and leads to clear and precise (and simple) language. So "the subject is the king of england", for example, would not be quite optimal; "actor-x's name for the king of england is did:123" would be the right way to say it.

jandrieu commented 4 years ago

So "the subject is the king of england", for example, would not be quite optimal; "actor-x's name for the king of england is did:123" would be the right way to say it.

Yes. That's what DIDs always say. But since we ALSO don't know who the Controller is, the statement "Controller's name for a thing is XYZ" is rigorously restatable as "A thing is the subject of DID XYZ"

The assertion that DID XYZ refers to the King of England goes in a VC if you want it to be rigorous, in which case you get the lovely construct that "Issuer ABC says DID XYZ is the King of England".

ewelton commented 4 years ago

@jandrieu Nah, I don't quite agree with that. I would agree that saying "A thing is the subject of DID XYZ", while technically rigorous, leads to exactly the sort of miscommunication the community has been having.

I'm not sure I follow the VC comment. Who is ABC and how is ABC related to the construct?

What I'm trying to get to is making it clear, in everyday language, so that it is always apparent that A thing might have dozens of DIDs, because DIDs are "scoped" by controllers - and DIDs can not always serve as points of coordination in a discussion.

What we want for the VC case, and what is being discussed here, given the limitations of DIDs and the incorrect statements about their scope for the last few years, is a new form of identifier that can be shared by communities, and around which we can clearly say "The controller of DID XYZ says DID XYZ refers to N" and "The controller of DID ABC says DID ABC refers to N" and then let DID ABC and DID XYZ rest happy that they are talking about the same N, so that they can have fruitful discussions about attributes of N, such as "cn=King of England" vs. "cn=King of Great Britain".
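A small sketch of the coordination being asked for, assuming each controller's statement is reduced to a pair of (their DID, the shared identifier N they say it refers to). All DIDs and referent labels are hypothetical placeholders.

```python
from collections import defaultdict

# (controller's DID, shared identifier it is said to refer to)
statements = [
    ("did:example:XYZ", "N"),
    ("did:example:ABC", "N"),
    ("did:example:QRS", "M"),
]


def dids_by_referent(pairs):
    """Group controller-scoped DIDs by the shared identifier they point at."""
    grouped = defaultdict(list)
    for did, referent in pairs:
        grouped[referent].append(did)
    return dict(grouped)


print(dids_by_referent(statements)["N"])  # prints ['did:example:XYZ', 'did:example:ABC']
```

The grouping only works because N is the same string for both controllers; the two DIDs on their own give no way to tell that they concern the same thing.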

dhh1128 commented 4 years ago

Okay. I think Joe's given a succinct articulation of a position on the proper scope of DIDs. Thank you, Joe. I love the crispness.

I would like to ask for two things to resolve this issue:

1. A survey of the group to see if they agree with Joe's rule of thumb.
2. Assuming yes, a new PR against the DID spec that summarizes the thinking, so future readers of the spec don't wonder whether DIDs apply to their use case. (I am happy to volunteer to raise such a PR, contingent on #1 and on the rest of my comment. Or someone else can.)

Before we poll the group, however, I would like to offer an alternative formulation to Joe's. I don't know if I can be as crisp as he was, but I'm going to try. Going into this, let me acknowledge that the following is heresy, according to the spec; I'm only articulating it because I wonder if we're missing an opportunity here, if we could let go of tightly held notions a little. Here's the alternative worldview:

Lots of identifier schemes already exist. They have various properties. DIDs are unique in that they accomplish ALL of the following goals simultaneously:

1. Decentralize: allow identifiers to be created by anyone, without permission or coordination.
2. Eliminate ambiguity: make the referent of the identifier completely uncontroversial.
3. Provide extensibility: define a methodology whereby new subcategories can be defined without a central authority, yet guarantee that common processing remains viable.

UUIDs accomplish goal 1, but not goal 2 or 3. A given UUID can mean anything, to anybody. Fred can create it, and Jill can repurpose it. They can argue about who's right, or whether they're both right. There is no strong binding to anything in particular. Most decentralized identifiers (e.g., the names of newly born children) are similar.

IP addresses accomplish goal 2, and sometimes goal 3, but not goal 1. Most centralized systems (twitter handles, phone numbers, domain names) are similar.

DIDs accomplish goal 1 in lots of clever ways that I won't go into here.

DIDs accomplish goal 2 in one of these ways:

a. They use cryptography to bind the identifier to a controller. The controller then defines what the identifier refers to. This was the original use case for DIDs, and the one we've thought about the most.

b. They define some other intrinsic property that is objectively observable, that derives the value of the identifier, such that it is impossible for the binding to be ambiguous. A DID that identifies each element in the periodic table by its atomic number would eliminate ambiguity without having cryptographic control, while still remaining decentralized, and while still being enough of a DID to be processed by DID handlers.

Notice that in this formulation, cryptographic control is a means to an end (eliminating ambiguity), not an end in and of itself. Notice also that cryptographic control is just a special case of the other approach (objectively observable property that makes the binding unambiguous). I think that's the crux of the difference between this worldview and the other one.

DIDs accomplish goal 3 through the use of the DID method extension mechanism.

Now that I've articulated an alternate worldview, here's the argument I'd offer in its favor: Although the world needs control-based binding for DIDs in the worst way, it also needs the other kind of binding (which I might call inherent binding). Both bindings are worthy of the moniker "decentralized identifier." UUIDs are not a good alternative because they lack the solution for ambiguity. URLs are not a good alternative because they lack decentralization of domain names. If we force the conception of DIDs to be narrow, we're setting ourselves up for a situation where another type of decentralized identifier comes along that has just as much claim to the word "decentralized", but that thinks about control differently. Result = muddiness and doubt about adoption. If we bring this ugly stepchild into the DID tent and let it take a bath, I suspect it will turn out to be cute and a good family member, in time. I don't think it would take much more than 2 or 3 paragraphs to talk about "uncontrolled decentralized identifiers" in the spec; they're way simpler than the controlled variant.

dhh1128 commented 4 years ago

Tagging a few people who may have opinions about this interesting conversation: @peacekeeper @dlongley @msporny @burnburn @brentzundel @talltree . Please bring in others as appropriate.

ewelton commented 4 years ago

@jandrieu I think I passed over https://github.com/w3c/did-core/issues/233#issuecomment-600843933 while I was writing my response. And to @dhh1128 's

Now that I've articulated an alternate worldview, here's the argument I'd offer in its favor: Although the world needs control-based binding for DIDs in the worst way, it also needs the other kind of binding. UUIDs are not a good alternative because they lack the solution for ambiguity. URLs are not a good alternative because they lack decentralization of domain names. If we force the conception of DIDs to be narrow, we're setting ourselves up for a situation where another type of decentralized identifier comes along that has just as much claim to the word "decentralized", but that thinks about control differently. Result = muddiness and doubt about adoption. If we bring this ugly stepchild into the DID tent and let it take a bath, I suspect it will turn out to be cute and a good family member, in time. I don't think it would take much more than 2 or 3 paragraphs to talk about "uncontrolled decentralized identifiers" in the spec; they're way simpler than the controlled variant.

I think this is exactly right. It is what I was trying to capture with: what we want

is a new form of identifier that can be shared by communities, and around which we can clearly say "The controller of DID XYZ says DID XYZ refers to N" and "The controller of DID ABC says DID ABC refers to N" and then let DID ABC and DID XYZ rest happy that they are talking about the same N, so that they can have fruitful discussions about attributes of N, such as "cn=King of England" vs. "cn=King of Great Britain"

In other words - there is a missing piece to the puzzle. DIDs are not necessarily up to the task, unless there is some tweaking to the spec - some core, fundamental tweaking and clarity.

So far, attempts to discuss the missing puzzle piece get blocked by discussion of controllers, subjects, and very obtuse technical issues. Those discussions have cut off the forest and the larger view has been lost. I like the idea of "bringing it into the DID tent, and giving it a bath"

jandrieu commented 4 years ago

DIDs don't solve #2

In fact, I don't think #2 is possible in any construction. We can only clarify the DID and when we refer to the DID we can use an unambiguous string of characters.

However, any statements can get attached to that identifier, by any author, and there is no way to know--at the DID level--which statement is "correct". Even if one of the statements is signed by the Controller, you can't be certain that it is "correct". Heck, you can't even prove the controller is the Subject.

What you are bumping up against is essentially Goedel's incompleteness theorem. You can't disambiguate everything. There will always be statements that cannot be proven, no matter how convoluted our schemes may be.

All we can do is anchor assertions by specific issuers to understand (and document) what they are willing to assert about a Subject, as identified by a DID. Statements about the same DID can be taken to be intended as statements about the same Subject, but even then the statements themselves may be wrong.

Content-based hashes of arbitrary content are NOT DIDs because they cannot be resolved directly to some form of cryptographic material. You could, of course, create an IPFS DID Document and have a DID method that uses its content-based address, but that hash is of the DID Document, not of the resource.

IMO, if we are going to get closure on this spec, we need to stop trying to add everything that seems like it might be convenient, and we need to stop trying to construct crazy edge cases--ESPECIALLY if you have no use cases for it (as you put it @dhh).

Maybe others with more experience in standards development can chime in. I know that VCs almost didn't get done because of mid-process shifts to support ZKPs. The consensus was that was a good thing. But it still put finishing within the required deadline at risk. Kitchen-sink engineering a solution that solves everyone's problems is, IMO, an anti-pattern in a standardization process.

We need to be here locking down the simplest feature set for maximum interoperability to do the fundamental thing that DIDs do: enable cryptographically robust management of identifiers without reliance on central registry entities to keep track of who controls what. EVERYTHING else is superfluous and deserves a critical evaluation about whether or not we can remove it and still achieve the fundamental requirement of this work. EVERY add-on is another lengthy drawn out debate, additional implementation complexity, and yet another point of confusion for anyone who wants to adopt the tech. So, let's stop with the add-ons and start focusing on what we can do to minimize the complexity rather than exploring how we can extend DIDs to do extra magic. If DIDs can do that magic, it is perfectly fine to add that at another layer or in the next iteration of the spec.

ewelton commented 4 years ago

@jandrieu would you then be backing this

1 - DIDs can not be used to identify digital content in a shared namespace
2 - but allow https://github.com/w3c/did-core/issues/199#issuecomment-589401616
3 - clarify that DIDs can not refer to things, only a specific actor/agent's name for a thing
4 - summarize the structure as https://github.com/w3c/did-core/issues/199#issuecomment-588686738

does that seem right?

jandrieu commented 4 years ago

Um... no.

DIDs can identify ANYTHING. I've said this before, so I'm surprised you'd suggest I'd back that set of statements.

jandrieu commented 4 years ago

My particular point here is that there are mathematical guarantees we can affirm with DIDs. That's what the cryptography gives us. Anything more than that which we can mathematically guarantee should be achieved at another level.

ewelton commented 4 years ago

@jandrieu ok, so it seems like we're stuck.

It may not be possible to discuss DIDs.

Either a DID subject refers to ANYTHING, or it is a name for a thing scoped by a controller, not both. But I have a feeling that if I say that it represents ANYTHING then you will say that it is scoped by a controller. I am getting dizzy.

If I was trying to describe DIDs to clients and customers (which I have stopped doing, by the way) I need to be able to say something. If I say that "the subject is the King of England" to them without clarifying that there is a controller involved, they get the wrong idea. So I try to say "the subject is scoped by the controller" and then you say "no, I am surprised you said that" - I really am totally at a loss.

A DID subject is both scoped by a controller and not scoped by a controller, and it is sometimes anything and sometimes restricted. I just don't get which set of constraints is in play - other than just not what anyone else is saying.

dhh1128 commented 4 years ago

DIDs don't solve #2...any statements can get attached to that identifier, by any author, and there is no way to know--at the DID level--which statement is "correct"... Even if one of the statements is signed by the Controller, you can't be certain that it is "correct". What you are bumping up against is essentially Goedel's incompleteness theorem. You can't disambiguate everything. There will always be statements that cannot be proven, no matter how convoluted our schemes may be.

Perhaps you read #2 a bit too fast?

I'm not interested in proving the correctness of arbitrary statements about an identifier. I agree that anybody can claim any attributes they want about anything, and that it's not useful/desirable for DIDs to facilitate that. In fact, the example scheme I proposed explicitly precludes the association of any statements with the identifier other than existence/scope of reference (the subject). I'm saying that it's a defining characteristic of DIDs that they prove the correctness of exactly one type of statement, which is an assertion about scope of reference -- and I'm claiming that is a generalization of the variant you like, which is scope as proved by cryptographic evidence. Control is only interesting as a mechanism of achieving the real goal, which is knowing with confidence what you're talking about. Your own verbiage "Even if one of the statements is signed by the Controller" presupposes that it's possible to ascertain truth about this subtopic; signing is just the mechanism for proving that the scope of reference is what the Controller, not some other entity, asserts. I think this is exactly what you meant when you said the DID subject can't be the moon, but can be what the controller thinks of as the moon.

While it is true that eliminating all ambiguity is impossible, and on a philosophical level, we can't even prove that we exist rather than being figments of one another's imaginations, I am very surprised to hear anybody claim that DIDs don't provide practical clarity about what the referent is. Elsewhere you have claimed that the referent is whatever the controller wants it to be. That's an unambiguous binding. Yes, it can change. Yes, the controller can do a lousy or inconsistent job of definition. But the fact remains that whatever scope of reference is embodied in the controller's choices constitutes exactly and uncontroversially the referent for a DID at a point in time, if the binding is based on cryptographic control.

Maybe others with more experience in standards development can chime in. I know that VCs almost didn't get done because of mid-process shifts to support ZKPs

I agree that bringing this up and tackling it is a tradeoff. Eric is not alone in believing that if we don't broaden our conception, important use cases are lost. But that could be the right answer, and I would accept it if it's the will of the community (even though I continue to disagree with your other argument). So I, too, am curious to hear how other people would weigh it.

jandrieu commented 4 years ago

@ewelton I don't think we are stuck. We are just dealing with the fundamentals of what is knowable and what is provable. As such, we bump into issues of epistemology and Goedel's incompleteness theorem. There are bounds on what we can know and bounds on what we can prove. Any technology that purports to exceed those bounds should be considered with the same skepticism as claims of a perpetual motion machine.

That said, it is a different issue how we talk to regular folks. In the same way that it is hard to explain why perpetual motion machines will never work, it will be hard to explain the boundaries of what is knowable and provable.

ewelton commented 4 years ago

@jandrieu I understand how you frame it and why you say what you are saying. But there are practical solutions to the problem @dhh1128 raised. More importantly, we just need to pick one and move forward.

What you are saying is true, but I feel you are simply missing the point of what we are saying, and are convinced that this is because we fail to appreciate your point.

The subject of a DID has no semantics - and, importantly, if the hash is cryptographically bound to the genesis key pair, then it CAN NOT serve the role of identifying digital content in a self-certifying manner. Instead, it can only be the "name" of a record that contains the target identifier.

What we are exploring is a way to augment that environment - to make self-certifying content identifiers first-class citizens. This exploration is not about mathematical provability or Cantor's Paradise.

In terms of did methods - we are starting to see 'strange methods' like did:key - which, one might argue, have a different relationship with 'controllerhood' than do blockchain-resident did methods with long running did-documents that can evolve over time and can engage in complex expressions of verification methods and service_endpoints.

The option on the table is to recognize some of those differences - and instead of raging against them, decide whether that variation can be co-opted and exploited.

In a sense it does not matter which is chosen - as long as it is chosen soon, and precisely. There is a strong argument for disallowing this sort of "content-hash" immutable element - like did:immutable:<hash> - and there are arguments for it. It is not the case that it is fundamentally impossible due to the Principle of Least Action driving the inherent increase in Entropy we commonly experience as the Arrow of Time - it is a pragmatic decision for the spec.

pknowl commented 4 years ago

An object in a decentralized network needs an identifier. The DID name itself "Decentralized Identifier" suggests that there should be room to include a solution in the DID spec.

pknowl commented 4 years ago

... and, for the record, semantic objects should never be governed in a decentralized network. That is why schema.org, etc. are open-access and free of governance. If semantics are governed they simply won't be adopted.

jandrieu commented 4 years ago

I may be missing the point. I certainly don't understand what @dhh is trying to get at with disambiguating. But I also don't understand your previous comments. We can talk about DIDs and your suggestion that I would support those three items you listed made it seem like you didn't understand my point. If you do, great.

I really don't understand how #2 is accomplished, in any identification architecture.

Eliminate ambiguity: make the referent of the identifier completely uncontroversial.

@dhh later expands that to

I'm saying that it's a defining characteristic of DIDs that they prove the correctness of exactly one type of statement, which is an assertion about scope of reference -- and I'm claiming that is a generalization of the variant you like, which is scope as proved by cryptographic evidence.

I'm still not following. The referent is not scoped by the DID. Rather, a link to a certain set of cryptographic material is provided by a DID Method after resolution.

That's it. That's what DIDs do. Resolve a DID and you'll get some cryptographic material that can be used to interact securely with "The Subject" whatever/whoever that is. Maybe it is the controller. Maybe it is not. It isn't well scoped at all. It can even change over time. It is completely ambiguous what it refers to.

The only DID that resolves ambiguity is this hypothetical did:immutable. Which doesn't seem like a DID at all to me. So, yes, you can change the definition of DIDs to add something like did:immutable. But you can't say DIDs have a primary function of removing ambiguity--and then use that to justify an argument FOR did:immutable--because no other DIDs do that.

Don't get me wrong: immutable ids are cool. iid:[hashtype][hash] seems like a reasonable thing to standardize. github.com/w3c-ccg/multihash seems like it's half-way there.

I just don't think that's a DID in any sense that this community has been working on.

Maybe I am missing something. In any case, I'm definitely not following the logic on how did:immutable and its kin is anything like other DIDs.

Also... I'm not raging. I'm just disagreeing. DIDs are a thing. They aren't everything. They don't solve all the identifier problems. They are not the right identifier for every kind of thing that might need an identifier. They are a particular type of identifier that might be useful for certain things. Their key distinction is the ability to find the current authoritative cryptographic material for interacting with the Subject of the DID.

Before DIDs, there was not a particularly good way to find such material, not in any definitive way, without reliance on a third party. PGP's web of trust was the best prior art in this area. DIDs are a huge advancement in the usability of cryptography for a large number of use cases. It would be great if we could just focus on getting this fundamental innovation in the books, so we can turn our attention to building the amazing services on top of DIDs that so many of us are excited about.

pknowl commented 4 years ago

The name DID should really be DEI (Decentralized Entity Identifier). DID suggests that you can identify anything in a decentralized network. If an object identifier cannot be accommodated, the name DID is misleading which is a shame. We would also have to build out an entirely new standard for a DOI (Decentralized Object Identifier) which of course can be done.

In an ideal world of DIDs for everything in a decentralized network, you would have did:e:<hash> for Entity and did:o:<hash> for Object.

Which way are we going to go?

dhh1128 commented 4 years ago

@jandrieu : I think we are talking past each other because we are talking about different manifestations of ambiguity -- and it might be because of my own clumsy language. If so, I apologize. Let me try again. And let me step away from DIDs for a minute; maybe a different context will help.

Suppose, one day, that Alice invents a brand new word: "habapookajar." She's at a party, and she applies it as an adjective to a person wearing expensive Italian clothes. Those who overhear her are pretty sure it means something sort of like "sophisticated" -- but they're not quite sure. Her meaning is ambiguous. Even if they ask Alice what she means, there's no guarantee she'll tell them the truth, or be able to give them a definition that perfectly embodies her intentions.

This is ambiguity, and I believe we're in alignment in suggesting that it's fundamentally unresolvable. Let's call that "type 1" ambiguity for a moment.

But at least we know who's the definitive authority on the meaning: Alice. Whatever she says it means, we have to accept. There's no ambiguity about that, right?

Or is there? Suppose there's another party a week later, and Bob is overheard using this word. Someone asks him if he got it from Alice, and he says "No, I invented it. Who's Alice?"

Although all ambiguity has things in common, this new ambiguity feels like it's worth putting into a second bucket. Let's call it "type 2" ambiguity. This is not ambiguity about what the word means; it's ambiguity about how to approach learning the word's meaning; we don't even know where to start.

No identity systems can resolve type 1 ambiguity.

A centralized system resolves type 2 ambiguity because the system is the acknowledged authority on the question of what the identifier refers to. That doesn't make the identifier's meaning perfectly clear (nothing can) -- but it removes any ambiguity about how to learn more. But type 2 ambiguity has always been a big problem in decentralized systems, because there is no such authority.

Part of the genius of DIDs is that they solve this problem. That's a hugely valuable innovation. We've explained that innovation in terms of cryptographic control, and if we choose to, we can continue to explain it that way. We can say that the problem is proving control, and the solution is cryptography.

But what I'm suggesting is that we can define the problem in a slightly more general way, and that this might have nice consequences. It would be a tradeoff, as you say.

Old problem statement: How do I prove control of the identifier? Old answer: With cryptography.

New problem statement: How do I eliminate type 2 ambiguity? New answer: So far we've imagined two ways. One is to prove control with cryptography. Another way is to derive an identifier from objectively observable properties that remove all ambiguity. Maybe we'll realize there are other ways, too.
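To make the second approach concrete, here is a minimal sketch of an identifier derived from objectively observable properties, using the did:mymethod:hash-of-sample shape from the malware example at the top of this issue. The function name `content_did` and the choice of hex-encoded SHA-256 are assumptions for illustration only, not part of any registered DID method.

```python
import hashlib


def content_did(sample: bytes, method: str = "mymethod") -> str:
    """Derive an identifier purely from the bytes of the sample.

    Hypothetical helper: the method name and hex encoding are assumptions.
    """
    digest = hashlib.sha256(sample).hexdigest()
    return f"did:{method}:{digest}"


# Two researchers who never talk to each other hash the same sample.
sample = b"bytes of some sample seen in the wild"
alice_id = content_did(sample)
bob_id = content_did(sample)

# Both "discover" the same identifier; neither one controls it.
assert alice_id == bob_id
```

Nothing in the derivation involves a key pair, which is why there is no controller to ask about the referent: the content itself answers the question.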

I admit that this new formulation is a departure from the official party line. The arguments in favor of it that I'd offer are:

It explains cryptographic control's desirability from first principles, not as an end unto itself. That feels deeply true/correct to me.
It is open-ended, but not infinitely broad. It claims for DIDs all conceptual identifier territory that intends to be decentralized but not type 2 ambiguous. UUIDs and numerous naming schemes fall outside the scope for clear reasons, but other mechanisms could be discovered that have the defining properties. Maybe we'll learn something. It would be nice not to have to start a new standard when the one we've already built anticipates such possibilities.
It allows me to leverage the hard work that's been done on DIDs to solve a whole new set of problems that are currently ruled out by the insistence that DIDs must be based on control of the identifier. This set of problems has been simmering in the background of DIDs for several years now, with people never quite able to explain why they felt misaligned. It's now late in the process, but I finally feel some clarity about why the disconnect and what a solution might be. The identifier variant that derives from objective properties is a type of identifier that's "discovered" rather than "created", and I intuit (but cannot prove) that we may come to love that type of identifier and want it under the DID umbrella.

I don't think these three arguments are a slam dunk argument in favor of what I'm proposing. But I'm hoping that at least my worldview and my comments about ambiguity make better sense?

pknowl commented 4 years ago

Spelt out, the two options are doi:<hash> or did:o:<hash> for an object identifier. That should probably be put to a community vote.

talltree commented 4 years ago

Wow, what a thread. That we can be having this deep a conversation about identifiers so fast that if I take my eye off the list for 2 days I miss the whole thing...amazing.

I just now have taken the time to read the whole thread, from top to bottom. Here's my thinking. As someone who has worked on DIDs from the very first version of this spec four years ago, let me put it this way:

If this had come up in the first three years of the spec development, I would have agreed with Joe, i.e., DIDs are about cryptographic verification of identifiers—and thus they always have DID docs—end of story, go away.
About a year ago, or whenever did:key: came along, it was a shock to the system. A DID that did not have a DID document, but generated a DID document. Whoa! That was a head-banger. At first I said, "Hell no". But then I listened to folks and thought about the use case...and finally "widened my thinking" about what a "decentralized identifier" should be because clearly did:key: was valuable.
Ditto all of the above for did:web:. I still personally find it distasteful, but I can see the use cases, and even in that method, due to the presence of a DID document, there are still ways of using cryptographic verifiability to work around the fact that the method has a highly centralized component.

So that brings us to did:r:. When I first heard about the concept, I said, "Oh, it's just a content hash expressed as a DID. There's no DID document. There cannot be a DID document. So clearly that's not a DID." But then, like all three examples above, I stopped and listened to the use cases and started thinking about it. And I found myself agreeing with @dhh1128 that the overall concept is simply a different application of cryptographically-verifiable identification.

In other words, if you look at a did:r:, it is indeed cryptographically verifiable, but not through its cryptographic association with a public key, but through its cryptographic association with the DID subject.
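A rough sketch of what "cryptographic association with the DID subject" could look like in practice: the verifier needs only the identifier and the content, with no key and no DID document. The encoding here (hex SHA-256 directly after `did:r:`) is an assumption; an actual method would more likely use a multihash, and `make_did_r` / `verify_did_r` are hypothetical names.

```python
import hashlib

PREFIX = "did:r:"  # assumed method prefix, taken from this thread


def make_did_r(content: bytes) -> str:
    """Build a content-derived identifier (hex SHA-256 is an assumption)."""
    return PREFIX + hashlib.sha256(content).hexdigest()


def verify_did_r(did: str, content: bytes) -> bool:
    """Check that the content presented really is the subject of the DID."""
    if not did.startswith(PREFIX):
        return False
    claimed = did[len(PREFIX):]
    return claimed == hashlib.sha256(content).hexdigest()


did = make_did_r(b"some immutable document")
assert verify_did_r(did, b"some immutable document")
assert not verify_did_r(did, b"a tampered document")
```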

That is still a cryptographically-verifiable identifier. And it's still decentralized.

And it's valuable because I have spent time with the proposers of this new method and they have a MOUNTAIN of use cases for it. Entire industries might end up being built around this particular "branch" of DIDs.

And so I've "widened my aperture" once more and now agree that including content-based verification methods as valid types of DID methods makes sense. Even if they explicitly do not involve any DID document.

So I urge not just @jandrieu but all members of the WG to take a close read of this thread and see if you agree. And if it would help, perhaps the proponents might host a special call or a webinar to explain their use cases in more depth as that would probably help too.

mitfik commented 4 years ago

@jandrieu

In some cases, that material can be deterministically derived from the DID itself, like with did:key, in which case resolving the DID is how you transform the raw DID into the DID Document.

Following that statement, I would say that did:r:<multihash> is the same thing: I can generate a DID Document out of it following defined rules, which gives me the same sort of DID Document (if I am not mistaken, a valid DID doc can include only id). But in many cases I just don't need a DID Document, which does not make the DID useless.

Many people think that a DID points to a DID document from which you can learn more about what you can do next. But as we know, that is not true. A DID points to the key, and each method defines how to "construct" a DID Document out of a given DID. E.g., some ask you to look on the ledger to get the document, which is no different from having it in some sort of immutable DB. Others, like did:key, allow you to derive the DID Document from the method-specific-id. If this is a commonly accepted pattern, did:r would follow the same rules. It is just that in many cases there would not be much metadata in the DID Document; in many cases it would be just empty. Or I could use
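As a sketch of that idea, a resolver for such a method could derive the document from the identifier alone, with no ledger lookup. The function name `derive_did_document` is hypothetical, and the assumption that a document containing only `id` is acceptable is exactly the one hedged above.

```python
def derive_did_document(did: str) -> dict:
    """Deterministically derive a minimal document from a did:r identifier.

    Assumption: a document with only an "id" member is acceptable.
    """
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did" or parts[1] != "r" or not parts[2]:
        raise ValueError(f"not a did:r identifier: {did!r}")
    # Nothing is fetched: the same input always yields the same document.
    return {"id": did}


doc = derive_did_document("did:r:abc123")
assert doc == {"id": "did:r:abc123"}
```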

Current spec defines DID as:

Decentralized identifiers (DIDs) are a new type of identifier to provide verifiable, decentralized digital identity. These new identifiers are designed to enable the controller of a DID to prove control over it and to be implemented independently of any centralized registry, identity provider, or certificate authority.

What I would like to see is something like this:

Decentralized identifiers (DIDs) are a new type of identifier to provide verifiable, decentralized digital identity. These new identifiers are designed to enable cryptographically-verifiable identification and to be implemented independently of any centralized registry, identity provider, or certificate authority.

Why? The reason is simple: so-called DRIs and DIDs have a lot in common; at a high level they are the same thing. Getting the DID spec generic enough is beneficial for everyone. This is what makes standards really powerful: they can unify parts without limiting their use to a specific use case. In the current DID spec the specific use case is to use a DID only for the purpose of controlling a specific DID Document. And not even that, as we already saw the movement toward making the "controlling" part optional, so you can have a DID Document which cannot be altered after creating it. It seems that we are just one small step away from DRI in that situation.

@talltree gave a very good example of how the story changed over time while thinking about what a DID should/could be. In my opinion this shows how DIDs are slowly maturing and people are realizing that the problem they started with in the first place can be applied to a broader space.

As @dhh1128 already mentioned, the reason we are even having this discussion is that DIDs seem to surpass other existing standards due to their specific properties:

Decentralize
Eliminate ambiguity
Provide extensibility

And the common denominator for all of the above is cryptographically-verifiable identification, as @talltree pointed out. For me personally, this makes for a very solid standard which, if adopted, would give the community a lot of benefits: your wallet/agent/system/website/you name it implements one standard for cryptographically verifiable identifiers, and you can support people's identity, things' identity, and content identity, without saying how this identity is defined.

There was already a statement that DIDs allow you to identify anything. Yes, but through a DID Document, which in many cases is just an unnecessary step, e.g. for identification of content, and which could end up revealing too much.

peacekeeper commented 4 years ago

I also just read the thread, and even though I find the idea intriguing to broaden the concept of DIDs to all "cryptographically-verifiable identification", without DID documents and without DID controllers, I still have a preference for @jandrieu's perspective, and for sticking with the current "party line".

Unfortunately, we have monopolized the term "Decentralized Identifier (DID)", when in fact there are other "decentralized identifiers" out there. But still I believe we should stick to the mental model that all DIDs are "created" and "controlled".

(Side note: in very practical terms, deviating from this may mean deviating from the DID WG charter - see the first few bullet points).

I would argue that there are existing and better "decentralized identifiers" than DIDs, which can already be used for identifying malware or elements in the periodic table. Those identifiers are URNs.

Magnet-Links have used urn:sha1:xxxx for a long time. You could create a more modern version urn:multihash:xxx, and I think you got what this thread is all about? "Cryptographically-verifiable identifiers" that are "discovered" rather than "controlled"?
You can always create new URN namespaces for other things like elements in the periodic table, e.g. urn:atom:xenon or urn:atom:54. Sorry, "URN namespace" doesn't sound as cool as "DID method" :(
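For comparison, here is a small sketch of how a content-derived URN of the urn:sha1 kind can be produced. As far as I know, Magnet-Links carry the SHA-1 digest Base32-encoded, so that is what this assumes; the function name `sha1_urn` is hypothetical.

```python
import base64
import hashlib


def sha1_urn(content: bytes) -> str:
    """Build a urn:sha1 identifier from content (Base32 digest is assumed)."""
    digest = hashlib.sha1(content).digest()  # 20 bytes
    encoded = base64.b32encode(digest).decode("ascii")  # 32 chars, no padding
    return f"urn:sha1:{encoded}"


urn = sha1_urn(b"some file contents")
# Anyone holding the same bytes derives the same URN; nobody "controls" it.
assert urn == sha1_urn(b"some file contents")
```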

pknowl commented 4 years ago

My concern is that if we don't introduce did:o:, we may alienate certain industry sectors which would damage the global appeal of DIDs. For example, the biggest Pharma DLT consortium project that we are in talks with at the moment is discussing using a platform architecture that doesn't use DIDs at all. If we had did:o: in the spec, they would definitely adopt DIDs and, with it, we could introduce the whole DID/VC flow into their architecture. Without that DID method, they have absolutely no need to delve into the world of DIDs, which compromises our overall desire to build a truly interoperable decentralized data economy. No pressure, huh @peacekeeper

peacekeeper commented 4 years ago

@pknowl I understand that we may want to broaden the scope of DIDs as much as possible for marketing reasons, and I'm not strongly opposed to doing that. But from a purely technical perspective, why couldn't that consortium participate in a "truly interoperable decentralized data economy" with URNs, if they want identifiers that don't need controllers or DID documents?

pknowl commented 4 years ago

@peacekeeper Should URNs be used as a universal standard for object identifiers? Surely, a DID should be able to cover all 3 valid identifier states.

In terms of key interdependency, we're already taking a hash of content in the "dependent" state. It sounds weird to disregard content when no entity identifier is referenced. It almost feels like we're turning our close cousins away for Thanksgiving dinner!

talltree commented 4 years ago

@pknowl — can you unpack your chart a little bit more for us? I.e., can you explain more fully:

What you mean by "key dependent" and "key interdependent"?
What is the difference between "trusted" and "immutable"?
What is the difference between an "entity identifier" and an "object identifier"?

I suspect this is all very clear in your head, but the chart is so terse that I suspect others will have a hard time grokking it without those explanations.

pknowl commented 4 years ago

@talltree - No problem at all.

1.) "Key dependent" means that an identifier is governed by an entity and therefore a signing key is required. "Key interdependent" means that an object identifier can either be governed by an entity (whereby a signing key is required) or not governed (no keys required).

2.) "Trusted" means that an identifier is governed by an entity and therefore a signing key is required to establish trust. "Immutable" means that an identifier contains a hash of the content of an object which cannot be changed. If an object identifier is governed, the controller of the signing key has control over the content contained within the associated DID-document and, as such, it can no longer be deemed immutable.

3.) An "entity identifier" is an identifier that is governed by an entity who controls the signing key. An "object identifier" is an identifier that contains a hash of the content of an object.

pknowl commented 4 years ago

@talltree Perhaps this network model will help visually. network.pdf

You'll also notice that I've changed any reference of DRI to DOI and any reference of did:r: to did:o: in my previous entries. My apologies for that. I only cracked the code after already having joined the thread. Anyway, all corrections made. The kernels are now solid.

talltree commented 4 years ago

@pknowl Thank you very much—those terms are exactly the key I needed to unpack your table.

So one way to sum up what you are proposing is to unify the world of controlled DIDs with a new world of uncontrolled DIDs by bringing into the DID world the concept of a multihash identifier.

Now, I have another ask of you (if you are willing). One reason that I suspect for @peacekeeper 's hesitation about bringing the world of "uncontrolled DIDs" into scope is the question of resolution. Given that there is no DID document, have you thought through what a DID resolver would or should return when given a did:o: to resolve?

Could it, for example, return possible network locations of an instance of the identified object? Or any other useful information about the object?

pknowl commented 4 years ago

@talltree There are mindset cabinets that will be unlocked here so I'll try to approach this topic from a place of practicality.

By the very nature of "key interdependency", in the case of an object identifier, some of the did:o: space is already being encroached upon. That is the first issue.

The second and much broader issue is the resolution process as a whole. The method name currently supports 52 different method types. That number will grow exponentially when the world of DIDs hits the masses. This issue can be resolved by moving any location information away from the method space. If that approach were adopted and did:e: and did:o: became the two permanent method names, entity identifiers and object identifiers could be treated autonomously which would resolve any fragmentation issues. (... not to mention enable an interplanetary solution as Elon Musk endeavours to colonise Mars!)

@mitfik and/or @ewelton can better explain what the resolution process might look like but I thought that I should bring up the elephant in the room as "if not now, when?"

ewelton commented 4 years ago

From my perspective DIDs were at a crossroads around Sept/Oct 2019. At that time we had an opportunity to view DIDs as a sort of universal namespace with certain properties linking the identifier to an underlying asset using clever cryptography. This is what made them different from identifiers like UUIDs.

There were two other properties that were attractive:

open resolution w/o a central authority - so DIDs could span all sorts of domains and contexts, and the only commitment made when "using a did" was the guarantee that you could retrieve some sort of information about the DID - e.g. resolve(did)
open semantics which allowed DIDs to fill a huge number of roles, one of which included the pool of verification methods and the service endpoints that define DIDs today

I believe this conversation fits that time in the history of DIDs better than the DIDs of today - which are much more focused. In recognition of that focus, I actually favor @peacekeeper and @jandrieu 's sensibilities around focusing "what a DID is" - but I still think it is worthwhile to present the motivation behind the did:e and did:o concept.

The DID method space still does contain a number of "odd ducks" - like peer DIDs (which I think are critical, but which are definitely a different breed), and did:key or did:web, which strain the idea of "what a DID is" in a different direction. So even if the ideas are late to the party, nothing is cast in stone and it can't hurt to present the ideas.

The core idea is that there are three categories of things in the digital universe:

immutable digital assets, which can be hashed - these are "found objects" - they can be tracked, but not controlled because there is nothing to control as they can not change.
active mutable digital assets representing the extension of an "actor" into digital space - this is what happens when a person or something that is "capable of agency" registers a public key somewhere, sequesters the private key, and uses that fact to bootstrap a digital presence based on that key pair
passive mutable digital assets which are the digital projection (or are inherently digital) of the abstractions and physical objects that we use in our daily lives - a parcel tracking engine, or a flight reservation record, or a collaborative document, and so on. These, by the nature of being passive, require (2) to act on their behalf - yet they also serve as points of coordination and correlation, which is essential to the role they play.

The distinction between 2 & 3 is captured well by the conceptual work around DIDs, with all sorts of wonderful nuances about key rotation and control and verification methods and the like being worked out. This model is still not well understood outside of the DID community, but I think that the DID community has the kinks pretty well ironed out.

DIDs bring a lot of other baggage to the table and it is not clear whether that baggage is worth the trouble, since DIDs are not required for credential & trust technologies. You do not need DIDs for credentials or capabilities, and ubiquitous, low-effort, low-barrier to entry, existing network technology can achieve the same level of cryptographic integrity. So what is the attraction of DIDs?

For a while the attraction to me was that it could create a unified namespace for the above three classes of "network things", with a "common API for resolution" at least as perceived by an application programmer. The programmer would just install the right package and call resolve(did) - and this would bypass any deep or intrinsic reliance on DNS. We finally had a simple, level playing field for asking a bootstrapping question about resources on "the network" - which included centralized, classic, decentralized, and P2P spaces.
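A minimal sketch of what that "common API" could look like from the application programmer's side: one `resolve(did)` entry point that dispatches on the method name. The registry, the `register` helper, and the stand-in `example` resolver are all hypothetical; real method resolvers would talk to ledgers, peers, or derive documents locally.

```python
from typing import Callable, Dict

Resolver = Callable[[str], dict]

# Hypothetical in-process registry mapping method name -> resolver.
_RESOLVERS: Dict[str, Resolver] = {}


def register(method: str, resolver: Resolver) -> None:
    """Install a resolver for one DID method."""
    _RESOLVERS[method] = resolver


def resolve(did: str) -> dict:
    """Single entry point: pick the resolver from the method name."""
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did":
        raise ValueError(f"not a DID: {did!r}")
    method = parts[1]
    if method not in _RESOLVERS:
        raise LookupError(f"no resolver registered for method {method!r}")
    return _RESOLVERS[method](did)


# Stand-in resolver for illustration only; it just echoes the identifier.
register("example", lambda did: {"id": did})
assert resolve("did:example:123") == {"id": "did:example:123"}
```

The caller never needs to know whether the answer came from a blockchain, a web server, or a pure computation, which is the "level playing field" described above. The cost is that every supported method needs an installed resolver.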

DIDs had the possibility to be a generic identifier that could point to anything, support a simple API for resolution (not that the implementation of that API is simple), and open the door for the global community to expand both resolution and semantics and to develop tools, facilities, and services around a "new kind of content and integrity focused address".

The current DID spec is one species, one subset of that broader vision - and I think the steps taken in the process of moving to the working group effectively crystallized DIDs as that subset. This is not any kind of indictment or complaint - it is just that the broader community is catching up to what is going on with the DID authority and so some of these broader concepts keep coming to the party.

For a while we had a shot at an internet landscape where "the essential question" asked by applications was resolve(did) - perhaps rivaling fetch(url) or lookup(host) - and this would have been a glorious development. As it stands today, DIDs are suitable for a more specific range of use cases. They will operate in parallel with non-DID components of the identity, credential, and trust technology landscape.

The issue of resolution is critical - and the fact that it is not a centerpiece of the charter, is, I think, a mistake. There are tremendous challenges towards implementing resolution for the current, restricted DID model - and those challenges are driven by the large number of methods. The large number of methods fosters "wallet siloing", where application builders choose just a subset of methods and decide if they also support other URIs - or perhaps the culture will become dominated by remote universal resolvers, which, in the years to come, might be deployed as thickly as Akamai and Cloudfront cache servers.

It is too early to tell - and I am confident the community can solve the problems. The question that plagues me is not "can we solve it" but "what is the cost/benefit ratio" - and this rests on the scope of DIDs. For a broad enough scope, a high price is attractive - but when the scope is limited, the decision is less clear.

In many ways, the focusing of DIDs is a benefit to everyone - it will help ensure the spec gets out, is cleanly done, and is well managed and under the central authorities of the W3C, W3ID, and other registry, context, and guidance providers. It will ensure that people vested in global blockchains will be well supported. It will be the anchor in the sea of chaos surrounding DIDs.

The focusing of DIDs also opens the door to a new frontier seeking identifiers that can span DIDs + centralized resources + holographic resources + immutable resources (on various storage media, like IPFS, blockchains, etc.) - and that broader space of identifier technology will draw heavily from much of the excellent work that has been done on DIDs.

It is a win-win situation, no matter how this issue plays out - the only way to lose is to keep the issue open for too long, and the essential remaining question, for me at least, is resolution.

peacekeeper commented 4 years ago

@pknowl

You'll also notice that I've changed any reference of DRI to DOI

I think you may have to change it again :) https://www.doi.org/ = "Digital Object Identifier"

pknowl commented 4 years ago

@peacekeeper More reason to pull did:o: into the DID space!

kdenhartog commented 4 years ago

Just wanted to hop in here and offer a point of view. While I believe that these CIDs are useful, I find that it's possible to represent the same data while keeping the mental model of a controller existing and abandoning the did document.

For example, let's say I had did:immutable:b85dca566725ca2d1baee467a13561af1346953a7bf281b1e259b172f5c740ab

(that's the sha256 hash of "I know this content")

and it's published to a registry with the following did document:

{
 "@id": "did:immutable:b85dca566725ca2d1baee467a13561af1346953a7bf281b1e259b172f5c740ab"
}

then what I'm asserting as the publisher (and thereby controller) is that I know of some content capable of producing that hash, and I'm registering it for the world to know about. Furthermore, I'm saving myself the step of doing this and then revoking a key.
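
A minimal sketch of that derivation in Python, assuming a hypothetical did:immutable method whose method-specific identifier is the hex-encoded SHA-256 digest of the content bytes. The function names, the hex encoding, and the UTF-8 choice are assumptions for illustration; the "@id" key simply follows the examples in this comment.

```python
import hashlib

METHOD_PREFIX = "did:immutable:"  # hypothetical method name taken from this thread


def derive_did(content: bytes) -> str:
    """Derive the identifier from the content alone; no key material is involved."""
    return METHOD_PREFIX + hashlib.sha256(content).hexdigest()


def minimal_did_document(content: bytes) -> dict:
    """Build the smallest document discussed above: just the identifier."""
    return {"@id": derive_did(content)}


if __name__ == "__main__":
    # Anyone holding the same bytes derives the same identifier.
    print(minimal_did_document("I know this content".encode("utf-8")))
```

Because the identifier depends only on the bytes, two publishers who register the same content independently end up with the same DID.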

Put another way, I could achieve the exact same outcome by doing this with did:sov (with some crafty extensions)

did:sov:123456789abcdefghi =>

{
 "@id": "did:sov:123456789abcdefghi",
 "publicKey": [
  {
     "id": "did:example:123456789abcdefghi#keys-1",
     "type": "Ed25519VerificationKey2018",
     "controller": "did:example:pqrstuvwxyz0987654321",
     "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
   }
  ],
 "knownContent": ["b85dca566725ca2d1baee467a13561af1346953a7bf281b1e259b172f5c740ab"]
}

and then revoking the key such that the did document looks like this:

{
 "@id": "did:sov:123456789abcdefghi",
 "knownContent": ["b85dca566725ca2d1baee467a13561af1346953a7bf281b1e259b172f5c740ab"]
}
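
The revocation step above can be sketched roughly as follows. This is only an illustration on plain dictionaries: `abandon_control` is a hypothetical helper, `knownContent` is the made-up extension property from the example, and a real did:sov update would be a ledger transaction rather than a local edit.

```python
import copy


def abandon_control(did_document: dict) -> dict:
    """Return a copy of the document with all key material removed."""
    abandoned = copy.deepcopy(did_document)
    # Dropping "publicKey" models revoking the only key listed in the document.
    abandoned.pop("publicKey", None)
    return abandoned


before = {
    "@id": "did:sov:123456789abcdefghi",
    "publicKey": [{"id": "did:example:123456789abcdefghi#keys-1"}],
    "knownContent": ["b85dca566725ca2d1baee467a13561af1346953a7bf281b1e259b172f5c740ab"],
}
after = abandon_control(before)
```

The `knownContent` entry survives the revocation, which is the point of the comparison: the end state matches what a content-derived identifier would give directly.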

What I understand this proposal is suggesting though is that I should be able to develop a did method to make the identifier the content and call it a day. If I really wanted to, I could resolve it and get a did document as well, but it would be pretty useless, albeit still a compliant did document.

In essence, what I'm suggesting is that this identifier is cryptographic evidence from the controller that they know the content that produces the hash. It still has a controller, who abandons control immediately, and there are still cryptographic, mathematical guarantees asserted by the controller on initial generation of the did/did document.

@jandrieu what I understand you're suggesting is that the mental model should work in a way that's non-compliant with the normative statements currently in the text. Specifically, every property of a DID Document other than @id MAY be used. Not MUST/SHOULD. If this is really the mental model we're going to roll with, we should be changing a lot of the properties from MAYs to SHOULDs/MUSTs. Of particular note too, if we move to a SHOULD, I don't think that eliminates this use case, because a SHOULD still allows for "valid reasons in particular circumstances to ignore a particular item".

Even for fun, in the case of identifying when malware was first spotted in the wild, I could use the created property to have some metadata about the subject (specifically the first time it was spotted and registered).

e.g.

{
 "@id": "did:immutable:b85dca566725ca2d1baee467a13561af1346953a7bf281b1e259b172f5c740ab",
 "created": "2002-10-10T17:00:00Z"
}

Again, still a valid did document that complies with the mental model of a controller abandoning the did document.
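
As a rough sketch of how a registry might record that first sighting, here is a toy in-memory version in Python. The class name, the method name, and the rule that only the first registration sets "created" are assumptions made for illustration, not anything the spec or this thread defines.

```python
import hashlib
from datetime import datetime, timezone


class FoundObjectRegistry:
    """Toy in-memory registry keyed by a content-derived identifier (hypothetical)."""

    def __init__(self):
        self._documents = {}

    def register(self, content: bytes, seen_at: datetime) -> dict:
        """Register content; seen_at is expected to be a timezone-aware datetime."""
        did = "did:immutable:" + hashlib.sha256(content).hexdigest()
        # The first registration fixes "created"; rediscovering the same
        # content later returns the existing document unchanged.
        if did not in self._documents:
            created = seen_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            self._documents[did] = {"@id": did, "created": created}
        return self._documents[did]
```

A second researcher who registers the same sample gets the same document back, so independent discoveries do not conflict within one registry.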

pknowl commented 4 years ago

Thanks, @kdenhartog - I appreciate you walking through that proposed solution for us.

@peacekeeper @jandrieu - So as not to disrupt current resolution processes, perhaps the DIDWG will concede to allow did:immutable: to become a new method type for immutable content with the DID document abandoned. Your thoughts?

Tagging a few people who may have opinions about this suggested way forward: @dlongley @msporny @burnburn @brentzundel @talltree @dhh1128 @ewelton @mitfik

ewelton commented 4 years ago

@kdenhartog I really like your idea above - to me, the genius of having the @context flexibility was that it moved the MUST/SHOULD model around verification methods (and service_endpoints) outside of the core spec (making the core spec extremely light and focused).

The ability of the definition of the data representation of the cryptographic material required for credential and capability processing to exist outside of the DID-spec would create a soft-coupling between DIDs and non-DID URI ids. With a common spec, external to the DID-core, that defined fields like capabilityDelegation and keyAgreement you have broad interoperability, not just internal to the DID universe but also with the world of credentials and capabilities in the large scale efforts actively being built outside of this community.

When viewed in that light, having the DID-spec be very crisp and lightweight, and simply identifying the relationship with some key-specs using an @context (or @context-like) switch - the door is open to easily defining semantics for 'found objects' that clearly express the appropriate model. You could have a 'discoverer' field for example.

You would not want to support 'mutable' data for such an object - definitely not service endpoints, nor cryptographic control material. But it still buys you a lot - having the ability to have a consistent, non-controller scoped name for an immutable digital asset, like a virus footprint would allow malware experts (organizations and individuals) to reliably associate information about how to fix, other ways to detect, and any other manner of derived data about the malware.

Issuers could link information directly about "malware footprint X" and not "controller Y's name for malware footprint X"- currently they could do that with urn:multihash:footprint but that just begs the question of resolution and metadata - where do you get that little hint that "this is a virus footprint, discovered by Y on DATE"? It is possible, but it strengthens the growing sense that DIDs are a fringe element in the world of credentials and capabilities.

In the world where DIDs remain central to credentials and capabilities, the ability to register "found objects" like virus footprints would open the world to subscribing - not to virus detection updates from specific companies based on SSL certificates at the end of https://malwarehelp.co/updates, but to a broader field of expertise (and perhaps more rapid response). You could choose to trust specific sources, and you could accept updates to your anti-malware software based on trust in issuers rather than trust in the SSL certificate and subscriptions.

The key is in the ability to issue a VC about "malware footprint X" instead of "controller Y's name for malware footprint X" - it takes the unwanted controller out of the loop, and does not introduce the need for a centralized malware database at the end of some DNS-mediated URL. This is a step towards Doc Searls' intention economy, and away from the customer-capture-and-control model that dominates the malware industry. It also helps reduce the inherent surveillance in pinging the update server.
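
To make that concrete, here is a hedged sketch of an unsigned credential whose subject is the content-derived identifier itself. The top-level field names follow the W3C Verifiable Credentials Data Model 1.0; the issuer DIDs, the footprint identifier, and the claim names (`classification`, `remediation`) are hypothetical, and no proof is attached.

```python
def credential_about_footprint(issuer_did, footprint_did, issued_at, claims):
    """Build an unsigned credential whose subject is the footprint identifier."""
    return {
        "@context": ["https://www.w3.org/2018/credentials/v1"],
        "type": ["VerifiableCredential"],
        "issuer": issuer_did,
        "issuanceDate": issued_at,
        # The subject is the footprint itself, not any issuer's name for it.
        "credentialSubject": {"id": footprint_did, **claims},
    }


footprint = "did:immutable:" + "ab" * 32  # placeholder digest, not a real sample
vc_a = credential_about_footprint(
    "did:example:lab-a", footprint, "2020-03-01T00:00:00Z", {"classification": "worm"}
)
vc_b = credential_about_footprint(
    "did:example:lab-b", footprint, "2020-03-02T00:00:00Z", {"remediation": "patch X"}
)
```

Both credentials share one subject identifier, so a verifier can collect statements from any issuer it trusts without going through a single controller or database.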

The only thing that bothers me about the approach you describe above is the "surrender of control" being in good faith - unless I read that wrong or am lagging in my tracking the current edge of DID thinking. It is an area where KERI, and the transparency that comes with witness records, could record the completion of "surrender of control" - so you could work around the limitations of the DID spec and identify "immutable objects" - being those for which one could document the surrender of control. It would make the abstract data model and fixed semantics of the current DID-core more complicated, but that's what standard processing libraries are for.

One final thought about the benefits of prying the cryptographic material and service endpoints out of the DID-core spec, and into associated specs is that it would help address the tension of having resolution out of scope while the service_endpoints definition and URL discussions are in-scope.

With that lightening of the DID-core burden you get a very nice compartmentalization of concerns, and some excellent linkage with the broader world of identity, credential, and trust-technologies - the ones that will be in play when DIDs mature to the point of broad adoption.

But again, there is tremendous progress being made in the current DID-spec model - and we must take care not to derail that, even if it means that DIDs become a niche solution in the broader universe of URI mediated credential and capability processing.

burnburn commented 4 years ago

Wow, I must admit I am coming to this even later than @talltree and am astounded at the length and depth as well.

I'd like to make some comments at a meta level.

One of the prime killers of standards is scope creep. As one of the chairs, my job is to avoid that where possible. The challenge is that as others become aware of and join the work, it is natural for them to see ways in which they could make "just a small change" that would enable other use cases.

@jandrieu and @peacekeeper are correct in their description of the original intent of DIDs, the intent that spawned the incubation work and then the formal standardization track we are on. (And sorry, @talltree, but the notion of a DID document as something presented/generated rather than (necessarily) stored has existed since BTCR, the very first DID method)

@ewelton is correct that the latest reasonable time for a major perspective shift was when this group kicked off last September. Actually, even that was too late - charter development was the time.

However, I like what I am now seeing - an attempt to describe how to accomplish at least some of the "new" use cases presented in this thread via mechanisms already existing in and envisioned by the spec. This is how standards succeed, by accepting a functional if non-ideal use of the original idea. Remember, we can always do a 2.0 or create a completely different standard after learning through use of this one. In the meantime, an imperfect or limited standard that is COMPLETED is vastly superior to one that undergoes a revamp every year when new perspectives are added.

Soooo, I am not expressing an opinion, at all, on the value of the alternate perspective from @dhh1128 and @pknowl or the specific proposal from @kdenhartog . I am merely suggesting that creative directions like that suggested by @kdenhartog are typically more successful ways to get a new, potentially larger raft of use cases addressed because they dramatically limit scope creep and thus may allow us to finish this within a reasonable time span.

So keep talking!

pknowl commented 4 years ago

Thanks, @burnburn . Your post adds a wonderful sense of calm to this thread and gives us the necessary movement and oxygen to continue the discussion. At this stage, the argument is more philosophical than technological.

I strongly believe that a decentralized data network will be much more stable if we only have one identifier throughout the space - DIDs for everything. That would allow us to put all other identifier standards out to pasture with a graceful "thank you for getting us to this point."

I'm attaching a mini-deck so that everyone can visualise exactly what that means. DIDs for everything in the big blue circle. See first slide. Identifiers.pdf

talltree commented 4 years ago

@kdenhartog Hats off to you for a wonderfully simple twist to thinking about what @dhh1128 and others on this thread have been proposing. The way you describe it, did:immutable: actually strikes me to be much more like did:key, i.e., the contents of the DID document are cryptographically related to the DID itself.
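
One way to picture that did:key-like property is a resolver that needs no registry at all. The sketch below is hypothetical: it assumes the identifier embeds a hex SHA-256 digest, and the function names and the "@id" key are illustrative choices that follow the examples earlier in the thread.

```python
import hashlib
import re

_DID_PATTERN = re.compile(r"^did:immutable:([0-9a-f]{64})$")


def resolve_locally(did: str) -> dict:
    """Generate the document from the identifier itself, with no registry lookup."""
    if not _DID_PATTERN.match(did):
        raise ValueError(f"not a did:immutable identifier: {did!r}")
    return {"@id": did}


def identifies(did: str, content: bytes) -> bool:
    """Check that the given bytes hash to the digest embedded in the identifier."""
    match = _DID_PATTERN.match(did)
    return bool(match) and match.group(1) == hashlib.sha256(content).hexdigest()
```

Anyone holding the content can run the second check themselves, so the link between identifier and subject does not depend on trusting whoever published it.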

I like it. In fact I like it so much—and it fits within the existing spec so easily—that I propose that one of the proponents write a PR.

pknowl commented 4 years ago

@dhh1128 Thanks for instigating a vibrant and much needed discussion.

@kdenhartog Thanks for coming up with a simple solution for us to get our teeth into.

@mitfik and I will start writing the did:immutable: method. Kyle has kindly offered to assist but would rather not be the sole maintainer (which makes total sense as this method really falls on the Decentralized Semantics side of the model). He has suggested that did:key: would be a great basic template to work from and that it might be worth getting the Protocol Labs folks (creators of multihash) to join in too.

We'll get something down in writing asap for review.

Thanks, everyone.