Proposed Appendices on DID Identification Architecture

w3c / did-core

W3C Decentralized Identifier Specification v1.0

https://www.w3.org/TR/did-core/


Proposed Appendices on DID Identification Architecture #373

Closed talltree closed 3 years ago

talltree commented 3 years ago

Several current issues including #355 and #348 (and W3C CCG Security Vocab issue 45) depend on understanding how DID identification architecture works from a graph and semantic model standpoint. This is not currently explained anywhere in the spec. However it has been argued that this content is needed someplace in the spec in order to support a full understanding of how to use DIDs.

Since there isn't another section that feels like a natural home for this content (which is entirely explanatory in nature and contains several diagrams), it has been proposed to add it as one or more appendices to the spec. I have drafted three such appendices for this purpose. I have not yet submitted them as PRs because it is a lot of content that is more easily reviewed and discussed first in a traditional document format. So all three are currently in a single Google doc. The direct links are:

I have attached this PDF of the current Google doc for anyone who does not have access to Google docs.

Although you can make comments directly in the Google doc (and some already have), it would be best for visibility, access, and archiving to carry on any substantial discussions here, in this issue thread. I will start by answering a few of the comments that have already been made in the Google doc.

Lastly, many thanks to @peacekeeper for helping to think through and review early drafts of this content. He and I have been struggling with these issues regarding Internet identifiers for 15+ years together, and on these questions he is wise way beyond his years.

talltree commented 3 years ago

The proposed Appendix A: What Does a DID Identify? includes this statement:

First, by definition, a DID always identifies an information resource—a DID document.

@burnburn made this comment in response:

DIDs do NOT identify a DID document. DID documents are information developed through resolution that are used in dereferencing.

In the Google doc I responded as follows:

@peacekeeper and I started down that path to resolve the non-information resource URI dilemma (the whole subject of the W3C TAB's 2007 Cool URIs for the Semantic Web document). However we ran into numerous issues with trying to avoid treating a DID document as an information resource.

The most intransigent of those was the fact that, by definition in RFC 3986, any URL that accepts a fragment is a primary information resource, and the fragment identifies a secondary resource (of some kind). We could not figure out any way around that. So in the end we found it easier to accept that the DID document is an information resource identified by the DID, and the DID document in turn functionally identifies the DID subject, as shown in Figure 1 of the proposed Appendix A (included below for easy reference).

talltree commented 3 years ago

A second statement included in Appendix A (right below the first one referenced above) is:

Secondly, although by itself the DID document is an information resource with one or more representations as defined by this specification, these representations are always a description of the DID subject. The DID document is never a representation of the DID subject.

@brentzundel made this comment in response:

I'm not sure I agree with this statement. For example, if I am the DID Subject, how is it that the DID Document (as a set of public keys and endpoints) describes me?

To which I posted this response:

The DID doc describes attributes of you (the keys) and methods of interacting with you (the service endpoints). This is especially clear when you look at the JSON-LD version of a DID document—it is an RDF (Resource Description Framework) graph describing you as the RDF subject.

agropper commented 3 years ago

Nice work!

I seek clarification on the situation where the DID enables a non-repudiable signature by the DID Subject. If the DID Subject is the sole DID Controller (Appendix B Set 1), no problem. However, if the DID is controlled by another party or parties (as in B Set 2 or Appendix C #1 or #3) then the DID Subject can repudiate their signature. Partial Aggregate Control (C#3) is a valuable feature for account recovery. What might we say about this?

Adrian


kdenhartog commented 3 years ago

@talltree amazing writeup! While I'm sure there are parts of this that could be contentious in general, I appreciate the effort you spent writing it, and it seems to further clarify what I intuited about DIDs. Thanks for putting it all into such a well-written document.

One of the stickier questions I've come across which I can't seem to wrap my head around is best put like this:

What happens if two independent controllers disagree on what the DID Document describes? I think this problem can be reduced to a more simple question as well. What happens when a DID Controller uses a DID Document to describe two independent DID subjects?

Similarly, I think this model further simplifies how this solution might work, but I'll still pose the second question (and attempt to answer it after).

How should the semantic graph be structured for something like the president of a university? For example, if the DID document for did:example:123 describes the president of ACME University, would it also equally describe the sitting president of ACME University? Or should they instead remain separate, with unique identifiers, while having overlapping control via the semantic graph?

Given that we have lineage this seems like a potential conflict (subject of did:example:123?version=1 is not equal to did:example:123?version=2) with rule one set by the TAB (described above as a URI identifies only a single information/non-information resource) if we allow the DID Document to describe the sitting president, but not if it were to describe the office.

For me this seems that the correct answer would be to structure it something like this:

did:example:123 identifies {... didDocument } describes office of the president of ACME University
did:example:456 identifies {... didDocument } describes first sitting president of ACME University
did:example:789 identifies {... didDocument } describes second sitting president of ACME University

and did:example:123?version=1 would look like:

{
  "id": "did:example:123",
  "authentication": ["did:example:456#key1"],
  "service": [ "did:example:456#contact-details"],
  ...
}

and did:example:123?version=2 would look like:

{
  "id": "did:example:123",
  "authentication": ["did:example:789#key1"],
  "service": [ "did:example:789#contact-details"],
  ...
}

In this way all URIs only identify a single non-information resource, while still keeping the most accurate model where the current sitting president controls the office of the presidency did document. Is that the correct way to solve for this?
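A minimal Python sketch of the structure proposed above, treating each version of the office's DID document as a plain dictionary. The `versions` table and the `controller_dids` helper are hypothetical names used only for illustration, and the sketch assumes every key reference has the form `<did>#<fragment>`.

```python
# Two versions of the DID document for the office (did:example:123).
# The "id" never changes: the DID keeps identifying the office itself.
versions = {
    1: {
        "id": "did:example:123",
        "authentication": ["did:example:456#key1"],
        "service": ["did:example:456#contact-details"],
    },
    2: {
        "id": "did:example:123",
        "authentication": ["did:example:789#key1"],
        "service": ["did:example:789#contact-details"],
    },
}


def controller_dids(did_document):
    """Return the DIDs whose keys are referenced for authentication."""
    # Each reference looks like "<did>#<fragment>"; keep the part before "#".
    return {ref.partition("#")[0] for ref in did_document["authentication"]}


for version, doc in sorted(versions.items()):
    print(version, doc["id"], sorted(controller_dids(doc)))
```

Under these assumptions the subject of did:example:123 stays the same across versions, and only the DID of the sitting president that the document points to changes.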

talltree commented 3 years ago

One of the stickier questions I've come across which I can't seem to wrap my head around is best put like this:

What happens if two independent controllers disagree on what the DID Document describes? I think this problem can be reduced to a more simple question as well. What happens when a DID Controller uses a DID Document to describe two independent DID subjects?

@kdenhartog It's almost bedtime so let me just answer the first question you posed because that's an easy one: a DID (and the associated DID document) can by definition identify exactly one DID subject. Period. Full stop. End of story.

Anything else is an error. If a DID controller claims that one DID identifies more than one DID subject, any requesting party should reject (and blacklist) that DID.

Let me clarify that it's fine for a DID to identify a group that contains two or more member entities. In this case the group is assigned its own DID, and each member entity in the group may or may not be assigned their own DID. But the group as an entity is its own DID subject.

Make sense?

(Your second question is a doozy and one I have to sleep on before trying to answer. But I'll point out that questions like these are exactly why I think this set of appendices is needed.)

iherman commented 3 years ago

@talltree I have put some comments into the document, also echoing what @burnburn commented, on the section titled "How DID architecture addresses this challenge". Instead of putting a longer set of comments into the document, I will jot down my thoughts here... What about replacing the bullet items there with

A DID identifies the subject, which is always considered as a non-informational resource.
- Even if it refers to a Web Page, the DID itself does not refer to the content of the page but, rather, to the abstract notion of, say, somebody's home page.
The DID Resolution function of the corresponding method returns a description (or a representation) of the subject (commonly named DID Document, although maybe this is not the best term?).
- This is not unlike the second approach used in the Semantic Web world that you quote from the Cool URI-s Note, which uses HTTP 303: you dereference a URI and you get back a representation of the original resource.
DID URL-s can be derived from a DID and can be used to identify further informational resources with fragment ID-s and other things. Per definition, those URL-s may refer to both informational and non-informational resources.

I have the impression that this would simplify things.

Some notes:

Yes, in this description a DID Document does not have a URI. Is this a problem? I do not think so. A DID Document is just an artifact of the DID Resolution process after all.
- If we do need a URI for this, we can use the (ugly!) DID URL did:example:123456# (after all, we define the syntax of the DID URL).
I hear your argument about RFC 3986 and fragment ID-s. The fact is: we are already violating that RFC by conceptually separating the notion of DID and DID URL-s (and that separation is necessary as we discussed before).
- Maybe (and I am stepping on a lot of toes here, I know) we may have to make a syntactic differentiation here by using did: and didurl: to clearly separate the two notions from one another. (We could then say the URI of the DID Document is didurl:example:123456). But it may be too late for that.

Discussions to have... :-)

talltree commented 3 years ago

@iherman Thank you!! This is exactly the discussion I was hoping this draft of the appendices would produce. I am very open to each of your suggestions—ironically, what you suggest was the way I wrote the first and second drafts before @peacekeeper pointed out the RFC 3986 problem and we decided to treat the DID document as an information resource.

@peacekeeper, what do you think?

I invite others in the WG to read the documents and weigh in with their views on Ivan's proposal.

@iherman Are you going to be able to attend the special meeting at noon ET on Thursday? I really look forward to the discussion there.

iherman commented 3 years ago

@iherman Are you going to be able to attend the special meeting at noon ET on Thursday? I really look forward to the discussion there.

I plan to be there

iherman commented 3 years ago

I wanted to elaborate a bit on what I wrote at the end of my earlier comment (https://github.com/w3c/did-core/issues/373#issuecomment-672876140). I realize what I say here may be impossible to consider by now because it would force most implementations to change, but I think we should still consider this as a possibility. Here is what I meant:

A DID is a URI that identifies a non-information resource and only a non-information resource (abstract concept, person, a physical object, your dog, etc.). Its syntax is something like did:example:12345.
A DID URL is a URL that refers to (identifies?) an information resource and only an information resource. Its syntax is something like didurl:example:12345/some/path?somequery#somefragment; this DID URL is uniquely associated with the DID did:example:12345. 'Uniquely' means that there is a mapping from any didurl to a corresponding did by virtue of the method name and the method-specific ID.
Resolving a DID yields a DID Document; a DID Document is an information resource, identified by didurl:example:12345.
Fragments, relative URL, etc, are all meant to use the DID URL.
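The mapping from a didurl to its did could be sketched in Python as follows. This is only an illustration: the didurl: scheme is the hypothetical syntax from this comment, not something defined in the spec, and the function name `didurl_to_did` is invented for the example.

```python
def didurl_to_did(didurl: str) -> str:
    """Map a hypothetical didurl: identifier to its associated did:."""
    scheme, colon, rest = didurl.partition(":")
    if scheme != "didurl" or not colon:
        raise ValueError(f"not a didurl: {didurl!r}")

    # Cut at the first path, query, or fragment delimiter, if any.
    end = len(rest)
    for delimiter in "/?#":
        index = rest.find(delimiter)
        if index != -1:
            end = min(end, index)

    # What remains is "<method name>:<method-specific ID>".
    method, colon, method_specific_id = rest[:end].partition(":")
    if not method or not colon or not method_specific_id:
        raise ValueError(f"missing method or method-specific ID: {didurl!r}")
    return f"did:{method}:{method_specific_id}"


print(didurl_to_did("didurl:example:12345/some/path?somequery#somefragment"))
```

Every didurl maps to exactly one did, while many didurls (with different paths, queries, or fragments) map to the same did.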

What this does is reflect in the syntax the fact that we have two very distinct notions, and many of the complications come from the fact that we try to hide the differences.

I realize that this distinction may sound very theoretical and therefore unnecessary for many, but looking at the complications in this Appendix trying to explain what happens shows that we may still want to consider this...

peacekeeper commented 3 years ago

@iherman said:

Yes, in this description a DID Document does not have a URI. Is this a problem? I do not think so. A DID Document is just an artifact of the DID Resolution process after all.

@burnburn said:

The primary resource is the did subject, and the default dereferencing action on that resource is to return the DID document. That does not make the DID document the resource.

I don't think we can be RFC3986 compliant if the DID document is not an information resource with its own URL, since RFC3986 says:

The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource.

Therefore, the DID document must be considered an information resource, and it must have its own URL.

At the same time, we want to maintain our core assumption that the DID identifies the DID subject (I agree that if we change this, a lot of things break).

Therefore, the DID document needs a URL that is not the DID (unless we want to accept that we are conflating things, which is what we have been doing so far).

peacekeeper commented 3 years ago

While working on the DID chapter for the Manning SSI book (see preview), @talltree and I discussed what a URL for the DID document could look like.

We considered adding a single fragment (e.g. did:example:123456#) which is interestingly also mentioned by @iherman in https://github.com/w3c/did-core/issues/373#issuecomment-672876140, but I don't think this would actually be RFC3986 compliant.

At that time I thought that adding a single slash (e.g. did:example:123456/) would make most sense.

During a series of DID Resolution calls last year, we discussed several more options, see here.

I think @iherman 's idea to introduce another URI scheme (e.g. didurl:example:12345) would also solve the dilemma, but I also agree it may be too late for that.

msporny commented 3 years ago

First, by definition, a DID always identifies an information resource—a DID document.

Ooof, this statement has incredibly far reaching implications... it's highly problematic for a number of reasons. I'm just going to point out a few below...

@burnburn wrote:

Don't agree with this. DIDs do NOT identify a DID document.

@iherman wrote:

I agree with Dan on this. A DID identifies, say, me, and I am not an informational resource. We define a non-HTTP mechanism that returns an information resource, a.k.a. DID document. That document does not have a URI by itself, it is the message that our mechanism conveys. We have a separate DID URL as a concept that allows for a fragment ID.

Agree with Dan and Ivan, everything was good right up to this point, and then it goes off the rails. The issue is with the TAG finding... it presumes a lack of context when interpreting an identifier. It comes from a time when there were no RDF Datasets and everything was being merged together in databases. That is, a time when context around information wasn't a core part of RDF and the Semantic Web. It is today, and that changes the TAG finding. That said, I have no interest in prosecuting this via the TAG, as the makeup of the TAG has changed and they tend to not care about these sorts of things because they tend to not have a practical effect on the Web... yes, you can shoot yourself in the foot by doing the wrong thing, but you quickly find out that your foot is missing.

Pages 1-4 are a great backgrounder on where things were left almost a decade ago... Here be dragons, and we should leave those dragons alone.

We are squarely back in HTTP Range 14 territory...

https://en.wikipedia.org/wiki/HTTPRange-14

...which is a horrible place to be... because it's Computer Science catnip, where the conversation never terminates. :)

talltree commented 3 years ago

First, thanks for all the excellent feedback and suggestions. The DID WG had a very good special topic call about this today. In this post I'll first summarize my key takeaways and then provide a link to a revised proposal (here in this Google doc for the impatient ;-).

My key takeaways from today's discussion:

Everyone agrees we'd like to avoid falling into the HTTP Range 14 rathole. As Manu pointed out, the W3C Cool URIs for the Semantic Web document is 12+ years old, and much has evolved since then. Just as importantly, we want to stay focused on using DIDs to solve real market problems, not academic arguments.
I proposed that there were three options for a solution (and Manu added a fourth, below):
1. DIDs always identify the DID document as an information resource.
2. DIDs always identify the DID subject as a non-information resource.
3. There are two variants of a DID, one that identifies the DID subject as a non-information resource and another that identifies the DID document as an information resource.
Manu's fourth option was to not propose any solution and simply agree on the properties we need in DID documents to solve use cases related to these topics.
Of the three options listed above, the strong preference of those on the call was for the second one (DIDs always identify the DID subject as a non-information resource) over the first one (DIDs always identify the DID document as an information resource).
Lastly, the third option listed above (two variants of a DID) would be ideal if we can figure out an easy way to do it.

Now, per these takeaways, on the call I shared that, while the original proposal was based on the first of the three options listed above, last night I prepared a revised proposal (of all 3 appendices) based on the second option (DIDs always identify the DID subject as a non-information resource). Since that appears to be the direction everyone wants to go, I suggest we shift further discussion to this second version. (Note that this is a second Google doc; I have added a note to the first Google doc redirecting to the new one.)

Here is a PDF of that revised version for anyone who does not have access to Google docs.

Lastly, I will next write up a possible answer that @peacekeeper and I discussed after the call about how we can deliver on the third solution above (two variants of a DID, one that identifies the DID subject as a non-information resource and another that identifies the DID document as an information resource).

talltree commented 3 years ago

Per my last comment, @iherman has proposed in this thread (and we discussed on today's special topic call) that the ideal solution is one where, for any DID, there are two variants, one that identifies the DID subject as a non-information resource and another that identifies the DID document as an information resource. This is literally how we could "have our cake and eat it too", i.e., use DIDs as abstract identifiers to solve a longstanding conundrum of the Semantic Web that is very difficult if you have only concrete identifiers.

The challenge has been to figure out a solution to algorithmically relating those two variants of the DID that is also easy, intuitive, and practical for developers and implementers. While having two different scheme prefixes (did: and didurl: as Ivan suggests) would be very clean from a semantics standpoint, it might be very difficult from a practical ease-of-understanding and implementation standpoint.

So @peacekeeper and I put our heads together on this, and there is an option that's very simple to describe:

did:example:1234abcd <== DID that abstractly identifies a DID subject
did:example:1234abcd# <== DID that concretely identifies a DID document (as a whole)
did:example:1234abcd#foo <== DID that concretely identifies a secondary resource inside the DID document.

I can post a much longer justification of how this can be interpreted as consistent with RFC 3986, but first, let's just do a sniff test on this thread about how folks feel about this potential solution. (Feel free to just use a thumbs up or thumbs down.)
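To make the three cases concrete, here is a small Python sketch of how an implementation might tell them apart. The function name `classify` and the returned labels are invented for this illustration, and the sketch ignores paths and queries.

```python
def classify(identifier: str) -> str:
    """Say what a DID, with or without a fragment, identifies under this proposal."""
    base, hash_mark, fragment = identifier.partition("#")
    if not base.startswith("did:"):
        raise ValueError(f"not a DID: {identifier!r}")
    if not hash_mark:
        # No "#" at all: the bare DID abstractly identifies the DID subject.
        return "DID subject"
    if fragment == "":
        # A trailing "#" with nothing after it: the DID document as a whole.
        return "DID document"
    # A non-empty fragment: a secondary resource inside the DID document.
    return "secondary resource"


for example in ("did:example:1234abcd", "did:example:1234abcd#", "did:example:1234abcd#foo"):
    print(example, "->", classify(example))
```

Note that the distinction between the first two cases rests entirely on a trailing "#", so any tooling that normalizes away an empty fragment would lose it.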

iherman commented 3 years ago

While having two different scheme prefixes (did: and didurl: as Ivan suggests) would be very clean from a semantics standpoint, it might be very difficult from a practical ease-of-understanding and implementation standpoint.

To avoid any misunderstandings: I acknowledge that. Personally, I still believe that this approach should have been the clean approach back when DIDs and DID URLs were conceived (I do not think that the implementation complications are that high) but this particular ship has probably sailed, and we do not want to force all method implementations to change on this issue.

For the three solutions quoted above: don't we have similar issues with the paths? I.e., if I want to add a path x/y/z, the current syntax says:

did:example:1234abcd/x/y/z

and, because the fragment id must be added to the end, we get:

did:example:1234abcd/x/y/z#foo

and we get to the same problem, don't we?

@peacekeeper referred to, in https://github.com/w3c/did-core/issues/373#issuecomment-673498896:

did:example:1234abcd/

wouldn't that be a better option? It makes it a bit awkward, e.g.,

did:example:1234abcd/#foo

but this is not unlike, say,

https://w3c.github.io/did-core/#did-subject
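A short Python sketch of how the examples above split into a DID, a path, and a fragment. The `DidUrlParts` type and `split_did_url` function are hypothetical names, and the sketch assumes there is no query component.

```python
from typing import NamedTuple, Optional


class DidUrlParts(NamedTuple):
    did: str
    path: Optional[str]      # None when there is no "/" at all
    fragment: Optional[str]  # None when there is no "#" at all


def split_did_url(did_url: str) -> DidUrlParts:
    """Split a DID URL into DID, path, and fragment (queries not handled)."""
    rest, hash_mark, fragment = did_url.partition("#")
    slash = rest.find("/")
    if slash == -1:
        did, path = rest, None
    else:
        did, path = rest[:slash], rest[slash:]
    return DidUrlParts(did, path, fragment if hash_mark else None)


for example in (
    "did:example:1234abcd/x/y/z#foo",
    "did:example:1234abcd/",
    "did:example:1234abcd/#foo",
):
    print(example, "->", split_did_url(example))
```

With the trailing-slash convention, the document URL is the case where the path is exactly "/" and there is no fragment.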

iherman commented 3 years ago

(This comment is on the revised Appendix A proposal.)

First of all, I like the approach and, personally, I believe this is a direction to go. My comments are on details and not a criticism of the document as a whole.

I am a bit bothered by the differentiation of seeOther and seeAlso. While the former is a property defined by us, the latter is one of the core RDFS properties, ie, it has already been defined. Its definition says:

rdfs:seeAlso […] is used to indicate a resource that might provide additional information about the subject resource.

A triple of the form:

S rdfs:seeAlso O

states that the resource O may provide additional information about S. It may be possible to retrieve representations of O from the Web, but this is not required. When such representations may be retrieved, no constraints are placed on the format of those representations.

Isn't that exactly what we expect from seeOther? When using it we have an S (a DID) which refers to a non-information resource, and we refer to an additional description O, describing S. But isn't that what the definition above says?

My proposal would be to use rdfs:seeAlso on both places.

You say

The key difference between non-information resources and information resources is that only the latter can directly return representations. So if a DID subject is an abstraction of an information resource, the DID controller can use the representation property of a DID document to precisely map the DID to URIs for representations of the information resource.

And I guess I am not sure what "representations" mean here, and I believe we would have a hard time explaining this in the spec.

How would you explain, if did:example:abcd identifies me (and I am a non-informational resource), why exactly is the following wrong:

{
    "@context" : "https://www.w3.org/ns/did/v1",
    "id": "did:example:abcd",
    "representation" : {
        "id" : "http://www.ivan-herman.net/professional/images/Ivan.Herman.png",
        "media-type" : "image/png"
    }
}

And how would you explain the difference between that and

{
    "@context" : "https://www.w3.org/ns/did/v1",
    "id": "did:example:abcd",
    "rdfs:seeAlso" : {
        "id" : "http://www.ivan-herman.net/professional/images/Ivan.Herman.png",
        "media-type" : "image/png"
    }
}

In your case, the first approach is erroneous but only because you know (out of band) that did:example:abcd is a non-information resource. But that type of knowledge may be too much to ask from our user base, don't you think?

Are we overcomplicating things? What I am getting at, I guess, is that it may be much simpler to express everything with a single term, and that is rdfs:seeAlso. It may not be 100% clean but it may do the job...

msporny commented 3 years ago

@iherman wrote:

Are we overcomplicating things? What I am getting at, I guess, is that it may be much simpler to express everything with a single term, and that is rdfs:seeAlso.

Agreed with @iherman here. I also don't think the media type is required, you typically content negotiate for that. So, that would make the solution look like this:

{
    "@context" : "https://www.w3.org/ns/did/v1",
    "id": "did:example:abcd",
    "seeAlso" : "http://www.ivan-herman.net/professional/images/Ivan.Herman.png"
}

... and really, we should be using more specific relationships like this:

{
    "@context" : ["https://www.w3.org/ns/did/v1", "https://schema.org/"],
    "id": "did:example:abcd",
    "url" : "http://www.ivan-herman.net/",
    "seeAlso": "https://www.linkedin.com/in/iherman/",
    "image" : "http://www.ivan-herman.net/professional/images/Ivan.Herman.png"
}

... and really, that is a massive GDPR violation if placed on a Verifiable Data Registry. What you should be doing instead is publishing "url", "seeAlso", and "image" via a Verifiable Credential served up via a service endpoint or other means, like this:

{
  "@context": ["https://www.w3.org/2018/credentials/v1", "https://schema.org/"],
  "type": "VerifiableCredential",
  "issuer": "did:example:abcd",
  "issuanceDate": "2020-01-01T19:23:24Z",
  "credentialSubject": {
    "id": "did:example:abcd",
    "url" : "https://www.ivan-herman.net/",
    "seeAlso": "https://www.linkedin.com/in/iherman/",
    "image" : "https://www.ivan-herman.net/professional/images/Ivan.Herman.png"
  },
  "proof": {
    "type": "Ed25519Signature2018",
    "created": "2020-01-01T19:23:24Z",
    "proofPurpose": "assertionMethod",
    "verificationMethod": "did:example:abcd#keys-1",
    "jws": "eyJhbG...PM"
  }
}
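The move suggested above, taking the descriptive properties out of the DID document and into a self-issued credential, can be sketched in Python roughly as follows. The `split_profile` helper and the `PROFILE_PROPERTIES` tuple are hypothetical; the sketch produces an unsigned credential only, since creating the proof depends on a signature suite that is out of scope here.

```python
# Descriptive properties that would move out of the DID document.
PROFILE_PROPERTIES = ("url", "seeAlso", "image")


def split_profile(document, issuance_date):
    """Return (slimmed DID document, unsigned self-issued credential)."""
    slim = {k: v for k, v in document.items() if k not in PROFILE_PROPERTIES}
    subject = {"id": document["id"]}
    for name in PROFILE_PROPERTIES:
        if name in document:
            subject[name] = document[name]
    credential = {
        "@context": ["https://www.w3.org/2018/credentials/v1", "https://schema.org/"],
        "type": "VerifiableCredential",
        "issuer": document["id"],  # self-issued: issuer and subject are the same DID
        "issuanceDate": issuance_date,
        "credentialSubject": subject,
        # No "proof" member: signing is deliberately left out of this sketch.
    }
    return slim, credential


document = {
    "@context": ["https://www.w3.org/ns/did/v1", "https://schema.org/"],
    "id": "did:example:abcd",
    "url": "http://www.ivan-herman.net/",
    "seeAlso": "https://www.linkedin.com/in/iherman/",
    "image": "http://www.ivan-herman.net/professional/images/Ivan.Herman.png",
}
slim_document, credential = split_profile(document, "2020-01-01T19:23:24Z")
print(sorted(slim_document))
print(sorted(credential["credentialSubject"]))
```

The slimmed document keeps only what belongs on the registry, while the credential can be served from a service endpoint and withdrawn without touching the registry.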

talltree commented 3 years ago

After a long hiatus to consolidate all the feedback, I have produced a new version of these Appendices. Appendix A was rewritten completely to reflect the consensus of the DID WG. The only changes needed in Appendices B and C were tweaks to the diagrams to align with the revisions in Appendix A.

These are still Google docs until we have one more round of review from WG members, then I will turn them into PRs. The direct links to the Google docs are (note that these are NEW links):

I have attached this PDF of the current Google doc for anyone who does not have access to Google docs.

Although you could make comments directly in the Google doc, please put them in this issue thread instead.

msporny commented 3 years ago

Appendix A: What Does a DID Identify?

Since a DID is a specific type of URI, the answer to this question is provided by section 1.1 of the URI specification (RFC 3986):

Don't quote directly, link to the section of the specification and summarize.

This mechanism is how DID identification fulfills a longstanding recommendation from the W3C

Again, quoting is problematic as guidance has changed throughout the years -- we should link to the text.

Semantically, it is recommended to identify this type of resource using the type property of the DID document.

This text should be in the section on type.

https://schema.org/person

Should be capital 'P' -- https://schema.org/Person

Even better, the author could also create a DID for the web page.

This is a bit strange... it's correct, but may be viewed as jumping the shark as they could just express that home page as a relationship to their (the author's) DID.

Overall, very minor nits, the overall direction and content is solid. Thank you for putting this together, Drummond!

The "don't quote other specs" is just a warning, not going to stand in the way if others want to do that -- I agree that it improves readability to have things inline. The introductory parts about type really should go into the type section... possibly the whole thing?

msporny commented 3 years ago

Appendix B: DID Controllers and DID Subjects The relationship between DID controllers and DID subjects can be confusing.

I wouldn't start out by suggesting that it's confusing... it's not really, reword to summarize the section in the introduction.

(in RDF/OWL, this is expressed using the owl:sameAs predicate) there is no owl:sameAs arc

Don't bring OWL into this... let's try to avoid it entirely, you don't need it to talk about the concept... talk about the concept in a more general sense without binding it to OWL Semantics.

msporny commented 3 years ago

Appendix C: Multiple DID Controllers

Controls ->

Hmm, just noticed something about the arcs -- the actual arcs for Controls go from the DID Document to the DID Controller node, not the way it's drawn in the document now. I don't know how important/pedantic we want to be about that... but it could trip people up, because we do have a controller property in the DID Document.

cryptographic multisig algorithm

Change to "A cryptographic algorithm that requires multiple digital signatures." for those that don't know what a multisig is...

when using an m-of-n cryptographic signature algorithm

Change to "when using a cryptographic algorithm that requires a threshold of multiple digital signatures."

only one DID controller may be the target of an RDF/OWL sameAs arc from the DID subject as shown in Figure A2.1 of Appendix B.

Avoid OWL, no need to open that can of worms... the concept stands on its own without pulling in OWL.
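For readers unfamiliar with the threshold ("m-of-n") idea mentioned in the comments above, a minimal Python sketch follows. It does no cryptography: `approvals` stands for the set of DIDs whose signatures have already been verified by some other means, and all names are hypothetical.

```python
def threshold_met(controllers, approvals, threshold):
    """Return True if at least `threshold` distinct controllers approved."""
    controllers = set(controllers)
    if not 1 <= threshold <= len(controllers):
        raise ValueError("threshold must be between 1 and the number of controllers")
    # Only approvals from known controllers count, and each counts once.
    return len(controllers & set(approvals)) >= threshold


controllers = {"did:example:a", "did:example:b", "did:example:c"}
print(threshold_met(controllers, {"did:example:a", "did:example:c"}, 2))
print(threshold_met(controllers, {"did:example:a", "did:example:zzz"}, 2))
```

In this 2-of-3 setup the first call passes and the second does not, because did:example:zzz is not one of the controllers.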

msporny commented 3 years ago

Overall, great work @talltree -- just minor nitpicks, almost all editorial, feels like it's ready for a PR. Have you considered moving this into a separate document? I'm on the fence about it. It adds a significant amount of content to the core specification... making it longer and feel more heavyweight... on the other hand, these are some core concepts that folks have been getting tripped up on for a while.

Feels akin to this section -- which, I argue should never have gone into the VC spec: https://www.w3.org/TR/vc-data-model/#subject-holder-relationships

iherman commented 3 years ago

Appendix A, In bullets after A1.1,

it says:

The type property describes the nature of the DID subject (person, organization, book, web page, data structure, abstract concept, etc.)

This actually raises a major modeling issue.

In the current setting we have (I put it in Turtle, because I find it cleaner from a modeling point of view):

<did:xyz:12345> 
    rdf:type <https://schema.org/Person> ;
    :alsoKnownAs <https://www.example.org/ASDFGH/> ;
    :controller <did:abc:WERTYUI> ;
    :authentication [ ... ] .

This is in line with what you write, and it is a translation to the RDF model of all the examples we have in the document. This is because we use the id property to identify the subject. But this also means that all properties that we define in Core can be applied to any resource in the world! I.e., we cannot use any restriction, whether in OWL or SHACL or anything else, because those statements are made, formally, on the DID subject. Is this really what we want?

Also, what is, in the Semantic Web sense, a DID Document? It is not really a resource because we use the id property to identify it and that property is meant to identify the subject! The subject is not the DID Document, is it?

In my comment on §5.1 in #401 I already expressed a bad feeling about the structure, but it is reading your document that I realize that we may actually be on a wrong path (and it is not only an editorial issue as I said there). I believe the right way of modeling what we want is, rather:

[
    rdf:type :DIDDocument ;
    :alsoKnownAs <https://www.example.org/ASDFGH/> ;
    :subject <did:xyz:12345> ;
    :controller <did:abc:WERTYUI> ;
    :authentication [ ... ]
] .

and if we want to express what you said we can then do:

[
    rdf:type :DIDDocument ;
    :alsoKnownAs <https://www.example.org/ASDFGH/> ;
    :subject <did:xyz:12345> ;
    :controller <did:abc:WERTYUI> ;
    :authentication [ ... ]
] .
<did:xyz:12345> rdf:type <https://schema.org/Person> .

I used a blank node for the DID document. Which is fine, because that identifier does not really play an important role. We could, of course, do something like:

<SomeURIHere>
    rdf:type :DIDDocument ;
    :alsoKnownAs <https://www.example.org/ASDFGH/> ;
    :subject <did:xyz:12345> ;
    :controller <did:abc:WERTYUI> ;
    :authentication [ ... ] .

but that is not really of importance.

If we translate this back to JSON-LD, because that is what we want to use, we would get

{
    "type" : "DIDDocument",
    "alsoKnownAs" : "https://www.example.org/ASDFGH/",
    "subject" : "did:xyz:12345",
    "controller" : "did:abc:WERTYUI",
    "authentication" : { ... }
}

and, respectively

{
    "type" : "DIDDocument",
    "alsoKnownAs" : "https://www.example.org/ASDFGH/",
    "subject" : {
        "id" : "did:xyz:12345",
        "type" : "https://schema.org/Person"
    },
    "controller" : "did:abc:WERTYUI",
    "authentication" : { ... }
}

The important difference here is that we use the subject term which is different than the id. This is a minor difference for the user (and indeed the JSON version) but makes all the difference from a modeling point of view!

Cc @msporny @dlongley

iherman commented 3 years ago

B.t.w., just an additional comment to the previous, inspired by the text in Appendix B. If I take the current:

{
    "id" : "did:xyz:12345",
    "controller" : "did:abc:WERTYUI"
}

This translates, in Turtle, to:

<did:xyz:12345> :controller <did:abc:WERTYUI> .

In English it would mean something like "the controller controls the subject". This is not what we want to say, is it? Instead, if we have (adding an ID to the DID document just to make it clearer):

{
    "id" : "urn:something",
    "subject" : "did:xyz:12345",
    "controller" : "did:abc:WERTYUI"
}

we get:

<urn:something>
    :subject <did:xyz:12345> ;
    :controller <did:abc:WERTYUI> .

which says that the DID Document, identified by urn:something is controlled by did:abc:WERTYUI. Which is what we want, right?
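The contrast between the two readings can be made concrete with a minimal Python sketch. It is not a JSON-LD processor: the helper `to_triples` is hypothetical, it simply treats `id` as the RDF subject and every other key as a predicate, and the `subject` term is the proposal under discussion here, not something the spec currently defines.

```python
# Illustrative only: flatten a flat JSON object into (subject, predicate, object)
# tuples, using `id` as the RDF subject. This mimics the translation described
# above and is an assumption-laden stand-in for real JSON-LD processing.

def to_triples(node):
    """Return one triple per non-`id` key, with `id` as the RDF subject."""
    subject = node["id"]
    return [(subject, key, value) for key, value in node.items() if key != "id"]


# Current shape: `id` is the DID, so the DID subject is "controlled".
current = {"id": "did:xyz:12345", "controller": "did:abc:WERTYUI"}

# Proposed shape: `id` names the DID document; the DID is the value of `subject`.
proposed = {
    "id": "urn:something",
    "subject": "did:xyz:12345",
    "controller": "did:abc:WERTYUI",
}

current_triples = to_triples(current)
proposed_triples = to_triples(proposed)

for triple in current_triples + proposed_triples:
    print(triple)
```

The same translation rule yields a `controller` statement about the DID in the first case and about `urn:something` (the document) in the second, which is the whole modeling difference.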

iherman commented 3 years ago

Avoid OWL, no need to open that can of worms... the concept stands on its own without pulling in OWL.

+1 to that...

jandrieu commented 3 years ago

I posit that there is no English sentence using the controller property that sounds correct.

To push back on @msporny's response to @iherman, the Controller has nothing to do with the "controller" property. @msporny said

I don't know how important/pedantic we want to be about that... but it could trip people up, because we do have a controller property in the DID Document.

The "controller" property isn't about the controller at all.

It isn't about who controls the subject. Nor is it about who controls the DID. It is about where else a requesting party might look for verification methods that should be accepted as if they were included in this document.

It's probably more appropriate to have something like "include" or "extended".

This confusion by @msporny is an excellent example of why we should rename that property.

jandrieu commented 3 years ago

Perhaps more salient, this "type" description that Ivan quoted seems to be a conflation of layers and should be removed:

The type property describes the nature of the DID subject (person, organization, book, web page, data structure, abstract concept, etc.)

First, on privacy grounds, the DID Document should never describe anything about the subject other than how to interact securely with parties acting on its behalf. The type value at the root level should NEVER attempt to say anything other than specify that the Subject is a DID Document Subject. Which is already true, so "type" is likely unnecessary as the @context value already identifies the meaning of the core properties.

Statements about a particular Subject, such as their nature as a person, organization, book, web page, etc., are better stated in verifiable credentials where the author can be explicitly identified. In the DID Document, who is it that verified the particular type? Is this just a self-asserted statement by the controller? If so, make that clear by making that statement in a VC. If the determination of type has been made by someone else, then that someone else needs to be identified.

As such, I would suggest a change to the following:

Each normative property in a DID document is a statement by the DID controller describing the DID subject

Each normative property in a DID document is a statement by the DID controller describing how to interact with parties acting on behalf of the DID subject

Second, the DID Spec has no mention at all of "type" used at the top level of the document. This is just not something the DID Document specification defines. It is not a normative property, so it should just be removed from those bullet points after A1.1. It is a normative property of various other properties in the document, but it is not at the top level and hence not a predicate about the Subject.

Third, because type is a privacy problem, I would go so far as to say that in the spec we should clarify that "type" SHOULD NOT be a top-level property of a DID Document.

rhiaro commented 3 years ago

"type" : "DIDDocument",

@iherman Now I'm confused! I thought we'd got past this, and there is consensus that the DID identifies the DID subject; the DID subject is always the subject in RDF triple terms. The DID document is an abstract concept for describing the set of properties that result from the DID resolution process, but it's not a thing in itself.

iherman commented 3 years ago

Well..., I got into issues with that. If we continue to use id as identifying the subject, then, from an RDF point of view, we end up making a bunch of statements (like the controller) on that subject and not on the DID document (abstract concept or not). As I said in https://github.com/w3c/did-core/issues/373#issuecomment-696016837, in fact, the statement reads as if the controller controls the subject (e.g., I control my son if he is the subject) instead of controlling the DID Document on my son. I am not sure my son would like that...

jandrieu commented 3 years ago

The semantics are slippery. And this slips right into httpRange-14.

The id is intended to refer to a Subject. The problems are:

It's an anti-pattern to put arbitrary statements about the subject in the DID Document.
The semantics of many of the properties are not described in a way that rigorously distinguishes between the Subject and the meta-data for interacting on behalf of that subject.

The controller property is one of those. It simply isn't readable to say "the subject identified by the 'id' value" is controlled by/under the control of/has the controller of "the value of the 'controller' property".

What we are really saying is something like: the Subject as identified by the ID value may have verification methods in another DID Document which should be accepted as if they were in this DID Document.

Similarly for "authentication" and "service":

"service": "the Subject referred to by the ID in this DID Document, and its authorized agents" "may be engaged through the following services" or

"authentication": "the Subject referred to by the ID in this DID Document and its authorized agents" "may be authenticated through the following mechanisms"

The English makes no sense to say (in transliterated form) "the Subject referred to by id" "authentication" "authentication mechanisms"

Nor "the Subject referred to by id" "service" "service endpoints".

In section 5.5 https://w3c.github.io/did-core/#service-endpoints it says:

Service endpoints are used in DID documents to express ways of communicating with the DID subject or associated entities.

So, we acknowledge that the service endpoints may not be about the Subject, but we do so in a way that is ambiguous. If they aren't about the Subject, then why is the RDF subject "id"?

Our conversations built on RFC3986 and its discussion of resolution:

URI "resolution" is the process of determining an access mechanism and the appropriate parameters necessary to dereference a URI;

In this context, EVERYTHING in the DID Document is meta-data for determining an access mechanism and appropriate parameters for dereferencing the URI (and eventually interacting with something on behalf of the user).

As such, IMO, all of the top-level predicates / properties, should be defined with the disposition described above. Today, we are exceptionally ambiguous--and often inconsistent--in how we talk about the meaning of these properties.

talltree commented 3 years ago

The important difference here is that we use the subject term which is different than the id. This is a minor difference for the user (and indeed the JSON version) but makes all the difference from a modeling point of view!

@iherman Just wanted to let you know that you raise some totally fascinating points. I need to fully grok this before responding. As you say, this affects the proposed type property too.

iherman commented 3 years ago

Note the question of @melvincarvalho in https://github.com/w3c/did-core/issues/413#issuecomment-696740868: I believe he has hit exactly the same questions/issues as I raised in https://github.com/w3c/did-core/issues/373#issuecomment-696012369.

We also have another source of confusion, namely that the concept of a DID subject is not the same as the concept of a subject in an RDF triple (though related).

melvincarvalho commented 3 years ago

If the term you describe above :subject exists, then I think that answers my question

iherman commented 3 years ago

If the term you describe above :subject exists, then I think that answers my question

It doesn't at the moment... it is a question to discuss.

melvincarvalho commented 3 years ago

If the term you describe above :subject exists, then I think that answers my question

It doesn't at the moment... it is a question to discuss.

Got it!

I like the sound of :subject but completely understand if it doesn't make it into DID 1.0. Or if the consensus is that a DID Document has no URI, then it's not needed.

Also we might not need an extra solution here, given that DID already uses schema.org. Perhaps it's simply possible to use:

https://schema.org/mainEntity

Instead of :subject

However, if indeed DID Documents cannot have a URI it was not immediately obvious, so that could be perhaps stated in the appendix

talltree commented 3 years ago

@melvincarvalho Your comments amplifying those of @iherman are deeply appreciated—some of us on the DID WG have been discussing this semantic modeling question for many months (which is why I've actually written four different drafts of Appendix A).

What we ended up deciding was that the DID document did not have a separate URI because, as @rhiaro has pointed out in #413, the DID document is a collection of properties describing the DID subject that can be obtained via DID resolution.

This is also why we are adding a type property for all the standard reasons one might want to describe the type of the DID subject.

What @iherman pointed out—and I think he's absolutely correct—is that once we arrived at this conclusion, we need to be rigorous that all of the properties describe the DID subject. created and updated—which clearly described the DID document and not the DID subject—were moved out of the DID document and into the resolution metadata for that very reason.

So the one property I'm aware of that remains in question, as @iherman suggests, is controller. I agree that as currently named, it does not describe the DID subject, rather the controller of the DID (which may or may not be the DID subject as Appendix B explains). So I suggest we solve this problem by changing the name of the property to the term we defined in our terminology section, i.e., DID controller. So then the triple would be:

<did:foo:1234> :didController <did:bar:9876> .

Lastly:

However, if indeed DID Documents cannot have a URI it was not immediately obvious, so that could be perhaps stated in the appendix

That's an extremely good point and one which I will make much clearer in my fifth revision.

iherman commented 3 years ago

@talltree I am still a bit worried that, as @msporny put it in another comment, there may be lots of dragons out there even if we rename the controller property. Actually, I do not know whether just renaming would cut it: if I say

<did:foo:1234> :didController <did:bar:9876>

that still means that <did:bar:9876> controls (o.k., in the DID sense) <did:foo:1234>. However, this is what the current spec says about the controller:

Authorization is the mechanism used to state how operations are performed on behalf of the DID subject. A DID controller is authorized to make changes to the respective DID document.

(Emphasis is mine). This is a case for “If it looks like a duck, swims like a duck…”, ie, that the controller controls the DID document not the DID subject!

I could tweak my mind and accept that the other properties, like verificationMethod or authentication are, sort of, attached to a DID subject indeed.

Also, if we "just" use the RDF triple, then it does not make any sense to restrict the value of id. In terms of RDF it should be perfectly fine to say:

<https://www.ivan-herman.net/#me> :didController <did:bar:9876>

As I said, lots of dragons there...

dlongley commented 3 years ago

The controller property allows a DID controller to express other parties that may have verification methods (in their DID Documents) that SHOULD be accepted as authoritative, such that proofs that satisfy those verification methods are to be considered equivalent to proofs provided by the DID Subject. These other parties may therefore generate cryptographic proofs on behalf of the DID subject -- and it is in this sense that they may act on behalf of the DID subject, thereby directing certain behaviors.

DID methods that use verification methods to enforce their rules may use the controller field as the DID controller (and some do, like Veres One). This is not strictly necessary, however, as these rules may be enforced another way. The term can be more widely used by any applications (not just DID method enforcement tools) that use verification methods.
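One way to picture this reading of the controller property is the following Python sketch. Everything in it is an assumption made for illustration: `DOCS` is a hypothetical in-memory stand-in for DID resolution, the DIDs are made up, `accepted_verification_methods` is not an API from any DID library, and following only one hop is a simplification rather than a rule taken from the spec.

```python
# Illustrative only: collect the verification methods a verifier might accept
# for a DID, i.e. those in its own document plus those in the documents of the
# parties listed under `controller`.

DOCS = {  # hypothetical resolved DID documents, keyed by DID
    "did:example:subject": {
        "id": "did:example:subject",
        "controller": "did:example:guardian",
        "verificationMethod": [{"id": "did:example:subject#key-1"}],
    },
    "did:example:guardian": {
        "id": "did:example:guardian",
        "verificationMethod": [{"id": "did:example:guardian#key-1"}],
    },
}


def accepted_verification_methods(did, docs):
    """Return verification method ids from the document and its controllers."""
    doc = docs[did]
    controllers = doc.get("controller", [])
    if isinstance(controllers, str):  # a single controller may be a bare string
        controllers = [controllers]
    method_ids = [vm["id"] for vm in doc.get("verificationMethod", [])]
    for controller_did in controllers:
        controller_doc = docs.get(controller_did, {})
        method_ids += [vm["id"] for vm in controller_doc.get("verificationMethod", [])]
    return method_ids


print(accepted_verification_methods("did:example:subject", DOCS))
```

Read this way, the property says nothing about one entity controlling another; it only widens the set of verification methods whose proofs are treated as equivalent to the subject's own.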

I don't think there's a problem with the term as-is.

iherman commented 3 years ago

The controller property allows a DID controller to express other parties that may have verification methods (in their DID Documents) that SHOULD be accepted as authoritative, such that proofs that satisfy those verification methods are to be considered equivalent to proofs provided by the DID Subject. These other parties may therefore generate cryptographic proofs on behalf of the DID subject -- and it is in this sense that they may act on behalf of the DID subject, thereby directing certain behaviors.

@dlongley I do not have any problem with what you say. My discomfort comes from the way all this is expressed in RDF which seems to be odd in my view (see my examples above) and which does not seem to be in line with the way it is specified in the spec either.

talltree commented 3 years ago

@iherman I thought about this further and IMHO the controller property (or whatever we ultimately decide to name it) expresses a relation to the DID subject much like any other relationship would be expressed. For example:

<did:foo:1234> :mother <did:example:9876> .
<did:foo:1234> :employer <did:example:abcd> .
<did:foo:1234> :guardian <did:example:defg> .
<did:foo:1234> :controller <did:example:jklm> .

In my mind, this makes it even clearer that a controller relationship is, in some cases, very similar to a legal guardian relationship (which in fact plays a hugely important role in self-sovereign identity; see this white paper).

This also fits with the clarifications I've been discussing with you and @melvincarvalho about:

The DID document not having a separate URI.
All properties in the DID document being properties describing the DID subject. Every one of them would simply be an RDF triple (or, for more complex properties, an RDF graph) whose RDF subject is the DID subject as identified by the DID (as a valid URI).

You are the much deeper RDF expert—do you see any issue with this?

iherman commented 3 years ago

@iherman I thought about this further and IMHO the controller property (or whatever we ultimately decide to name it) expresses a relation to the DID subject much like any other relationship would be expressed. For example:

<did:foo:1234> :mother <did:example:9876> .
<did:foo:1234> :employer <did:example:abcd> .
<did:foo:1234> :guardian <did:example:defg> .
<did:foo:1234> :controller <did:example:jklm> .

In my mind, this makes it even clearer that a controller relationship is, in some cases, very similar to a legal guardian relationship (which in fact plays a hugely important role in self-sovereign identity; see this white paper).

This also fits with the clarifications I've been discussing with you and @melvincarvalho about:

The DID document not having a separate URI.

All properties in the DID document being properties describing the DID subject. Every one of them would simply be an RDF triple (or, for more complex properties, an RDF graph) whose RDF subject is the DID subject as identified by the DID (as a valid URI).

From an RDF point of view my only reservation is my last remark about the restriction on the URI used for a DID subject:

Also, if we "just" use the RDF triple, then it does not make any sense to restrict the value of id. In terms of RDF it should be perfectly fine to say:
<https://www.ivan-herman.net/#me> :didController <did:bar:9876>

I do not see any ways to properly restrict the URI format used for a subject within the RDF world. In other words, this type of restriction is "out of bands", somehow. I am not sure how to handle that.

If we go that way (and let us say we find a solution or we disregard the URI restriction issue above) then the spec text has to be clarified because, as of now, it really reads that the "control" is on the DID Document.

I also think that if we go this way, we would need some sort of a generic RDF type of the sort DIDSubject, at least for the RDF side. I think this is necessary for a proper vocabulary (e.g., setting the right domain for a property like authentication) as well as for the constraints that we may want to express via SHACL (e.g., that the value of the controller property MUST be a DID). We are talking about using the type explicitly anyway, and this would be some sort of a "default" type.

(I plan to review my vocabulary and SHACL files anyway because I found some bugs, so I will try to make these things more explicit.)

dlongley commented 3 years ago

@iherman,

I do not see any ways to properly restrict the URI format used for a subject within the RDF world. In other words, this type of restriction is "out of bands", somehow. I am not sure how to handle that.

If we go that way (and let us say we find a solution or we disregard the URI restriction issue above) then the spec text has to be clarified because, as of now, it really reads that the "control" is on the DID Document.

I also think that if we go this way, we would need some sort of a generic RDF type of the sort DIDSubject, at least for the RDF side. I think this is necessary for a proper vocabulary (e.g., setting the right domain for a property like authentication) as well as for the constraints that we may want to express via SHACL (e.g., that the value of the controller property MUST be a DID). We are talking about using the type explicitly anyway, and this would be some sort of a "default" type.

(I plan to review my vocabulary and SHACL files anyway because I found some bugs, so I will try to make these things more explicit.)

I think the "RDF subject value must be a DID" is a restriction that only applies to the DID Document, when you are talking about the DID subject. It is not an RDF vocabulary restriction and it is not specific to "controller". It is a statement to help producers of DID Documents create them properly without having to understand more details about RDF. You must use a DID in the subject position for statements made about the DID subject. It is not a statement about valid RDF domains for RDF predicates. The predicates mentioned may be used elsewhere when talking about other RDF subjects (and those RDF subject values needn't be DIDs).

I tried to clarify this with an additional sentence in the spec here:

Even though JSON-LD allows any IRI as node identifiers, DID documents are explicitly restricted to only describe DIDs. This means that the value of id that refers to the DID subject MUST be a valid DID and not any other kind of IRI.

https://w3c.github.io/did-core/#json-ld

So this is the "out of bands" restriction you mentioned -- which is really just instructing producers to use the DID in their statements in DID Documents when they want to talk about the DID subject.
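Because this restriction lives outside the RDF toolset, a producer or consumer would have to enforce it in ordinary code. The following Python sketch shows one possible check. The regular expression is my reading of the DID syntax (lowercase-alphanumeric method name, colon-separated method-specific id) and should be treated as an assumption rather than a normative grammar; `is_did` and `check_subject_id` are hypothetical helper names.

```python
import re

# Assumed character class for the method-specific id: letters, digits,
# ".", "-", "_", or a percent-encoded byte.
_IDCHAR = r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})"

# did:<method-name>:<method-specific-id>, where the id may contain further
# colon-separated segments and must end in at least one id character.
DID_PATTERN = re.compile(rf"did:[a-z0-9]+:(?:{_IDCHAR}*:)*{_IDCHAR}+")


def is_did(value):
    """Return True if `value` looks like a bare DID (not a DID URL or other IRI)."""
    return isinstance(value, str) and DID_PATTERN.fullmatch(value) is not None


def check_subject_id(doc):
    """Raise ValueError if the top-level `id` of a DID document is not a DID."""
    if not is_did(doc.get("id")):
        raise ValueError(f"top-level id must be a DID, got {doc.get('id')!r}")


check_subject_id({"id": "did:xyz:12345"})           # passes silently
print(is_did("https://www.ivan-herman.net/#me"))    # an ordinary IRI is rejected
```

A check like this accepts or rejects a document as a DID document; it says nothing about whether the same triples are valid RDF, which they are in either case.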

iherman commented 3 years ago

@dlongley, we define an RDF vocabulary with terms like controller, authentication, etc. This means that somebody in the LD community may come up with the following RDF graph:

<https://www.ex.org/somebody> 
    did-vocab:controller <did:ex:12345> ;
    did-vocab:authentication [
        ....
    ]
.

Whatever tool we use (OWL, SPARQL, SHACL, whatever), this graph will be a valid "DID graph", whether it is encoded in JSON-LD or not. What we say is that this is not a valid set of statements from a DID processing point of view. What I am looking for is a proper statement, note, or whatever somewhere in the spec that makes it clear that this is a restriction on an RDF graph that cannot be expressed via the standard RDF(S)+OWL+... toolset.

Your statement above is fine for spec text; I am looking for some extra notes there. And we may have some hard-core LD people (not me! :-)) raise their eyebrows...

peacekeeper commented 3 years ago

I have to say that semantically, <did:foo:1234> :controller <did:bar:9876> has always bothered me a bit.

We want to express that someone else controls my DID document or acts on my behalf, but I hope we don't want to say that e.g. one individual controls another individual.

Maybe "controller" should go into DID document metadata rather than the DID document.

iherman commented 3 years ago

Maybe "controller" should go into DID document metadata rather than the DID document.

We should also remember that the term controller is used (at least in the examples, although it is not formally stated in the spec!) for verification methods, too. For that second case it creates no discernible semantic problems. If the "main" usage moves to the DID document metadata, then the verification method case must be defined separately.

talltree commented 3 years ago

I have to say that semantically, <did:foo:1234> :controller <did:bar:9876> has always bothered me a bit.

We want to express that someone else controls my DID document or acts on my behalf, but I hope we don't want to say that e.g. one individual controls another individual.

Maybe "controller" should go into DID document metadata rather than the DID document.

@peacekeeper Did you see my post earlier where I described that controller is just a type of relationship like any other relationship that might exist between two RDF subjects? For example, : father, : employer, : guardian ?

Let me give an even more specific example:

<did:foo:1234> :power-of-attorney <did:example:jklm> .

I cite this example because power of attorney is in fact very similar to what the DID spec means by "controller", especially in a legal sense. By that I mean:

When one person gives another person power of attorney, the second person can perform certain specific acts on behalf of the first person. But that's all.
When a DID subject that is a person designates another person a controller, the controller can perform certain specific acts on behalf of the DID subject. But that's all.

Of course, the DID subject may not be a person. Say it's a drone. Now it seems even more natural to say:

<did:foo:1234> :controller <did:example:jklm> .

My point is: the fact that the controller relationship means that the controller can update the DID document does not mean controller is metadata about the DID doc like created or updated, which clearly describe the DID document and not the DID subject. Rather controller expresses a relationship between two DID subjects. And the DID doc is never the DID subject.

OR13 commented 3 years ago

kdenhartog commented 3 years ago

TL;DR: Change controller to actAs to more generally represent the relationship between an RDF subject and RDF object if the property remains in the DID Document rather than moving to the metadata about the DID Document.

From reading through this, I'm thinking that the issue here is in the semantics of agency that's imbued by our choice of controller as the predicate in the RDF triples.

In the examples that @talltree provides above they all share the common pattern that the RDF subject acts on behalf of the RDF object in some manner expected by the object. However, due to the varying nature of the entities that can be RDF objects, the term controller doesn't fit because of our ethical interpretations between entities that have agency (e.g. Individual), entities that relinquish agency to a fiduciary (e.g. children/customer), and entities that cannot have self agency (e.g. IoT device or organizations). For this reason, it's my thinking that the problem is that the predicate term we've chosen misrepresents the relationship between the RDF subject and the RDF object, and that's what's causing the confusion around whether the RDF object could be the DID Document rather than the expected DID subject (DID subject as the RDF object in the RDF triple).

I'm thinking that by changing the RDF predicate from controller which comes with weird interpretations depending on the entities chosen (e.g. schema:Person controller schema:Person seems ethically dubious) to actAs it would allow us to cover all the varieties of RDF triple combinations that could be filled by the various pairs of entities. To me it seems that we keep the interpretation that the DID represents the DID Subject not the DID Document, don't overstate the representation of agency between generalized pairs of entities, and also keep clean semantics when using the RDF triple to describe authorization like is commonly the case for the controller property today.

Additionally, the actAs predicate also comes with some interesting advanced cases that seem to fit well in places where controller feels way off.

For example, let's say there is an IT Admin (did:example:123) acting on behalf of corp A (did:example:456) and corp A owns an IoT Sensor (did:example:789) that publishes data.

Now let's say that corp A has a policy that the IoT device needs the admin to rotate the keys of the IoT Device every 12 months. In the current case, what we're saying is that the did:example:123 controller did:example:456 controller did:example:789

In this sense it seems fine to use the term controller because it accurately represents the relationships between the pairs of entities and, by extension, the controller predicate accurately represents the transitive relationship between the IT Admin and the IoT device.
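The transitive reading in this example can be sketched in Python as follows. The mapping `CONTROLLER_OF` and the helper `controller_chain` are hypothetical, and treating controller as transitive is the interpretation used in this comment, not something the spec is known to mandate.

```python
# Illustrative only: each DID maps to the DID named in its `controller` property.
CONTROLLER_OF = {
    "did:example:789": "did:example:456",  # IoT sensor is controlled by corp A
    "did:example:456": "did:example:123",  # corp A is controlled by the IT admin
}


def controller_chain(did, controller_of):
    """Follow controller links from `did` upward, stopping on a cycle or dead end."""
    chain = []
    seen = {did}
    current = controller_of.get(did)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = controller_of.get(current)
    return chain


print(controller_chain("did:example:789", CONTROLLER_OF))
```

For the IoT sensor the chain reaches the IT admin through corp A, which matches the key-rotation policy described above.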

However, imagine we have the case where a lawyer (did:example:321) at large law firm A (did:example:654) represents a client (did:example:987).

In this case, the predicate controller implies that the lawyer representing the law firm is actually in control of the client which doesn't semantically equate to the realistic model we'd expect in the real world. "actAs" would cover both of these more advanced cases as well as the ones suggested above by Drummond.

OR13 commented 3 years ago

-1 to changing the word controller, although I do agree that RDF is playing a critical role here, I am against changing terminology that is already used in the VC Data Model as well as Linked Data Proofs and many DID Methods.

kdenhartog commented 3 years ago

-1 to changing the word controller, although I do agree that RDF is playing a critical role here, I am against changing terminology that is already used in the VC Data Model as well as Linked Data Proofs and many DID Methods.

I agree it's getting a bit late to change properties at this point. Just wanted to point out the reason the predicate feels incorrect is because it is incorrect in certain scenarios.