w3c / did-core

W3C Decentralized Identifier Specification v1.0
https://www.w3.org/TR/did-core/

Does DID Document metadata belong in the Document? #65

Closed dmitrizagidulin closed 3 years ago

dmitrizagidulin commented 4 years ago

Does metadata about the DID Document (such as when it was created, updated, or who it was signed by) belong in that DID Document?

Note that this question is not about a) the metadata for the subject of the DID (keys, service endpoints) or b) the metadata about the resolution of a particular DID Document (proof added by a resolver, caching data, what servers/nodes were used for resolution) -- that belongs either in the Resolver metadata or Method metadata sections.

So far, there have been arguments both for and against placing this metadata in the DID Document itself (vs outside of it, say in the Resolver metadata sections).

A) This metadata is already in the registry

A - against: Since much of this metadata (specifically, the created and updated timestamps and the proof which includes authorship metadata and document integrity protection) will also likely reside in the underlying DID registry mechanism (distributed ledger, etc), a Resolver should be able to figure out this data from the registry, and include it in the resolution metadata.
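To make the "against" position concrete, here is a minimal sketch of a resolver that derives created/updated from the registry's own history and reports them beside the document rather than inside it. `RegistryOperation` and `resolve` are hypothetical names, and the sketch assumes every timestamp is an ISO 8601 UTC string in the same format, so lexical order matches chronological order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegistryOperation:
    """Hypothetical record of one write to the underlying DID registry."""
    timestamp: str  # ISO 8601 UTC time assigned by the registry, not the DID controller
    document: dict  # the DID Document as written by this operation


def resolve(history: List[RegistryOperation]) -> Tuple[dict, dict]:
    """Return (DID Document, resolution metadata) from a registry history.

    created/updated come from the registry's records and are returned as
    separate resolution metadata; the document itself is left untouched.
    """
    if not history:
        raise LookupError("DID not found in registry")
    # Same-format UTC ISO 8601 strings sort chronologically.
    ordered = sorted(history, key=lambda op: op.timestamp)
    metadata = {"created": ordered[0].timestamp, "updated": ordered[-1].timestamp}
    return ordered[-1].document, metadata
```

The "for" side of the argument is exactly what this design gives up: once the document is archived apart from the registry, these two values are no longer recoverable from it.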

A - for: In many (most?) cases, these are two separate sets of metadata - one about the document itself, and one about the underlying registry mechanism.

Also: The DID Document should be self-contained, in terms of critical metadata, in case it is archived or otherwise separated from its underlying ledger or storage medium.

B) Potential for developer confusion

B - against: If the DID Doc metadata (such as when the document was created) differs from the did registry metadata (when the document was registered on a ledger, for example), this may confuse developers.

B - for: @TallTed

You want to talk about "confused developers"? Check out "last accessed", "last modified", and "created", among other Unix-y timestamps attached to documents in Unix-y filesystems.

In other words, these two categories of metadata are separate, and developers constantly have to keep this difference in mind anyway.

C) Use cases

C - against: There are no use cases currently for this metadata. (Or, the use cases are unclear.)

C - for: There are use cases -- this topic is highly relevant to any DID registry using a mutable storage mechanism, such as the BTCR mutable extension documents or did:web method documents.

Also, as @peacekeeper points out:

Perhaps the strongest argument for a proof on a DID Document is to link DIDs to already existing PKI such as X.509 or the E.U.'s eIDAS infrastructure. You could include an eIDAS signature (this is called "eSignature" or "eSeal") on a DID Document to link the DID to a legal identity.

D) Offload this topic to DID method specific specs

D - against: Even if this metadata does belong in the DID Document, perhaps we should hand this off to each DID method to decide (rather than the main DID spec).

D - for: @ChristopherA

However, if there are any other DID methods that use mutable storage for DID Documents, they would need to solve the same problem we do, and they might do it in different ways, which could be good (for innovation) or bad (for security, if they don't understand it well, as our scenario is complicated).

In other words, this is going to be a common enough problem that we should address this in the main spec.

E) Conceptual elegance

E - against: @dlongley:

I want to point out that the way that we've avoided the HTTP-Range-14 argument (which we should absolutely continue to do) is by deciding that you can, for most practical purposes, conflate a DID Document and a DID subject (they have the same identifier). There's a danger that we may lose this simplicity by encouraging expressing information in a way that stretches the limits of that conflation.

E - for: ... an excellent point. Perhaps we can continue to benefit from this conceptual simplicity (of having the DID Doc be mostly about the DID subject) by making it clear via the attribute names that the metadata is about the doc, not the subject? Like, having the field be named docCreated instead of just created, to prevent ambiguity?

dlongley commented 4 years ago

A DID Document is a graph of information. That information is primarily about the DID subject. If we want to make statements about the graph itself, those statements do not belong in that very graph. There may perhaps be exceptions we can make for things like proof that get special treatment, but we should otherwise avoid this. If the statement we want to make can be reasonably understood to apply to the DID subject then we can put it in the graph. This gives us wiggle room to avoid the http-range-14 problem.

iherman commented 4 years ago

This issue was discussed in a meeting.

jandrieu commented 4 years ago

@dlongley I believe that statement is fundamentally incorrect.

That information is primarily about the DID subject.

The DID document provides the information necessary to interact securely with a DID Subject. That's it. It is NOT about the did subject. Yes, I can see how you could argue that how you interact with a Subject is indirectly and ultimately about the subject, but that is just going to get us in trouble. It's the wrong mental model. The defining line here is that the DID Document provides the information needed to interact securely with the Subject. If it isn't about interacting securely with the subject--potentially including meta-data about why we should believe the rest of the content is itself secure--then it doesn't belong in the DID Document.

Statements about Subjects don't belong in DID Documents.

If we don't toe that line, we are inviting a privacy nightmare with this work.

dlongley commented 4 years ago

@jandrieu,

The DID document provides the information necessary to interact securely with a DID Subject. That's it. It is NOT about the did subject. Yes, I can see how you could argue that how you interact with a Subject is indirectly and ultimately about the subject, but that is just going to get us in trouble.

Yes, this information is about the subject. That there are risks is not a reason to break the model, IMO.

It's the wrong mental model. The defining line here is that the DID Document provides the information needed to interact securely with the Subject. If it isn't about interacting securely with the subject--potentially including meta-data about why we should believe the rest of the content is itself secure--then it doesn't belong in the DID Document.

I think it would be confusing to create a new model here (both mentally and technically) -- i.e., "public information about a subject is not about the subject, but private information is". The issue isn't with whether or not the information is about the subject. It's about public, discoverable information vs. private information. What we need to do is provide clear guidance on what should be said where. This is no different from talking about people in general and I suspect moving away from that will only create more confusion. I think it is better to draw on what people already know about public vs. private to help avoid trouble rather than try to obscure it away with a special model.

Statements about Subjects don't belong in DID Documents. If we don't toe that line, we are inviting a privacy nightmare with this work.

Privacy is always going to be a consideration no matter what we do. We have to be clear and upfront about what kind of information should be in a DID Document that is publicly available or on a blockchain, for example. And, yes, no private information should ever be there.

iherman commented 4 years ago

@jandrieu @dlongley chiming in again with my Semantic Web hat on; maybe this is one of those cases when the RDF terminology does help. (It does help me, but I am biased by my background.)

If I look at the DID document, then I only see triples like

<did:example:123456789abcdefghi> authentication <did:example:123456789abcdefghi#key> .
<did:example:123456789abcdefghi#key> publicKeyPem "...." .
etc.

I.e., strictly speaking, we are making statements about the DID (URI). The RDF Semantics doesn't require anything more about the DID URI and what it "denotes" (in our case about the relationship between the DID URI and the DID Subject). It says:

IRI meanings may also be determined by other constraints external to the RDF semantics; when we wish to refer to such an externally defined naming relationship, we will use the word identify and its cognates.

(Emphasis is mine).

In other words: the only thing the DID document contains are statements about the DID as a URI, and any relationship between the DID and the DID subject is defined "outside" of the DID document. You guys tell me exactly where.

Does this help?

jandrieu commented 4 years ago

@dlongley The distinction between "private" and "public" is a false dichotomy. I've been writing and speaking about this for years. http://blog.joeandrieu.com/2011/04/10/constellations-of-privacy/

MANY people have repeatedly argued that once a piece of information is public it is no longer private. This is grossly incorrect. It is also usually a bald-faced justification for the kinds of broken Big Data business models which have inspired many in this community to create a better alternative. Semantically, these terms are essentially meaningless. As such, it is incorrect scoping for determining what is or is not in the DID Document.

What goes in the document should ONLY be information that enables secure resolution of appropriate resources, within the meaning of RFC 3986 https://tools.ietf.org/html/rfc3986#page-28:

URI "resolution" is the process of determining an access mechanism and the appropriate parameters necessary to dereference a URI;

You wouldn't say that a DNS record is about the owner of the record. It's about how you turn that identifier into service endpoints. In the same way, what is in the DID Document is not about the Subject, it is about how you interact with the Subject securely. That is a very specific subset of information "about the Subject".

Asserting the broader statement will lead to inappropriate information included in DID Documents rather than expressing them through other secure or verifiable mechanisms, like VCs. This would directly undermine the separation of concerns that underlies the entire framework of VCs and DIDs and the idea of decentralized identity as we--as a community--have been working on for years.

If we don't make the distinction about what goes in a DID Document clearly, early, and consistently, we will be enabling massive global tracking systems such as that proposed by GADI http://didalliance.org/.

jandrieu commented 4 years ago

@iherman I think you have the gist of it, with one clarification. The statements are not about the DID-URI, but rather about how you use the DID. The distinction between DID-URIs and DIDs is an unfortunate one, but the DID Document can't know the full DID-URI that might be ultimately dereferenced. All the statements are relative to the DID.

This makes for some delicate nuance between a DID-URI (whose ABNF is in the spec) and a DID as a URI, both of which might be referred to as a DID URI.

dlongley commented 4 years ago

@jandrieu,

The distinction between "private" and "public" is a false dichotomy. I've been writing and speaking about this for years. http://blog.joeandrieu.com/2011/04/10/constellations-of-privacy/

In my view, this is in support of not drawing some artificial line at the data modeling layer between public and private. The data is about the subject -- the only question is about whether it is appropriate to express certain pieces of information in places where anyone can read them.

MANY people have repeatedly argued that once a piece of information is public it is no longer private. This is grossly incorrect. It is also usually a bald-faced justification for the kinds of broken Big Data business models which have inspired many in this community to create a better alternative. Semantically, these terms are essentially meaningless. As such, it is incorrect scoping for determining what is or is not in the DID Document.

I don't think the terms are meaningless -- though they can get sticky to pin down, violating expectations. I think we'll find a similar problem with other approaches, too, as I mention below.

What goes in the document should ONLY be information that enables secure resolution of appropriate resources, within the meaning of RFC 3986 https://tools.ietf.org/html/rfc3986#page-28: You wouldn't say that a DNS record is about the owner of the record. It's about how you turn that identifier into service endpoints. In the same way, what is in the DID Document is not about the Subject, it is about how you interact with the Subject securely. That is a very specific subset of information "about the Subject".

Yes, but you could say that "how you interact with the Subject did:123" is you "must call him by the name Joe Andrieu". Similarly, you could say "how you interact with Subject did:123" is you use endpoint "https://my-website.com/my-SSN/my-other-private-info/foo". Perhaps we'll end up debating the semantics of "secure" instead. Who knows? But I'm sure a nearly unbounded set of examples like this can be used to violate expectations here as well.

None of this changes (or should change) that we have a graph data model that expresses information about subjects. Again, this is a debate about what should be expressed and where. You may have argued that "private" and "public" are semantically meaningless, but they clearly get across some meaning, even in this conversation. I don't think the distinction "how you interact with the Subject securely" solves the problem you want it to solve. I also don't think we should shy away from terms that are more commonly understood; they get us closer to where we want to be and help establish the very expectations we worry may be violated.

Perhaps it would be simpler and better to talk about the information in a DID Document in terms of who can read the DID Document.

TallTed commented 4 years ago

"Subject" is causing trouble again, still, forever.

Also, a DID document may contain a representation of a graph -- but a DID document is not itself a graph!

We interact with entities (that may be humans, organizations, or otherwise).

Those entities may be identified by DIDs (but those entities are not DIDs). If identified by DIDs, those entities should be the subjects of DID documents. Those documents contain sentences describing the entities identified by the DIDs, and might also contain sentences describing the documents themselves -- as they should in a Linked Data world. In that case, the documents should be identified with a different identifier than the one that identifies the entity (the DID) whose description is the purpose of the DID document.

jandrieu commented 4 years ago

@dlongley I'm not saying they are meaningless terms, I'm saying they aren't black & white. What is private in one context may not be in another. Privacy is innately contextual and the context in which a DID Document might be read is unknowable. In fact, ANY data might be considered private, depending on context. Therefore, private v public is an ineffective way to distinguish between what should be in a DID Document and what should not. There will absolutely be service endpoints that some would consider private, while others will bend over backwards to keep correlatable yet non-private pseudonyms out. It's up to the DID Controller whether or not to use service endpoints (or other data) that might be correlatable and thereby, in some context, be considered private. It's not up to us, in the specification to define, embed, and then police some abstract notion of what should be private and what should be public. That way lies madness.

@TallTed is right. In one lens, of course graphs are about subjects. That's how RDF works. I'm using Subject as the term is defined in VCs and in the spec: the entity referred to by the DID. It's unclear how you mean it.

If the defining nature of what should and should not go in a DID Document is whether or not a statement is about a subject (RDF sense), then there is no meaningful distinction; ALL RDF statements are about a subject. Equally so, if the litmus test is whether or not the statement is about the Subject (in the VC and DID sense), that is equally meaningless AND invites putting inappropriate information in a DID Document.

If, instead, you build on the RFC3986 distinction about resolution, then the ONLY thing that should be in a DID Document are statements that enable secure interactions with the Subject, including, IMO, the provenance of the DID Document itself, because it tells you why you should believe any of those statements are "secure".

That's my litmus test. @dlongley, is there anything you want to put in a DID Document that doesn't pass that test?

The examples you gave made my point more than yours. It's trivial (and yet potentially useful) to put information about secure interactions, which violates some notion of privacy. That's why private is a horrible litmus test. In contrast, any information you put in a DID Document that isn't about secure interactions with the subject absolutely should not go in the DID Document.

Back to the point of this issue...

For ALL DIDs, the only way to know you have the authentic DID Document is to exercise DID resolution according to the DID's method. As such, any supporting meta-data for why you should believe that resolution returned a correct DID Document is provenance that, IMO, should be included in the DID Document itself. Data without provenance is meaningless; therefore, we should embed the provenance WITH the data.

You said

I don't think the distinction "how you interact with the Subject securely" solves the problem you want it to solve.

Could you unpack that? All I want it to solve is defining a litmus test of what should and should not go into a DID Document. The distinction I offer is actually a distinction. Your statements about subjects (or Subjects) provide no distinction whatsoever.

You also said

I also don't think we should shy away from terms that are more commonly understood

"Privacy" is one of the least understood terms in this industry. Talk to anyone who has been working on the problem professionally for more than a freshman year and they will tell you that regulators, legislators, developers, end-users, and entrepreneurs constantly put forth different notions of what privacy means to them. To some it means to be left alone (Brandeis); to others it means agency (Gropper); to still others it means avoiding PII leaks. There is no commonly accepted definition of what is "private". For a hot minute, Personally Identifiable Information (PII) was the red herring many thought would provide a functional way to manage privacy. Turned out that was a horrible way to try and discuss privacy, much less regulate it.

Public and private are not well defined terms. Period.

dlongley commented 4 years ago

@jandrieu,

... the ONLY thing that should be in a DID Document are statements that enable secure interactions with the Subject...

That's my litmus test. @dlongley, is there anything you want to put in a DID Document that doesn't pass that test?

I think the problem is with this test -- I suspect just about anything can be construed to meet its demand. Any piece of information about the subject could be understood to be required to have a secure interaction with the subject, depending on the context. The subject's cat's name? Well, on catville.com, that's key. I think this test is actually less useful than thinking about who can read the contents of the DID Document.

jandrieu commented 4 years ago

Exactly. So the requirements for catville are different than those for others. But let's take your offer and talk about who can read the contents of a DID Document.

To date, there are zero authorization mechanisms for who can read a DID Document. Are you proposing we add some?

Asking who can read a DID Document when deciding what goes into a DID Document per the specification is, IMO, almost as useless as asking who can read an HTML document to inform the HTML standard. Controlling access to DID Documents is not currently part of the DID specification.

For all of the use cases currently in the DID Use Case document, it is presumed that DID Documents are accessible to anyone who has the DID and access to the mechanisms of resolution per its method. Notable exceptions in the community discussion are contextual DIDs such as did:git and did:peer, where if you aren't a part of the context, you can't resolve the DID.

I expect adding authorization isn't what you mean. Some notion of baking authorization to read a DID Document into the DID Document would be a significant departure from current conversations.

So, from a specifications standpoint, we should assume that ANYONE might read any given DID Document. Which is why ONLY that information directly relevant to secure interactions with the subject should be included.

Putting your favorite cat, a street address, or an email address into a DID Document is an anti-pattern, UNLESS it, in fact, contributes to secure interactions with the Subject. Not that it might--that would lead us to potentially putting the entire data warehouse worth of PII in--but that it specifically DOES. A service endpoint of http://twitter.com/JoeAndrieu IS completely reasonable if that is how the controller chooses to present a channel for secure interaction. Arbitrary statements like "The Subject is known to the State of California as Joseph Andrieu" are NOT.

In fact, that service endpoint MUST NOT be interpreted as saying the Subject is the person who controls http://twitter.com/JoeAndrieu, but rather simply that http://twitter.com/JoeAndrieu is a means to interact with the Subject. That interaction may be understood to be posting @JoeAndrieu publicly--which is, in fact, interpreted by others as sort of a digital drop of messages never even intended for Joe Andrieu.

Can you unpack the insights you think we'd get by asking who gets to read a DID Document?

dlongley commented 4 years ago

@jandrieu,

To date, there are zero authorization mechanisms for who can read a DID Document. Are you proposing we add some?

Asking who can read a DID Document when deciding what goes into a DID Document per the specification is, IMO, almost as useless as asking who can read an HTML document to inform the HTML standard. Controlling access to DID Documents is not currently part of the DID specification.

No, I'm not suggesting we propose any. I'm suggesting that we're using an open world data model and that what should govern whether or not something appears in a DID Document depends on a combination of what the DID controller wants to put there and what the DID method allows. These, in turn, should be governed, at the very least, by an understanding of who is able to read the DID Document.

If anyone can read the DID Document -- then only put information in the DID Document that you're ok with anyone reading. I don't think it has to be more complicated than that in terms of data visibility.

Beyond this, all we're saying in the spec is: if you're going to represent verification methods, controllers, services, etc. -- here's the interoperable way of doing that.

Side note: There are still discussions this group needs to have on GDPR-compliant "proxy/see also" services that can appear in DID Documents registered on blockchains. These services would direct people to more information about the DID subject, including additional service endpoints that may not be able to be written to the blockchain in a GDPR compliant way. This other graph of information could potentially require some authorization to get access to it ... which is one thing I was alluding to.

peacekeeper commented 4 years ago

I think I'm mostly with @dlongley in this thread. The RDF statements in the DID document are about the DID subject. The intention is that these statements contain only public information, and the primary motivation is that they will be used for secure interaction with the DID subject. I'm also supportive of the open world model, i.e. a DID document could contain arbitrary other statements, if the DID controller wants that and the DID method supports it. We had a long discussion about "hardening" (i.e. strongly constraining) DID documents about 2 years ago.

The DNS record analogy is partially useful when talking about resolution, but one difference is that a DID is an identifier for a real-world entity, whereas a domain name is not (an HTTP URI containing the domain name might be).

To get back to the original topic, if we want to make statements about the DID document itself, then as @TallTed has noted we would strictly speaking need a separate identifier, and we would therefore need to change the overall JSON-LD structure.

Example 1:

{
    "@context": "...",
    "type": "DidDocument",
    "created": "...",
    "updated": "...",
    "proof": [ ... ],
    "didSubject": {
        "id": "did:ex:1234",
        "authentication": [ ... ],
        "service": [ ... ]
    }
}

In this example, the identifier of the DID subject is did:ex:1234, and the DID document has a separate blank node identifier (it could also have its own IRI). There are a number of problems with this, such as the RFC 3986 rules for dereferencing DID URLs with fragments the way we've been using them (e.g. did:ex:1234#key-1).

Example 2:

{
    "@context": "...",
    "meta": {
        "id": "#meta",     // could be omitted to use a blank node identifier instead
        "created": "...",
        "updated": "...",
        "proof": [ ... ]
    },
    "id": "did:ex:1234",
    "authentication": [ ... ],
    "service": [ ... ]
}

Or similar, with several possible variations. I believe this has similar problems with regard to DID URL dereferencing as Example 1.

Or we just leave things the way they are (maybe prepending certain property names, such as "docCreated", as suggested by @dmitrizagidulin). This means that we would accept a certain "conflation" (aka "simplification") of identifiers for the DID subject and the DID document.

I believe we have had this conflation for a long time anyway, due to the two assumptions that 1. the DID identifies the DID subject, and 2. we want to use DID URLs such as did:ex:1234#keys-1. I believe if we wanted to be super correct about RDF semantics and URI dereferencing rules, we would have to drop one of these two assumptions; the implications would be quite significant.
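A minimal sketch of why the two assumptions pull against each other: a fragment such as #key-1 is interpreted relative to whatever the DID itself identifies, so the sketch requires the root of the returned representation to carry that DID as its id. That holds for today's flat layout and fails for the Example 1 wrapper. The function names are hypothetical, and the root-id check is a deliberate simplification of RFC 3986 fragment semantics, not a DID resolver.

```python
from typing import Any, Optional, Tuple


def split_did_url(did_url: str) -> Tuple[str, Optional[str]]:
    """Split a DID URL into its DID and its fragment (None if there is no '#')."""
    base, sep, fragment = did_url.partition("#")
    return base, (fragment if sep else None)


def find_node(node: Any, node_id: str) -> Optional[dict]:
    """Depth-first search for a JSON object whose "id" equals node_id."""
    if isinstance(node, dict):
        if node.get("id") == node_id:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_node(child, node_id)
        if found is not None:
            return found
    return None


def dereference(doc: dict, did_url: str) -> dict:
    """Dereference a DID URL against the representation returned for its DID."""
    did, fragment = split_did_url(did_url)
    # Simplification: the fragment hangs off the resource the DID identifies,
    # so the root of the representation must be that resource.
    if doc.get("id") != did:
        raise ValueError(f"root of representation is not identified by {did}")
    if fragment is None:
        return doc
    node = find_node(doc, did_url)
    if node is None:
        raise KeyError(did_url)
    return node
```

With the flat layout, `dereference(doc, "did:ex:1234#key-1")` finds the key because the DID names both the root node and the subject, which is the conflation in question. With the wrapper, the root is the document's blank node, so the same call raises.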

iherman commented 4 years ago

Looking at the first pattern of @peacekeeper, with a little additional JSON-LD trick it can be turned into a semantically perfectly sound structure. I have turned example 1 into finished JSON-LD with an additional statement in the context:

{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    {
      "didSubject": "@graph"
    }
  ],
  "type": "DidDocument",
  "created": "2019-11-26",
  "didSubject": {
    "id": "did:ex:1234",
     "authentication": [
        "did:example:123456789abcdefghi#keys-1",
        {
           "id": "did:example:123456789abcdefghi#keys-2",
           "controller": "did:example:123456789abcdefghi",
           "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
        }
     ]
  }
}

Which translates in a set of TriG statements as follows:

_:b0 
     dcterms:created "2019-11-26"^^xsd:dateTime ;
     a <https://json-ld.org/playground/DidDocument> .

_:b0 {
    <did:ex:1234> did:authenticationMethod
         <did:example:123456789abcdefghi#keys-1> , 
         <did:example:123456789abcdefghi#keys-2> .
    <did:example:123456789abcdefghi#keys-2> 
        did:controller <did:example:123456789abcdefghi> ;
        did:publicKeyBase58 "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV" .   
}

(see JSON-LD Playground to experiment with this further.)
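For readers without the Playground at hand, here is a toy Python sketch of the same shape of output. It is NOT a JSON-LD processor: it performs no context expansion, keeps short property names instead of IRIs, and does not distinguish IRIs from literals. It only mirrors the effect of mapping "didSubject" to "@graph": outer properties describe the document node _:b0 in the default graph, and everything under "didSubject" lands in a named graph called _:b0. The function name and the tuple layout are illustrative assumptions.

```python
from typing import Any, List, Optional, Tuple

Quad = Tuple[str, str, Any, Optional[str]]  # (subject, predicate, object, graph)


def to_quads(doc: dict, doc_node: str = "_:b0") -> List[Quad]:
    """Toy flattening of the wrapper layout into (s, p, o, graph) tuples."""
    quads: List[Quad] = []
    # Statements about the document itself go into the default graph (None).
    for key, value in doc.items():
        if key not in ("@context", "didSubject"):
            quads.append((doc_node, key, value, None))

    def emit(node: dict) -> None:
        subject = node["id"]
        for key, value in node.items():
            if key == "id":
                continue
            for item in (value if isinstance(value, list) else [value]):
                if isinstance(item, dict):
                    # Embedded node: link to it by id, then describe it too.
                    quads.append((subject, key, item["id"], doc_node))
                    emit(item)
                else:
                    quads.append((subject, key, item, doc_node))

    # Statements about the DID subject go into the named graph doc_node.
    emit(doc["didSubject"])
    return quads
```

Run on the example above, this yields two default-graph tuples (type, created) about _:b0 and four named-graph tuples about did:ex:1234 and its keys, matching the split in the TriG output.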

I have not looked at example 2 but, at first glance, that seems semantically a bit less clear.

msporny commented 4 years ago

Thanks to @peacekeeper for the examples, building on what he has said above.

Example 1 is sort of how we dealt with this topic with Verifiable Credentials. Example 2 is sort of how we dealt with this topic with the proof property.

Both are valid ways of expressing metadata about information, but here's the real issue:

We made a mistake by calling something a "DID Document". There is no such thing. There is a DID, that identifies a resource, and when you dereference it, you get a representation of that resource. It's information at that point in time... and that's all it is... and calling it a DID Document is confusing people.

There is information, and metadata about information.

Sometimes you serialize that information, and some people call that serialization "a document"... but it isn't. It isn't a unique parchment of which there is only one copy in the entire universe. It's this ephemeral thing, and sometimes you need to say things about that ephemeral thing.

We got this right with Verifiable Credentials. The outermost thing was metadata about the information (metadata about the credential), and the innermost thing was the information itself (the subject(s) of the credential).

I really worry about both Examples, I think they're both wrong.

Example 1 is wrong because it breaks all blockchain-based mechanisms. Submitting Example 1 to Veres One would mean that the DID subject would be setting the created and updated dates, and they have no right to do that. It's the consensus algorithm that decides when entries in the ledger are created and updated.

Example 2 is wrong for the same reason. The DID subject has no right to set the created/updated dates except in the fringe case where they actually control that information (like for did:web).
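A toy sketch of the ledger-side rule described here: whatever created/updated values a submitter includes are discarded, and the timestamps come from the consensus process instead. `ToyLedger`, `submit`, and the `consensus_time` argument are hypothetical stand-ins, not how Veres One or any other ledger is actually implemented.

```python
from typing import Dict

# Properties owned by the consensus process; submitters cannot set them.
LEDGER_CONTROLLED = ("created", "updated")


class ToyLedger:
    """Hypothetical ledger illustrating who gets to assign timestamps."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}

    def submit(self, doc: dict, consensus_time: str) -> dict:
        did = doc["id"]
        # Drop any created/updated values asserted by the submitter.
        body = {k: v for k, v in doc.items() if k not in LEDGER_CONTROLLED}
        previous = self.records.get(did)
        record = {
            "document": body,
            # created is fixed by the first accepted write; updated by the latest.
            "created": previous["created"] if previous else consensus_time,
            "updated": consensus_time,
        }
        self.records[did] = record
        return record
```

In the did:web fringe case, the controller effectively plays both roles, which is why the same timestamps can legitimately come from the document's author there.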

So, I think the correct solution is this (Example 3):

{
    "@context": "...",
    "type": "DidResolutionResponse",
    "created": "...", // when the DID Resolution was created
    "didCreated": "...", // when the DID identifier was created
    "didUpdated": "...", // when the DID Document was updated
    "didSubject": { // this is what we traditionally call the DID Document
        "id": "did:ex:1234",
        "authentication": [ ... ],
        "service": [ ... ]
    },
    "proof": [ ... ] // proof from the resolver
}

The proposal above (Example 3) is nuanced in its difference from Example 1. It works for did:web and did:v1/did:btcr/did:ethr, whereas Example 1 is very problematic in the latter use cases. Here's how it could work: the did:web Method would state that any file written to a web server MUST be a DID Resolution response. This means that a resolver will hit a did:web method and pull a raw resolution response (that contains a didSubject) from the web server. If a developer just wants the "DID Document", the resolver pulls the didSubject field out and gives it back to the developer. This creates the proper separation of concerns and doesn't require us to rearchitect what a DID Document is (and frankly, I don't think that would be the right thing to do at this stage anyway).
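A minimal sketch of the resolver step just described, assuming the file fetched from the web server is a JSON object shaped like Example 3 (without the explanatory comments, which plain JSON does not allow). `extract_did_document` is a hypothetical helper. The check that didSubject.id matches the requested DID is an added safeguard, not something the proposal specifies.

```python
import json
from typing import Tuple


def extract_did_document(raw: str, did: str) -> Tuple[dict, dict]:
    """Split a raw DID Resolution response into (DID Document, metadata)."""
    response = json.loads(raw)
    if response.get("type") != "DidResolutionResponse":
        raise ValueError("not a DID Resolution response")
    document = response["didSubject"]
    # Added safeguard (assumption): the document must describe the DID asked for.
    if document.get("id") != did:
        raise ValueError("didSubject does not describe the requested DID")
    # Everything outside didSubject is metadata about the DID or the resolution.
    metadata = {k: v for k, v in response.items()
                if k not in ("@context", "type", "didSubject")}
    return document, metadata
```

A developer who only wants the "DID Document" keeps the first element. Tooling that cares about didCreated, didUpdated, or the resolver's proof reads the second, so neither has to be mixed into the subject's graph.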

We do have the authority in the Working Group to specify a data model for "where metadata about the DID Document should go". The trick is doing this w/o opening a massive can of worms that is DID Resolution. So, we have a few options going forward:

  1. State that metadata about a DID Document is out of scope for the DID WG, and it should go in the DID Resolution spec. This one is easy and keeps our scope limited while providing an answer for the did:web folks.
  2. State that the data model for specifying metadata about the DID Document is in scope, but the resolution protocol is out of scope. This one is a slippery slope.

jonnycrunch commented 4 years ago

This puts a lot of power in the resolver.

jonnycrunch commented 4 years ago

Also, just to highlight the self-sovereign aspect: with a cryptographic signature, I, as the author of the DID data, assert the created and updated times. That is not to say it is necessarily convenient for them to be there.

jandrieu commented 4 years ago

@jonnycrunch IMO, you shouldn't trust a resolver you aren't running any more than a bitcoin node you aren't running, for the same reasons. DLT-based resolution generally requires a full node under the hood. The point of meta-data about DID Document resolution is for a given resolver to provide some level of assurance (mechanism TBD and per-method) for making a trust decision about that result.

The kind of meta-data we are talking about could include just about anything, including identifying information about the resolver, so that one could rely on specific resolvers (either pseudonymous with some notion of reputation or bound to legal entities and their reputations). Another kind of "meta-data" could include the block height of the tip (for BTCR) or even a Merkle bloom filter that could be used elsewhere to prove the on-chain existence of the root of the DID Document. I'm just speculating about these cryptographic assurances, but they are definitely part of the "stack" for deciding whether or not to rely on the result from a given resolver.

msporny commented 4 years ago

The discussion on the WG call today was all over the place, and I think the root cause was because no one, including me, has defined what "created" means. At least six definitions popped up during the discussion today:

  1. The time that the DID subject is asserting they created the DID.
  2. The time that the resolver is asserting that the DID was created.
  3. The time that the ledger consensus algorithm is asserting that the DID was created.
  4. The time that the DID subject is asserting they created the DID Document.
  5. The time that the resolver is asserting that the DID Document was created.
  6. The time that the ledger consensus algorithm is asserting that the DID Document was created.
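One way to see why a bare "created" is ambiguous is that the six readings above form the cross product of two independent choices: who asserts the timestamp and what the timestamp is about. The short TypeScript sketch below generates the six readings in the same order as the list. The type names are hypothetical and serve only as illustration.

```typescript
// Two independent axes; any property name must pin down both.
type Asserter = "DID subject" | "resolver" | "ledger consensus";
type Target = "DID" | "DID Document";

const asserters: Asserter[] = ["DID subject", "resolver", "ledger consensus"];
const targets: Target[] = ["DID", "DID Document"];

// Outer loop over targets, inner loop over asserters, matching items 1-6 above.
const readings: string[] = [];
for (const target of targets) {
  for (const asserter of asserters) {
    readings.push(`${asserter} asserts when the ${target} was created`);
  }
}

readings.forEach((reading, i) => console.log(`${i + 1}. ${reading}`));
```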

I think @dmitrizagidulin was talking about either 1 or 4, I was talking about 3 or 6, and I'm not sure which one @peacekeeper was talking about.

Let's go at this from the other direction and get very specific about the items being discussed. I don't think having the conversation in the abstract is helping us. Let's just focus on "created" and all make sure we're talking about the same definition before we start talking about where the data should be stored.

peacekeeper commented 4 years ago

We made a mistake by calling something a "DID Document". There is no such thing. There is a DID, that identifies a resource, and when you dereference it, you get a representation of that resource.

As an httpRange-14 nerd, I would say: What is that resource that the DID identifies? The DID subject, right? Well that's not an "information resource", therefore it has no representation that can be retrieved, has no media type, and there is no way to dereference fragments like did:ex:123#keys-1. From an RDF semantics perspective, we treat DIDs like identifiers for the DID subject, but from a URI dereferencing perspective, we treat DIDs like identifiers for the DID document.

I think this is the reason why originally we didn't really mind having properties like "created", "updated", "services", "authentication" side by side without distinction.

dmitrizagidulin commented 4 years ago

@msporny - The point I (and @peacekeeper) was trying to make is not that there are multiple definitions of 'created'. It's that there are multiple timestamps that need to be tracked. Which may include:

  1. The time that the DID subject is asserting they created the DID Document. (And if that particular DID method uses a proof section in DID documents, this assertion will be signed by the creator.)
  2. The time that the resolver has retrieved the document (currently tracked in resolverMetadata.retrieved property of the resolution result).
  3. The time that the registry (ledger or other mechanism) asserts that the DID was registered. (This is method-specific, and would go into the methodMetadata section of the resolution result.)

2 and 3 already have mechanisms in the (DID Resolution) data model. And what we're arguing is that item 1, the self-asserted creation date of the document, belongs in the DID document.
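A rough TypeScript sketch of where each of the three timestamps would live under this proposal. The property names docCreated, resolverMetadata.retrieved, and methodMetadata come from this thread. The "registered" key inside methodMetadata, the helper function, and all timestamp values are hypothetical.

```typescript
// Illustrative only: names follow this discussion, not a finalized spec.
interface DidDocument {
  id: string;
  docCreated?: string; // (1) self-asserted by the DID subject, possibly covered by a proof
  proof?: unknown;
}

interface DidResolutionResult {
  didDocument: DidDocument;
  resolverMetadata: { retrieved: string };  // (2) set by the resolver at retrieval time
  methodMetadata: Record<string, string>;   // (3) method-specific, e.g. registration time
}

function timestampsOf(result: DidResolutionResult) {
  return {
    selfAsserted: result.didDocument.docCreated,     // undefined if the subject asserts nothing
    retrieved: result.resolverMetadata.retrieved,
    registered: result.methodMetadata["registered"], // undefined if the method cannot track it
  };
}

// Hypothetical Veres One-style timeline: document created, registered later, then resolved.
const result: DidResolutionResult = {
  didDocument: { id: "did:ex:1234", docCreated: "2019-09-01T12:00:00Z" },
  resolverMetadata: { retrieved: "2019-10-15T08:30:00Z" },
  methodMetadata: { registered: "2019-09-03T09:00:00Z" },
};
console.log(timestampsOf(result));
```

For a method like did:key, methodMetadata would simply be empty, and only items 1 and 2 could carry values.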

(We were not talking about the timestamp that the DID was created (as a separate entity from the DID Document), because it's not really possible to record or keep track of.)

dmitrizagidulin commented 4 years ago

@msporny I agree with you, btw, that the current property, created, is ambiguous, and should be changed to something like docCreated, to indicate which of the timestamps it refers to.

dmitrizagidulin commented 4 years ago

The other thing that's important to note is that although we want to provide mechanisms to track multiple timestamps, their relevance will differ for each individual DID method.

For example:

For Veres One non-cryptonym type DIDs (did:v1:uuid), a DID document is first created, and then (a nontrivial amount of time later) is registered with the ledger. So, for Veres One:

For did:key method, the situation is slightly different:

For did:web method:

Other methods may or may not want finer-grained notions of what "registered" means. Perhaps they track the timestamp for when a DID Document was first submitted to a node separately from when the overall ledger reaches consensus. In that case, all of those method-specific, fine-grained timestamps belong in the methodMetadata section of the DID Resolution Result (if the method is able to track them).

peacekeeper commented 4 years ago

the did:web Method would state that any file written to a web server MUST be a DID Resolution response. This means that a resolver will hit a did:web method and pull a raw resolution response (that contains a didSubject) from the web server.

My initial reaction is that this feels problematic, since a DID resolution result is constructed on the fly once a DID is resolved, so it doesn't make much sense to author this in advance and store it on a server. But I think I understand your reasoning behind this...

So, we have a few options going forward:

  1. State that metadata about a DID Document is out of scope for the DID WG, and it should go in the DID Resolution spec. This one is easy and keeps our scope limited

I actually like this. As @dmitrizagidulin said, we've been envisioning a methodMetadata property in the DID Resolution spec, which could potentially contain this kind of data.

peacekeeper commented 4 years ago

Regarding the different types of timestamps, I think I have a much simpler approach to this:

All the other details (asserted and signed by subject, asserted by ledger, created but not yet registered, registered but not yet propagated, etc.) are all method-specific and should not bother us on the DID core spec level. All DID methods need to clearly specify how "create" and "update" work, and based on that the corresponding timestamps should be set in the DID document (or DID resolution result). And if it's not possible to get that timestamp data (e.g. in did:key), then that's fine too.

dmitrizagidulin commented 4 years ago

@peacekeeper I like that approach, nice and elegant.

However, I disagree in one important detail. The self-asserted creation date of the DID Document, signed by the subject, is not method dependent. It's a concept that is the same for all DID methods (though some don't sign their DID Docs, relying on the ledger to do it, and some do). I think that timestamp needs a first-class property to mark it.

dmitrizagidulin commented 4 years ago

That said, although I do think that the self-asserted docCreated timestamp is a concept that means the same thing across all DID methods, if the group consensus is that it's not useful for all methods, I'm content to push it out to each individual DID method spec.

iherman commented 4 years ago

This issue was discussed in a meeting.

msporny commented 4 years ago

@dmitrizagidulin and @peacekeeper -- YASS! Now we are getting somewhere...

The point I (and @peacekeeper) was trying to make is not that there are multiple definitions of 'created'. It's that there are multiple timestamps that need to be tracked.

That means there are multiple concepts, and each concept needs a definition. You have outlined at least 3 below:

The time that the DID subject is asserting they created the DID Document. (And if that particular DID method uses a proof section in DID documents, this assertion will be signed by the creator.)

Note that this definition doesn't match the definition of created in the spec right now, and as you said above, the current definition in the document is thoroughly ambiguous.

The time that the resolver has retrieved the document (currently tracked in resolverMetadata.retrieved property of the resolution result).

Ok, so this is out of scope if we're talking about resolver metadata... and we're agreeing not to conflate this value with the one above.

The time that the registry (ledger or other mechanism) asserts that the DID was registered. (This is method-specific, and would go into the methodMetadata section of the resolution result.)

This is also out of scope, since it also has to do with resolver metadata... and again, we're not going to conflate it with the other two concepts.

So, now things get much simpler... we're only talking about a concept whose definition is this:

unnamed created concept - The time that the DID subject is asserting they created the DID Document.

I do agree that that concept could belong in the DID Document and if we're going to define it, it should be the same for all DID Documents. I question whether anyone would want to depend on it (based on @SmithSamuelM and @selfissued's comments during the call), but that's easily discussed.

If that is what you and @peacekeeper are talking about, then the debate on 'created' and 'updated' is this:

unnamed created concept - The time that the DID subject is asserting they created the DID Document.

unnamed updated concept - The time that the DID subject is asserting they updated the DID Document.

... and we should get everyone on the same page that that's what we're debating. Let's stop talking about "metadata" in general, we'll eventually get to a design pattern on that. Let's just focus on these two items and then we can apply whatever lessons learned from discussing these things to other things that loosely fall into the category of metadata.

selfissued commented 4 years ago

An unnamed created concept is useless, as I see it. What would be useful to implementers would be concrete created claims, each with well-defined semantics. These examples may be all wrong, but I could imagine claims like whenDIDCreated, whenDIDRegistered, whenDIDResolved, etc. Those could all have clear, actionable meanings. That's the kind of direction we should go, rather than trying to over-abstract concrete things that can be simple.

TallTed commented 4 years ago

@selfissued - Yes, of course, "An unnamed created concept is useless".

I do not believe that anyone including @msporny is suggesting that we continue working on either the unnamed created concept or the unnamed updated concept as such, but rather that we (1) confirm whether these things which we can describe but have not yet named are the things we are discussing, (2) give them suitable names (one of which might be whenDIDCreated), and (3) proceed...


"Metadata" is a subset of "data". It's only "meta" because it's not the focus at the moment. "Metadata" about a Word .doc is not usually in focus, because it's the content of the .doc that's important -- until you need to arrange several very similar .doc files in order of revision -- and then that "metadata" becomes the (temporary) focus, and thus "data".

"Meta" is not a permanent condition; it is not a defining attribute.

This is why I suggested an illustrative sketch, starting with the kernel which we all agree is in focus (the DID Subject, identified by the DID); then identifying all the data that goes into the description of the DID Subject which is contained in the thing we've been calling the DID Document (i.e., the means of identifying and interacting with the DID Subject); then identifying all the data that describes the DID Document (which might include whether or not a given DID Document is entirely ephemeral, is concretized in some fashion, etc.) and/or describes the statements which are found in the DID Document, and so on.

Useful names for these descriptors may or may not be obvious or unanimously agreed upon -- but problematic names (whether ambiguous as with created, or in disagreement with what they're labeling, or otherwise) will usually quickly become obvious as such.

msporny commented 4 years ago

I do not believe that anyone including @msporny is suggesting that we continue working on either the unnamed created concept or the unnamed updated concept as such, but rather that we (1) confirm whether these things which we can describe but have not yet named are the things we are discussing, (2) give them suitable names (one of which might be whenDIDCreated), and (3) proceed...

Yes, exactly, what @TallTed said - let's get the definitions of the concepts we're discussing locked down... once we do that, naming becomes easy... and once we do that enough times, a design pattern will emerge.

iherman commented 4 years ago

This issue was discussed in a meeting.

peacekeeper commented 4 years ago

At the Amsterdam F2F meeting in January 2020, @gannan08 ran a session on this topic (see slides).

We then started a document to collect (meta-)data items related to DIDs and DID documents.

The next steps are:

burnburn commented 4 years ago

The Chairs set a two-week deadline on the document, starting today, after which we can move to the next step.

jricher commented 4 years ago

See also #203

OR13 commented 4 years ago

DID Document metadata does not belong in the DID document; it belongs in the response from a DID Method resolver, in the same way that Content-Type is an HTTP header, not a property of HTML or JSON responses.

The current Google Document makes no sense to me, it contains both properties of a did subject, and properties related to cryptographic construction of the did method... I consider properties related to the construction of the did method to be "did method/document metadata" and properties related to the did subject to be "properties of a did document".

I think we need to define DID Method Resolution in the Core Spec, as a process which converts:

did:example:123 to

{ didDocument: { id: "did:example:123", ... }, metadata: { contentType: "application/ld+json", ... } }

At that point it will be possible for a DID method to return plain "application/json" and for consumers to know the difference without sniffing the didDocument content.
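A minimal TypeScript sketch of the point above: when the resolution output carries a content type, the consumer picks a parser from that declared type instead of inspecting the document body (for example, by looking for "@context"). This sketch assumes the document arrives as raw text, and it assumes the standard JSON-LD media type application/ld+json. The interface and function names are hypothetical.

```typescript
// Hypothetical shape of the proposed resolution output.
interface ResolutionOutput {
  didDocument: string;              // raw representation as delivered by the method
  metadata: { contentType: string };
}

type Parsed = { format: "json-ld" | "json"; doc: Record<string, unknown> };

// The format is decided from metadata alone, before the body is touched.
function formatOf(contentType: string): "json-ld" | "json" {
  switch (contentType) {
    case "application/ld+json":
      return "json-ld"; // a caller may go on to process @context
    case "application/json":
      return "json";    // plain JSON: any @context is not interpreted
    default:
      throw new Error(`unsupported content type: ${contentType}`);
  }
}

function parseDidDocument(out: ResolutionOutput): Parsed {
  const format = formatOf(out.metadata.contentType);
  return { format, doc: JSON.parse(out.didDocument) as Record<string, unknown> };
}

const out: ResolutionOutput = {
  didDocument: '{"@context":"https://www.w3.org/ns/did/v1","id":"did:example:123"}',
  metadata: { contentType: "application/ld+json" },
};
console.log(parseDidDocument(out).format); // json-ld
```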

... @kdenhartog I guess I agree with you now... :)

And yes, I get that the point of the document was to collect attributes, and then decide... consider this comment as me adding all attributes defined in the did-core json-ld context to the list, along with all existing defined mime types :)

peacekeeper commented 4 years ago

I propose the following next step, which I think is in line with what we discussed at the F2F:

Now that we are collecting some items in that Google doc mentioned above, decide what the "buckets" or "categories" will be where (meta-)data will go. Remember that this discussion started because we had different understandings of what the "created" property means. Some of the interpretations were:

So now we should try to identify and name the "buckets" or "categories" we need to express everything accurately. I would also recommend reviewing @gannan08's excellent presentation from the F2F again.

For now, let's leave out the related topics of what the concrete data structures would look like, or how they would be returned by a DID resolver. Let's discuss that separately later.

peacekeeper commented 4 years ago

My personal proposal would be that we have 3 "buckets" which we could describe and name as follows:

  1. Data about the DID subject -> "DID document"
  2. Metadata about the DID and DID document -> "DID document metadata"
  3. Metadata about a DID resolution process -> "DID resolution metadata"

Again, I would propose to not discuss concrete data structures or resolver behavior yet. For example, it may be possible to express two of the above "buckets" as part of a single data structure or merge them, instead of inventing too many separate ones. Also, the format may not necessarily be JSON(-LD) for all of them, perhaps we'll have key/value pairs similar to HTTP headers, or perhaps we'll have multiple representations. But let's agree on the "buckets" first.

Thoughts?

msporny commented 4 years ago

let's agree on the "buckets" first.

I think what @peacekeeper is suggesting (as well as all of the bullet items) is an excellent next step. I agree with the buckets and all bullet points (but reserve the right to change my mind if complexities require us to fine-tune the bullet points later). What Markus says above fits my mental model of the buckets.

jricher commented 4 years ago

There is a bright line between document and metadata: apart from the DID Document itself, I need a way to understand the DID Document.

What I see is that when you call a resolve() method, you're going to get back a stream of bytes representing the did document and something representing the metadata. This is especially true for the resolution metadata from Markus's list, but arguably for both (2) and (3) above. I would recommend that we borrow the "headers" concept from HTTP to represent this.

From an abstract standpoint, it's a hashmap of strings. The keys cannot be repeated, the values are always strings. A metadata definition can define an internal syntax on top of that if it wants to, for things like dates, but I think keeping these as simple as possible and not having them be a rich structure is a feature. We don't want there to be a ton of different things here, just what's needed.

And we can define a single, simple way to serialize this structure. I would even argue for re-using the HTTP header grammar if that makes sense.

As a bonus, this gives us a way to express input "options" to the DID resolver. We send in a bunch of request headers along with the DID.

To be perfectly clear: I am not saying we should use HTTP, nor that this would require HTTP to implement. I'm saying that other protocols like HTTP, SMTP, and many others have this same kind of separation between headers that are always in the same format and content that can be in a wide variety of formats. And these protocols have this pattern for a reason: it's simple, it's powerful, and it's functional.
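A minimal TypeScript sketch of the "headers" idea described above. Metadata is a flat map in which each key appears at most once and every value is a string. It is written one "Key: value" pair per line, loosely modeled on HTTP header syntax. The key alphabet, the CRLF separator, and the function names are assumptions made for illustration. This is neither a defined DID serialization nor a full implementation of the HTTP field grammar (for example, keys here are case-sensitive).

```typescript
// Hypothetical flat metadata structure: unique string keys, string values.
type Metadata = Map<string, string>;

const KEY = /^[A-Za-z0-9-]+$/; // restricted key alphabet (an assumption)

function serialize(md: Metadata): string {
  const lines: string[] = [];
  md.forEach((value, key) => {
    if (!KEY.test(key)) throw new Error(`invalid key: ${key}`);
    if (/[\r\n]/.test(value)) throw new Error(`line break in value for ${key}`);
    lines.push(`${key}: ${value}`);
  });
  return lines.join("\r\n");
}

function parse(text: string): Metadata {
  const md: Metadata = new Map();
  if (text === "") return md;
  for (const line of text.split("\r\n")) {
    const sep = line.indexOf(":"); // split on the first colon, so values may contain colons
    const key = line.slice(0, sep);
    if (sep <= 0 || !KEY.test(key)) throw new Error(`malformed line: ${line}`);
    if (md.has(key)) throw new Error(`repeated key: ${key}`); // keys cannot repeat
    md.set(key, line.slice(sep + 1).trim()); // surrounding whitespace is not preserved
  }
  return md;
}

// The same structure could carry resolution output metadata or, in the other
// direction, input "options" sent to a resolver along with the DID.
const example: Metadata = new Map([
  ["Content-Type", "application/ld+json"],
  ["Retrieved", "2020-03-12T10:00:00Z"],
]);
console.log(serialize(example));
```

Any richer syntax, such as date formats, would be layered on top of the string values by the individual metadata definitions.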

nikosft commented 4 years ago

I find the list of buckets by @peacekeeper at https://github.com/w3c/did-core/issues/65#issuecomment-597030882 an excellent starting point. But I think @jricher's comment https://github.com/w3c/did-core/issues/65#issuecomment-598416794 is heading in the wrong direction.

IMHO the correct paradigm to consider for DID documents is that of digital certificates. DCs can be retrieved and transferred using a number of protocols, they are "understood" by many systems and applications, they are portable, and they can even be transferred using out-of-band mechanisms. This happens because DCs are self-contained.

I wish the same property would hold for DID documents. I wish they could be easily ported from one registry to another, and I wish the trust placed in registries would be minimal.

Having said that, I believe bucket 2 should be part of the document, at least the metadata created by the controller. And under no circumstances should the proof section be removed from the document!

OR13 commented 4 years ago

example:

{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    {
      "@base": "did:web:did.actor:bob"
    }
  ],
  "id": "did:web:did.actor:bob",
  "publicKey": [
    {
      "id": "#z6MkkQBvgvqb6zGvS4cydworpUaRDzpszSFixq49ahbDeUTG",
      "type": "Ed25519VerificationKey2018",
      "controller": "",
      "publicKeyBase58": "6wvt6gb9mSnTKZnGxNr1yP2RQRZ2aZ1NGp9DkRdCjFft"
    }
  ]
}
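The example above uses "@base" so that the relative key id "#z6Mk…" and the empty "controller" expand to absolute identifiers, assuming a JSON-LD processor treats those properties as IRIs. The TypeScript sketch below is a simplified stand-in for RFC 3986 reference resolution. It covers only the two relative forms used in the example, and the function name is hypothetical.

```typescript
// Simplified sketch, NOT a full RFC 3986 resolver: handles only the empty
// reference and fragment-only references, and assumes the base has no fragment.
function expandAgainstBase(ref: string, base: string): string {
  if (ref === "") return base;                   // empty reference resolves to the base itself
  if (ref.charAt(0) === "#") return base + ref;  // fragment-only reference appends to the base
  return ref;                                    // everything else is treated as absolute here
}

const base = "did:web:did.actor:bob";
const keyId = expandAgainstBase("#z6MkkQBvgvqb6zGvS4cydworpUaRDzpszSFixq49ahbDeUTG", base);
const controller = expandAgainstBase("", base);
console.log(keyId);      // did:web:did.actor:bob#z6MkkQBvgvqb6zGvS4cydworpUaRDzpszSFixq49ahbDeUTG
console.log(controller); // did:web:did.actor:bob
```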
peacekeeper commented 4 years ago

Current status of this issue: It should be addressed by the PRs related to the DID Resolution contracts that the WG is actively discussing right now: https://github.com/w3c/did-core/pulls?q=is%3Apr+label%3Acontract+

peacekeeper commented 4 years ago

The following current PRs address the topic of metadata structure and are therefore relevant to this issue: https://github.com/w3c/did-core/pull/298, https://github.com/w3c/did-core/pull/299, https://github.com/w3c/did-core/pull/300.

peacekeeper commented 3 years ago

I think our current understanding is that DID document metadata is returned separately from the DID document by the abstract resolve() function, see section DID Resolution:

resolve ( did, did-resolution-input-metadata )
     -> ( did-resolution-metadata, did-document, did-document-metadata ) 

Also, in section Metadata Structure, we are now defining data types for metadata, but we are not defining how it would be serialized or represented by implementations of the resolve() function.
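One possible TypeScript rendering of the abstract resolve() function quoted above, with a toy in-memory resolver. The spec defines only abstract inputs and outputs. The property names, the use of Record<string, unknown> for metadata, the in-memory registry, and the "notFound" error value are therefore illustrative assumptions rather than a normative API. The point of the sketch is that "created" lives in the DID document metadata, not in the DID document.

```typescript
// Illustrative types for the three outputs of resolve().
type MetadataMap = Record<string, unknown>;

interface ResolveResult {
  didResolutionMetadata: MetadataMap;
  didDocument: { id: string; [key: string]: unknown } | null;
  didDocumentMetadata: MetadataMap;
}

type Resolve = (did: string, didResolutionInputMetadata: MetadataMap) => ResolveResult;

// Hypothetical in-memory registry standing in for a ledger or web server.
const registry = new Map<string, { doc: { id: string }; created: string }>([
  ["did:example:123", { doc: { id: "did:example:123" }, created: "2020-01-01T00:00:00Z" }],
]);

const resolve: Resolve = (did, _inputMetadata) => {
  const entry = registry.get(did);
  if (entry === undefined) {
    return { didResolutionMetadata: { error: "notFound" }, didDocument: null, didDocumentMetadata: {} };
  }
  return {
    didResolutionMetadata: {},
    didDocument: entry.doc,                          // the document itself carries no timestamps
    didDocumentMetadata: { created: entry.created }, // document metadata is returned alongside it
  };
};

const resolved = resolve("did:example:123", {});
console.log(resolved.didDocumentMetadata.created);       // 2020-01-01T00:00:00Z
console.log("created" in (resolved.didDocument ?? {})); // false
```

An implementation remains free to serialize all three outputs together in one structure, such as the "DID resolution result", because the logical separation does not dictate a wire format.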

With this understanding, to return to the original question in this issue, DID document metadata is logically NOT part of the DID document. BUT: Implementations of the resolve() function can still use various data formats that include BOTH a DID document AND its associated metadata. This thread has multiple examples of what that could look like. The DID Resolution spec also has an example of a document structure called a "DID resolution result" that does this.

From my perspective, this is a good solution. Logically, metadata is not part of the DID document, but the abstract definition of the resolve() function and metadata still allows multiple approaches to document structures that can be defined elsewhere or be implementation-specific, etc.

Can we close this issue?

msporny commented 3 years ago

+1 on closing the issue because we have a concrete answer now (outlined by @peacekeeper above).

peacekeeper commented 3 years ago

@dmitrizagidulin based on the last few comments, can we close this issue?

brentzundel commented 3 years ago

No comments since marked pending close, closing.