Does DID Document metadata belong in the Document?

w3c / did-core

W3C Decentralized Identifier Specification v1.0

https://www.w3.org/TR/did-core/

Does DID Document metadata belong in the Document? #65

Closed dmitrizagidulin closed 3 years ago

dmitrizagidulin commented 4 years ago

Does metadata about the DID Document (such as when it was created, updated, or who it was signed by) belong in that DID Document?

Note that this question is not about a) the metadata for the subject of the DID (keys, service endpoints) or b) the metadata about the resolution of a particular DID Document (proof added by a resolver, caching data, what servers/nodes were used for resolution) -- that belongs either in the Resolver metadata or Method metadata sections.

So far, there have been arguments both for and against placing this metadata in the DID Document itself (vs outside of it, say in the Resolver metadata sections).

A) This metadata is already in the registry

A - against: Since much of this metadata (specifically, the created and updated timestamps and the proof which includes authorship metadata and document integrity protection) will also likely reside in the underlying DID registry mechanism (distributed ledger, etc), a Resolver should be able to figure out this data from the registry, and include it in the resolution metadata.

A - for: In many (most?) cases, these are two separate sets of metadata - one about the document itself, and one about the underlying registry mechanism.

Also: The DID Document should be self-contained, in terms of critical metadata, in case it is archived or otherwise separated from its underlying ledger or storage medium.

B) Potential for developer confusion

B - against: If the DID Doc metadata (such as when the document was created) differs from the DID registry metadata (when the document was registered on a ledger, for example), this may confuse developers.

B - for: @TallTed

You want to talk about "confused developers"? Check out "last accessed", "last modified", and "created", among other Unix-y timestamps attached to documents in Unix-y filesystems.

In other words, these two categories of metadata are separate, and developers constantly have to keep this difference in mind anyway.

C) Use cases

C - against: There are no use cases currently for this metadata. (Or, the use cases are unclear.)

C - for: There are use cases -- this topic is highly relevant to any DID registry using a mutable storage mechanism, such as the BTCR mutable extension documents or did:web method documents.

Also, as @peacekeeper points out:

Perhaps the strongest argument for a proof on a DID Document is to link DIDs to already existing PKI such as X.509 or the E.U.'s eIDAS infrastructure. You could include an eIDAS signature (this is called "eSignature" or "eSeal") on a DID Document to link the DID to a legal identity.

D) Offload this topic to DID method specific specs

D - against: Even if this metadata does belong in the DID Document, perhaps we should hand this off to each DID method to decide (rather than the main DID spec).

D - for: @ChristopherA

However, if there are any other DID methods that use mutable storage for DID Documents, they would need to solve the same problem we do, and they might do it different ways which could be good (for innovation) or bad (for security if they don't understand it well as our scenario is complicated).

In other words, this is going to be a common enough problem that we should address this in the main spec.

E) Conceptual elegance

E - against: @dlongley:

I want to point out that the way that we've avoided the HTTP-Range-14 argument (which we should absolutely continue to do) is by deciding that you can, for most practical purposes, conflate a DID Document and a DID subject (they have the same identifier). There's a danger that we may lose this simplicity by encouraging expressing information in a way that stretches the limits of that conflation.

E - for: ... an excellent point. Perhaps we can continue to benefit from this conceptual simplicity (of having the DID Doc be mostly about the DID subject) by making it clear via the attribute names that the metadata is about the doc, not the subject? Like, having the field be named docCreated instead of just created, to prevent ambiguity?

dlongley commented 4 years ago

A DID Document is a graph of information. That information is primarily about the DID subject. If we want to make statements about the graph itself, those statements do not belong in that very graph. There may perhaps be exceptions we can make for things like proof that get special treatment, but we should otherwise avoid this. If the statement we want to make can be reasonably understood to apply to the DID subject then we can put it in the graph. This gives us wiggle room to avoid the http-range-14 problem.

iherman commented 4 years ago

This issue was discussed in a meeting.

No actions or resolutions
Brent Zundel: https://github.com/w3c/did-core/issues/65
Markus Sabadello: this issue is about the question of having data in the DID document some of which is about the subject and some of which is about the DID document itself
… there were ideas to maybe remove some properties like created or updated
… or proof property. That’s where this discussion started
… we thought created, updated, proof is it about the subject or the DID document itself
… dmitriz wrote a really good summary
… The question that we need to agree on is are we okay with data that is sometimes about the subject and sometimes about the document
… or do we want to separate that somehow
… I think there’s a third category which is data about the resolution process, some metadata may be added about that in the result
… but the primary question is are we fine having data about the did subject like services and public keys
… as well as about the document like proof
… and if we’re not, do the subject and the document have separate identifiers?
… we spent a lot of time discussing that
… we felt we are comfortable with combining that, they don’t need separate identifiers
… the same identifier for the subject and the did document
Manu Sporny: https://github.com/w3c/did-core/pull/27
Manu Sporny: https://github.com/w3c/did-core/pull/28
Manu Sporny: I want to point out that we have two PRs pending, 27 and 28, when I put those PRs in there my assumption was that the created and updated were being used to describe metadata about the DID document
… but after putting it in I can see how people thought they were about the DID itself, the identifier
… this is really a metadata discussion, if created and updated are truly about the identifier itself and not metadata about the DID document then it’s fine to keep them in there
… but if it is metadata about the DID document I feel strongly we should take it out, we shouldn’t be conflating those two things
… we need to decide whether or not it’s okay to use the same identifier to kind of sort of refer to two different things
Ivan Herman: +1 to manu
Manu Sporny: that is a huge red flag in the linked data space, your semantics get really messy
… similarly you do not need to have an identifier for everything
… you can do autogenerated identifiers, that’s a common thing, we use it in VCs
… we could have metadata about the DID document that’s outside of the DID document itself, much cleaner separation
Daniel Burnett: The DID document is not the resource. It is an explicit representation of access mechanisms (to use the HTTP URI analogy)
Manu Sporny: if we come to that philosophy it’s much easier for us to determine if a particular item is in or outside of the DID document
… I thought the original issue was about metadata about the DID document, interested to see if anyone hears differently
Jonathan Holt: I thought these were for convenience, and if you wanted to find the original source of truth you spin up a resolver or your own node and verify the assertions being made in the DID document
Manu Sporny: I’m hearing Jonathan say “issued” and “created” are about the DID Document.
Jonathan Holt: my interpretation was they were self asserted related to creation of the DID document, and are there for convenience
… what markus mentioned for identifiers, the keys ed25519, hiding keys.. was that what you were talking about as far as the subject identifier? you have to have a self asserted key identifier in the DID document that’s only about its own keys?
… or are we having this conceptual framework of referring to delegate keys or controller keys?
… what are the semantics we are working with?
Markus Sabadello: we’re not talking about identifiers for keys, we’re talking about whether the DID is an identifier for the subject, that’s where we ended up after a few months of httpRange-14, or is the DID the identifier for the document, or is it both?
… I think the community thinks it should be both
… but understand that’s ambiguous from a linked data perspective
Jonathan Holt: the subject is the identifier around the DID document, not a human subject?
Markus Sabadello: the DID subject is the person, org, thing, whatever, resource, identified by the DID
Daniel Burnett: “The DID subject is the subject of the DID.” <- Official definition :)
Dmitri Zagidulin: interpretation about created, updated, my summary in issue 65, I was interpreting them to be metadata about the DID document
… I’m not sure it makes sense to have metadata about the DID because it doesn’t apply separately to the DID document
Markus Sabadello: +1 to dmitriz that created, updated are metadata about the DID document
Dmitri Zagidulin: On the issue of does metadata about the document belong there. On which grounds is manu objecting?
… I laid out several arguments that I’ve seen you make in the various issues against it, which are relevant and how do you feel about the counterpoints?
Joe Andrieu: I’ve flipflopped on this issue
… one aha for me right now is the definitive way to find out if a given DID document is the correct DID document for a given DID is to execute the resolution process
Daniel Burnett: markus_sabadello, does created not apply to BTCR DIDs where DID documents are generated rather than stored?
Joe Andrieu: If that’s correct, there’s not necessarily a baked in way for a document to demonstrate on its own as a set of bytes that it’s the authoritative one, I think whatever metadata you need to verify the process needs to be in the DID document
… otherwise that separation feels a little false to me
Manu Sporny: there is a certain subset of things I’m strongly objecting to, and that’s the conflation of any kind of semantics
… it’s not clear to me what the group thinks issued, attributed, means yet
… one thing that might be helpful, there are two categories of information we are talking about
… information about the identifier itself, the DID string, and whatever it may identify
… and then information about the DID document itself
… those are two distinct categories that i think we should keep distinct
… if we conflate them there’s nasty stuff that can happen
… that’s where my concern comes from
… Let’s say that we say that updated is the time the identifier was updated. Semantically that’s meaningless. I know the identifier was updated but it doesn’t tell me anything more than that
… whereas if the DID document was updated, there’s a change the resolver can check, that’s about the document itself not the identifier. The semantics are very different
Dmitri Zagidulin: nobody is proposing that it would be about the identifier
Manu Sporny: i’m not convinced
… I think some people are and some people aren’t, and some people don’t understand what conflating those two things does to the entire data model
… You may not be proposing that and I think other people might be, we need to get down to the definition of what created and updated really means to people, and then see if those definitions are the problem
Dmitri Zagidulin: the topic of this issue is does metadata about the document belong in the document
… that’s a separate httpRange-14 discussion
… nobody is conflating, just discussing whether data about the document belongs in the document
Ted Thibodeau Jr: “How do we identify the identifier which identifies an entity?”
Dmitri Zagidulin: Having metadata about the DID document in the document allows portability
… it allows for standardizing of that metadata among mutable DID methods that don’t have underlying ledger mechanisms
Markus Sabadello: manu is saying sometimes we’re talking about metadata about the identifier, I don’t think that makes much sense
… it always identifies something, and the data is about that resource
… we can’t have data about the identifier, we can only have data about the thing being identified
… with data about the subject, data about the DID document
… I agree it’s better to separate them, even though conflating was the outcome of a few months of discussion of httpRange-14, makes more sense to keep separate
… agree with dmitriz that the metadata about the document inside the DID document is the issue. inside the DID document we need a separate object or level of JSON-LD structure
… on one level describe the document, on another the subject
Dave Longley: when we’re talking about updating or applying an update to a DID document, eg. adding a key, we’re really updating the subject
Daniel Burnett: yes, explicitly marking any meta data as such by placing it in a separate subtree in the DID doc would at least make clear that it is different
Dave Longley: The predicates in a DID document are things like authorization, which the subject, some aspect of a person or some thing, and when you add a key you say this person authorizes this key for some purpose
… that’s the statement you’re making
… if you make that kind of update you’re updating information about the subject, not the document
… these update times that are metadata might actually be information about the subject not the DID document
Manu Sporny: I agree with dlongley, that’s the point I’m attempting to make.
Dave Longley: dmitri also brought up portability, we’re talking about porting information about the subject, not the document
… the information inside the DID document is about the DID subject. That’s what you’d want to port
… I think we disagree less than we think because a lot of these things we’re talking about are really just more information about the DID subject
… manu was talking about the identifier, I think he really meant information about the subject not the DID, we’re not changing DIDs, that doesn’t make any sense
… a lot of the disagreements go away because we’re not talking about metadata that happens to live on some registry somewhere, we’re talking about the subject
Joe Andrieu: manu, you conflated the identifier with the subject. A lot of people have been responding in confusion because of that. I don’t think anyone is talking about putting information about the subject in a DID, that would be a privacy antipattern
… we have a did that’s a string, we don’t need metadata about that
… The subject.. the DID document is how you get from the DID to secure interaction with that subject
… We need to be much more careful about the language we use here, it’s confusing us, going to be more confusing for others
… we have this weird issue of the definitive DID document is not a string of bytes anywhere, it’s the output of a resolution process
… to understand if it’s definitive, whatever metadata we use, needs to be part of the DID document
Daniel Burnett: I wanted to bump up a level here
… the mental model that led to where we are
… as long as we can keep that mental model we’ll be fine
… what joe said matches what manu said
… we wanted our use of DIDs as URIs to work similarly to the way other URIs work
… such as http URIs
… if you look at the definition that we always refer to of a URI there is a resolution process and a dereferencing process
… the resolution process is where you discover what the access method and operation methods with the resource are, including any kinds of authn approaches that are necessary
… we’re different from http - we put a lot of that information that is part of the resolution process in the DID document
… we’re getting confused by making the DID document something more magical than it is intended to be
… which is a representation about how you access and update the resource
… it’s not the access to the resource itself , it is the things you can do with the resource and how you can authenticate yourself for that
… That may help. We may still decide that there is information that is not about the resource itself but we still may put it inside the DID document
… joe is correct that conceptually the resource access methods all of this exists even for DID methods that do not explicitly store a representation of the DID document
… the DID document can be generated if necessary, it does not have to live at a location somewhere
Ivan Herman: from a linked data / semantic web point of view
… with JSON-LD for the did document, that means we define in a particular syntax a bunch of RDF triples and if I can imagine a linked data environment which includes lots of triples, includes the triples in the did document
… according to the JSON-LD and RDF, there are triples, and all what I see in the DID document. The triple consists of subject, predicate, object, and the subject is a DID URL
… that’s what happens in RDF
Manu Sporny: yep, to Ivan.
Ivan Herman: none of those triples have to say anything about the DID document itself because the DID document is just a collection of triples
Manu Sporny: exactly, Ivan.
Ivan Herman: if we want to say something about the DID document, we need another subject that identifies it, in order to play properly with the linked data world
… if you link it to any other process that wants to use these identifiers, we have to be careful because you will get wrong triples
… triples that say things you don’t want
… and someone may use those triples to deduce things that semweb technologies can deduce, you will get wrong statements, you cannot mix these two up
Brent Zundel: we have 9 mins left
Dmitri Zagidulin: to draw a parallel with the VC data model
… we had the same discussion about the created metadata, and there we have two separate sections, subgraphs
… one about the credential and the other about the credential subject
… we label it
… we standardized the created timestamp for the verifiable credential
Dave Longley: +1 to ivan, the DID Document is a graph/dataset with triples about the DID subject in it
Dmitri Zagidulin: this is the same thing that’s being proposed for the DID document
… we standardize it for the DID document, not to the person or org
… if we need to have a separate linked data section so that the triples don’t get confused, that’s fine, let’s talk about that
Ivan Herman: +1 to dmitriz
Dmitri Zagidulin: but I want to re-emphasize the need for storing the data about the document not the subject in the document itself
Joe Andrieu: the conversation isn’t about triples. it’s about quads. about statements about statements.
Dmitri Zagidulin: the counter that manu seems to be proposing is we let each did method standardize their own. That doesn’t seem right
Manu Sporny: dmitriz that is absolutely not what I’m suggesting
… I think there’s some miscommunication
… we need some concrete examples
Dmitri Zagidulin: let’s take ‘created’ as a concrete example.
Manu Sporny: The thing that you raised is spot on - in VC we had two subgraphs, one for the credential, the other for the credential subject
… this is the exact same thing
… the issue is that.. what we need is put some concrete examples and ways we could address this problem
… we can use created and issued as examples
… that would help people see how the philosophy applies to an actual concrete solution
… we only need two examples, there are two ways we can go
… that’s what we need for the next time we discuss this
Ivan Herman: +1 to manu, we need specific examples
Manu Sporny: people can see what’s being proposed
Joe Andrieu: +1 for specific examples
… The thing is we’re not talking about triples, we’re talking about quads
Kenneth Ebert: I like the examples, too
Joe Andrieu: I’m not familiar enough with JSON-LD spaghetti, methods for representing quads
Brent Zundel: +1 for examples
Joe Andrieu: we’re talking about the context in which the triples are stated
Dave Longley: `{id: <did>, authentication: [<key>]}` means "the DID subject (identified by `<did>`) has authorized `<key>` for the purpose of authentication" ... that's a statement about the DID subject
Joe Andrieu: we need to make statements about that context
… we need to be able to in the DID document say something about the DID document
… metadata about the resolution is part of proof
… why do we believe this? here’s some metadata about the process to increase your confidence that this is legitimate
… What needs to be in there we should figure out at the DID document level, not at the DID resolution level
Markus Sabadello: +1 to keep the triples/quads clean and separate. Strictly speaking we would need a separate identifier for the DID document
Daniel Burnett: +1 dlongley
Manu Sporny: you don’t need to give the DID Document a separate identifier… can be a blank node… works just fine.
Markus Sabadello: the problem with that which we’ve discussed before for a few months, if we give the DID document a separate identifier we ran into problems defining the dereferencing process with URLs, especially if the DID URL has a fragment
… the way you dereference a fragment is you first deref the primary resource, without the fragment. The result has a mime type and dereferencing the fragment depends on the mime type
Ivan Herman: +1 to markus_sabadello
Markus Sabadello: if it’s an identifier for the subject, we can’t dereference it because it’s a real world resource and doesn’t have a mime type
… I like what dmitri said, parallel with VC, separate sections about the document and the subject
Dave Longley: a DID Document itself is much more ephemeral – you generally don’t “talk about it”, except perhaps to make statements in a resolution process
Brent Zundel: we had a recommendation to present real world examples so we can have something more concrete to discuss about
… The issue, 65 is assigned to markus
Manu Sporny: {resolution_things… didDocument: {did document things}}
Brent Zundel: markus, comfortable working to arrange some concrete examples?
Markus Sabadello: I can come up with some examples
Manu Sporny: {metadata_about_did_document… didDocument: {did_document_stuff}}
Daniel Burnett: yes, dlongley, this is what I meant by giving a DID document more reality than it should have, which is a physical representation of resolution info
Dave Longley: I think it helps to think of the DID Document as a graph … for which we generally don’t give an identifier
Ted Thibodeau Jr: DID document … is { .ttl owl:sameAs .jsonld owl:sameAs .rdfxml }? Can you speak of one serialization? Or only of all?
Ted Thibodeau Jr: It can be important to track when info about a subject was changed, as well as when the subject changed, as well as when the info about the subject was logged (which may be different from when it changes)…
Ted Thibodeau Jr: VERY complex!
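
The meeting closed with a request for concrete examples of the two directions discussed: a separate subtree for document metadata inside the DID document, and a wrapper that keeps the metadata outside of it. The following is only a minimal sketch of those two shapes, not a proposal from the thread. The property name `documentMetadata`, the timestamps, and the helper names are hypothetical and chosen purely for illustration.

```python
import json

# Hypothetical DID and key identifier, used only for illustration.
DID = "did:example:123456789abcdefghi"

# Statements about the DID subject: how to authenticate and interact with it.
subject_statements = {
    "id": DID,
    "authentication": [DID + "#key"],
}

# Statements about the document itself (timestamps are made-up values).
document_metadata = {
    "created": "2019-01-01T00:00:00Z",
    "updated": "2019-06-01T00:00:00Z",
}


def metadata_inside(subject, metadata):
    """Option 1: keep the metadata in the DID document, in its own subtree."""
    doc = dict(subject)
    doc["documentMetadata"] = dict(metadata)  # hypothetical property name
    return doc


def metadata_outside(subject, metadata):
    """Option 2: wrap the DID document; the metadata sits next to it."""
    return {
        "documentMetadata": dict(metadata),  # hypothetical property name
        "didDocument": dict(subject),
    }


inside = metadata_inside(subject_statements, document_metadata)
outside = metadata_outside(subject_statements, document_metadata)

print(json.dumps(inside, indent=2))
print(json.dumps(outside, indent=2))
```

In both shapes `created` never appears at the same level as `id` and `authentication`, which is the ambiguity the discussion was trying to avoid. The difference is whether the metadata travels inside the document (the portability argument) or only in the wrapper.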

jandrieu commented 4 years ago

@dlongley I believe that statement is fundamentally incorrect.

That information is primarily about the DID subject.

The DID document provides the information necessary to interact securely with a DID Subject. That's it. It is NOT about the did subject. Yes, I can see how you could argue that how you interact with a Subject is indirectly and ultimately about the subject, but that is just going to get us in trouble. It's the wrong mental model. The defining line here is that the DID Document provides the information needed to interact securely with the Subject. If it isn't about interacting securely with the subject--potentially including meta-data about why we should believe the rest of the content is itself secure--then it doesn't belong in the DID Document.

Statements about Subjects don't belong in DID Documents.

If we don't toe that line, we are inviting a privacy nightmare with this work.

dlongley commented 4 years ago

@jandrieu,

The DID document provides the information necessary to interact securely with a DID Subject. That's it. It is NOT about the did subject. Yes, I can see how you could argue that how you interact with a Subject is indirectly and ultimately about the subject, but that is just going to get us in trouble.

Yes, this information is about the subject. That there are risks there is not a reason to break the model, IMO.

It's the wrong mental model. The defining line here is that the DID Document provides the information needed to interact securely with the Subject. If it isn't about interacting securely with the subject--potentially including meta-data about why we should believe the rest of the content is itself secure--then it doesn't belong in the DID Document.

I think it would be confusing to create a new model here (both mentally and technically) -- i.e, "public information about a subject is not about the subject, but private information is". The issue isn't with whether or not the information is about the subject. It's about public, discoverable information vs. private information. What we need to do is provide clear guidance on what should be said where. This is no different from talking about people in general and I suspect moving away from that will only create more confusion. I think it is better to draw on what people already know about public vs. private to help avoid trouble rather than try to obscure it away with a special model.

Statements about Subjects don't belong in DID Documents. If we don't toe that line, we are inviting a privacy nightmare with this work.

Privacy is always going to be a consideration no matter what we do. We have to be clear and upfront about what kind of information should be in a DID Document that is publicly available or on a blockchain, for example. And, yes, no private information should ever be there.

iherman commented 4 years ago

@jandrieu @dlongley chiming in again with my Semantic Web hat on; maybe this is one of those cases when the RDF terminology does help. (It does help me, but I am biased by my background.)

If I look at the DID document, then I only see triples like

<did:example:123456789abcdefghi> authentication <did:example:123456789abcdefghi#key> .
<did:example:123456789abcdefghi#key> publicKeyPem "...."
etc.

I.e., strictly speaking, we are making statements about the DID (URI). The RDF Semantics doesn't require anything more about the DID URI and what it "denotes" (in our case about the relationship between the DID URI and the DID Subject). It says:

IRI meanings may also be determined by other constraints external to the RDF semantics; when we wish to refer to such an externally defined naming relationship, we will use the word identify and its cognates.

(Emphasis is mine).

In other words: the only thing the DID document contains are statements about the DID as a URI, and any relationship between the DID and the DID subject is defined "outside" of the DID document. You guys tell me exactly where.

Does this help?
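
To make the triple view above concrete, here is a rough sketch that flattens a small DID-document-like structure into (subject, predicate, object) tuples. It is a deliberate simplification and not real JSON-LD processing: there is no `@context` expansion, predicates stay as short names, and the `to_triples` helper is a made-up name. The identifiers are the example values from the triples above.

```python
def to_triples(node):
    """Flatten a nested dict with "id" keys into (subject, predicate, object) tuples."""
    triples = []
    subject = node["id"]
    for predicate, value in node.items():
        if predicate == "id":
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Nested node: link to it, then emit its own statements.
                triples.append((subject, predicate, item["id"]))
                triples.extend(to_triples(item))
            else:
                triples.append((subject, predicate, item))
    return triples


did = "did:example:123456789abcdefghi"
did_document = {
    "id": did,
    "authentication": [{"id": did + "#key", "publicKeyPem": "...."}],
}

triples = to_triples(did_document)
for triple in triples:
    print(triple)

# Every subject is the DID or a DID URL derived from it;
# no triple has "the document" as its subject.
subjects = {s for s, _, _ in triples}
print(subjects)
```

Under these assumptions, a top-level `created` property would come out as a triple whose subject is the DID, which is why it matters what the DID is taken to identify.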

jandrieu commented 4 years ago

@dlongley The distinction between "private" and "public" is a false dichotomy. I've been writing and speaking about this for years. http://blog.joeandrieu.com/2011/04/10/constellations-of-privacy/

MANY people have repeatedly argued that once a piece of information is public it is no longer private. This is grossly incorrect. It is also usually a bald-faced justification for the kinds of broken Big Data business models which have inspired many in this community to create a better alternative. Semantically, these terms are essentially meaningless. As such, it is incorrect scoping for determining what is or is not in the DID Document.

What goes in the document should ONLY be information that enables secure resolution of appropriate resources, within the meaning of RFC 3986 https://tools.ietf.org/html/rfc3986#page-28:

URI "resolution" is the process of determining an access mechanism and the appropriate parameters necessary to dereference a URI;

You wouldn't say that a DNS record is about the owner of the record. It's about how you turn that identifier into service endpoints. In the same way, what is in the DID Document is not about the Subject, it is about how you interact with the Subject securely. That is a very specific subset of information "about the Subject".

Asserting the broader statement will lead to inappropriate information included in DID Documents rather than expressing them through other secure or verifiable mechanisms, like VCs. This would directly undermine the separation of concerns that underlies the entire framework of VCs and DIDs and the idea of decentralized identity as we--as a community--have been working on for years.

If we don't make the distinction about what goes in a DID Document clearly, early, and consistently, we will be enabling massive global tracking systems such as that proposed by GADI http://didalliance.org/.

jandrieu commented 4 years ago

@iherman I think you have the gist of it, with one clarification. The statements are not about the DID-URI, but rather about how you use the DID. The distinction between DID-URIs and DIDs is an unfortunate one, but the DID Document can't know the full DID-URI that might be ultimately dereferenced. All the statements are relative to the DID.

This makes for some delicate nuance between a DID-URI (whose ABNF is in the spec) and a DID as a URI, both of which might be referred to as a DID URI.
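
As a rough illustration of that distinction, the sketch below splits a longer identifier into the bare DID and whatever follows it (path, query, fragment). It is a simplification and not a parser for the ABNF in the spec: it only assumes that the bare DID ends at the first `/`, `?` or `#`. The function name `split_did_url` and the sample inputs are hypothetical.

```python
def split_did_url(did_url):
    """Split an identifier into its bare DID and the remainder (path, query, fragment)."""
    # Assumption: the bare DID ends at the first "/", "?" or "#", whichever comes first.
    cut = len(did_url)
    for separator in "/?#":
        position = did_url.find(separator)
        if position != -1:
            cut = min(cut, position)
    return did_url[:cut], did_url[cut:]


did, rest = split_did_url("did:example:123456789abcdefghi#key")
print(did)   # the part the DID Document's statements are relative to
print(rest)  # the part the DID Document cannot know in advance
```

The DID Document is produced for the first part only; the second part is supplied by whoever dereferences the longer identifier.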

dlongley commented 4 years ago

@jandrieu,

The distinction between "private" and "public" is a false dichotomy. I've been writing and speaking about this for years. http://blog.joeandrieu.com/2011/04/10/constellations-of-privacy/

In my view, this is in support of not drawing some artificial line at the data modeling layer between public and private. The data is about the subject -- the only question is about whether it is appropriate to express certain pieces of information in places where anyone can read them.

MANY people have repeatedly argued that once a piece of information is public it is no longer private. This is grossly incorrect. It is also usually a bald-faced justification for the kinds of broken Big Data business models which have inspired many in this community to create a better alternative. Semantically, these terms are essentially meaningless. As such, it is incorrect scoping for determining what is or is not in the DID Document.

I don't think the terms are meaningless -- though they can get sticky to pin down, violating expectations. I think we'll find a similar problem with other approaches, too, as I mention below.

What goes in the document should ONLY be information that enables secure resolution of appropriate resources, within the meaning of RFC 3986 https://tools.ietf.org/html/rfc3986#page-28: You wouldn't say that a DNS record is about the owner of the record. It's about how you turn that identifier into service endpoints. In the same way, what is in the DID Document is not about the Subject, it is about how you interact with the Subject securely. That is a very specific subset of information "about the Subject".

Yes, but you could say that "how you interact with the Subject did:123" is you "must call him by the name Joe Andrieu". Similarly, you could say "how you interact with Subject did:123" is you use endpoint "https://my-website.com/my-SSN/my-other-private-info/foo". Perhaps we'll end up debating the semantics of "secure" instead. Who knows? But I'm sure a nearly unbounded set of examples like this can be used to violate expectations here as well.

None of this changes (or should change) that we have a graph data model that expresses information about subjects. Again, this is a debate about what should be expressed and where. You may have argued that "private" and "public" are semantically meaningless, but they clearly get across some meaning, even in this conversation. I don't think the distinction "how you interact with the Subject securely" solves the problem you want it to solve. I also don't think we should shy away from terms that are more commonly understood; they get us closer to where we want to be and help establish the very expectations we worry may be violated.

Perhaps it would be simpler and better to talk about the information in a DID Document in terms of who can read the DID Document.

TallTed commented 4 years ago

"Subject" is causing trouble again, still, forever.

Also, a DID document may contain a representation of a graph -- but a DID document is not itself a graph!

We interact with entities (that may be humans, organizations, or otherwise).

Those entities may be identified by DIDs (but those entities are not DIDs). If identified by DIDs, those entities should be the subjects of DID documents which documents contain sentences describing those entities identified by the DIDs, and which documents might also contain sentences describing the documents themselves -- as they should in a Linked Data world -- and in such case, the documents should be identified with a different identifier than that which identifies the entity (the DID) which description is the purpose of the DID document.

jandrieu commented 4 years ago

@dlongley I'm not saying they are meaningless terms, I'm saying they aren't black & white. What is private in one context may not be in another. Privacy is innately contextual and the context in which a DID Document might be read is unknowable. In fact, ANY data might be considered private, depending on context. Therefore, private v public is an ineffective way to distinguish between what should be in a DID Document and what should not. There will absolutely be service endpoints that some would consider private, while others will bend over backwards to keep correlatable yet non-private pseudonyms out. It's up to the DID Controller whether or not to use service endpoints (or other data) that might be correlatable and thereby, in some context, be considered private. It's not up to us, in the specification to define, embed, and then police some abstract notion of what should be private and what should be public. That way lies madness.

@TallTed is right. In one lens, of course graphs are about subjects. That's how RDF works. I'm using Subject as the term is defined in VCs and in the spec: the entity referred to by the DID. It's unclear how you mean it.

If the defining test for what should and should not go in a DID Document is whether or not a statement is about a subject (RDF sense), then there is no meaningful distinction; ALL RDF statements are about a subject. Equally so, if the litmus test is whether or not the statement is about the Subject (in the VC and DID sense), that is equally meaningless AND invites putting inappropriate information in a DID Document.

If, instead, you build on the RFC3986 distinction about resolution, then the ONLY thing that should be in a DID Document are statements that enable secure interactions with the Subject, including, IMO, the provenance of the DID Document itself, because it tells you why you should believe any of those statements are "secure".

That's my litmus test. @dlongley, is there anything you want to put in a DID Document that doesn't pass that test?

The examples you gave made my point more than yours. It's trivial (and yet potentially useful) to put information about secure interactions, which violates some notion of privacy. That's why private is a horrible litmus test. In contrast, any information you put in a DID that isn't about secure interactions with the subject absolutely should not go in the DID Document.

Back to the point of this issue...

For ALL DIDs, the only way to know you have the authentic DID Document is to exercise DID resolution according to the DID's method. As such, any supporting meta-data for why you should believe that resolution returned a correct DID Document is provenance that, IMO, should be included in the DID Document itself. Data without provenance is meaningless; therefore, we should embed the provenance WITH the data.

You said

I don't think the distinction "how you interact with the Subject securely" solves the problem you want it to solve.

Could you unpack that? All I want it to solve is defining a litmus test of what should and should not go into a DID Document. The distinction I offer is actually a distinction. Your statements about subjects (or Subjects) provide no distinction whatsoever.

You also said

I also don't think we should shy away from terms that are more commonly understood

"Privacy" is one of the least understood terms in this industry. Talk to anyone who has been working on the problem professionally for more than a freshman year and they will tell you that regulators, legislators, developers, end-users, and entrepreneurs constantly put forth different notions on what privacy means to them. To some it means to be left alone (Brandeis) to others it means agency (Gropper) to still others it means avoiding PII leaks. There is no commonly accepted definition of what is "private". For a hot minute Personally Identifiable Information (PII) was the red herring many thought would provide a functional way to manage privacy. Turned out that was a horrible way to try and discuss privacy, much less regulate it.

Public and private are not well defined terms. Period.

dlongley commented 4 years ago

@jandrieu,

... the ONLY thing that should be in a DID Document are statements that enable secure interactions with the Subject...

That's my litmus test. @dlongley, is there anything you want to put in a DID Document that doesn't pass that test?

I think the problem is with this test -- I suspect just about anything can be construed to meet its demand. Any piece of information about the subject could be understood to be required to have a secure interaction with the subject, depending on the context. The subject's cat's name? Well, on catville.com, that's key. I think this test is actually less useful than thinking about who can read the contents of the DID Document.

jandrieu commented 4 years ago

Exactly. So the requirements for catville are different than those for others. But let's take your offer and talk about who can read the contents of a DID Document.

To date, there are zero authorization mechanisms for who can read a DID Document. Are you proposing we add some?

Asking who can read a DID Document when deciding what goes into a DID Document per the specification is, IMO, almost as useless as asking who can read an HTML document to inform the HTML standard. Controlling access to DID Documents is not currently part of the DID specification.

For all of the use cases currently in the DID Use Case document, it is presumed that DID Documents are accessible to anyone who has the DID and access to the mechanisms of resolution per its method. Notable exceptions in the community discussion are contextual DIDs such as did:git and did:peer, where if you aren't a part of the context, you can't resolve the DID.

I expect adding authorization isn't what you mean. Some notion of baking authorization to read a DID Document into the DID Document would be a significant departure from current conversations.

So, from a specifications standpoint, we should assume that ANYONE might read any given DID Document. Which is why ONLY that information directly relevant to secure interactions with the subject should be included.

Putting your favorite cat, a street address, or an email address into a DID Document is an anti-pattern, UNLESS it, in fact, contributes to secure interactions with the Subject. Not that it might--that would lead us to potentially putting the entire data warehouse worth of PII in--but that it specifically DOES. A service endpoint of http://twitter.com/JoeAndrieu IS completely reasonable if that is how the controller chooses to present a channel for secure interaction. Arbitrary statements like "The Subject is known to the State of California as Joseph Andrieu" are NOT.

In fact, that service endpoint MUST NOT be interpreted as saying the Subject is the person who controls http://twitter.com/JoeAndrieu, but rather simply that http://twitter.com/JoeAndrieu is a means to interact with the Subject. That interaction may be understood to be posting @JoeAndrieu publicly--which is, in fact, interpreted by others as sort of a digital drop of messages never even intended for Joe Andrieu.

Can you unpack the insights you think we'd get by asking who gets to read a DID Document?

dlongley commented 4 years ago

@jandrieu,

To date, there are zero authorization mechanisms for who can read a DID Document. Are you proposing we add some?

Asking who can read a DID Document when deciding what goes into a DID Document per the specification is, IMO, almost as useless as asking who can read an HTML document to inform the HTML standard. Controlling access to DID Documents is not currently part of the DID specification.

No, I'm not suggesting we propose any. I'm suggesting that we're using an open world data model and that whether or not something appears in a DID Document should depend on a combination of what the DID controller wants to put there and what the DID method allows. These, in turn, should be governed, at the very least, by an understanding of who is able to read the DID Document.

If anyone can read the DID Document -- then only put information in the DID Document that you're ok with anyone reading. I don't think it has to be more complicated than that in terms of data visibility.

Beyond this, all we're saying in the spec is: if you're going to represent verification methods, controllers, services, etc. -- here's the interoperable way of doing that.

Side note: There are still discussions this group needs to have on GDPR-compliant "proxy/see also" services that can appear in DID Documents registered on blockchains. These services would direct people to more information about the DID subject, including additional service endpoints that may not be able to be written to the blockchain in a GDPR compliant way. This other graph of information could potentially require some authorization to get access to it ... which is one thing I was alluding to.

peacekeeper commented 4 years ago

I think I'm mostly with @dlongley in this thread. The RDF statements in the DID document are about the DID subject. The intention is that these statements contain only public information, and the primary motivation is that they will be used for secure interaction with the DID subject. I'm also supportive of the open world model, i.e. a DID document could contain arbitrary other statements, if the DID controller wants that and the DID method supports it. We had a long discussion about "hardening" (i.e. strongly constraining) DID documents about 2 years ago.

The DNS record analogy is partially useful when talking about resolution, but one difference is that a DID is an identifier for a real-world entity, whereas a domain name is not (an HTTP URI containing the domain name might be).

To get back to the original topic, if we want to make statements about the DID document itself, then as @TallTed has noted we would strictly speaking need a separate identifier, and we would therefore need to change the overall JSON-LD structure.

Example 1:

{
    "@context": "...",
    "type": "DidDocument",
    "created": "...",
    "updated": "...",
    "proof": [ ... ],
    "didSubject": {
        "id": "did:ex:1234",
        "authentication": [ ... ],
        "service": [ ... ]
    }
}

In this example, the identifier of the DID subject is did:ex:1234, and the DID document has a separate blank node identifier (it could also have its own IRI). There are a number of problems with this, such as the RFC 3986 rules for dereferencing DID URLs with fragments the way we've been using them (e.g. did:ex:1234#key-1).

Example 2:

{
    "@context": "...",
    "meta": {
        "id": "#meta",     // could be omitted to use a blank node identifier instead
        "created": "...",
        "updated": "...",
        "proof": [ ... ]
    },
    "id": "did:ex:1234",
    "authentication": [ ... ],
    "service": [ ... ]
}

Or similar, with several possible variations. I believe this has similar problems with regard to DID URL dereferencing as Example 1.

Or we just leave things the way they are (maybe prefixing certain property names, as in "docCreated", as suggested by @dmitrizagidulin). This means that we would accept a certain "conflation" (aka "simplification") of identifiers for the DID subject and the DID document.

I believe we have had this conflation for a long time anyway, due to the two assumptions that 1. the DID identifies the DID subject, and 2. we want to use DID URLs such as did:ex:1234#keys-1. I believe if we wanted to be super correct about RDF semantics and URI dereferencing rules, we would have to drop one of these two assumptions; the implications would be quite significant.
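To make the two assumptions concrete, here is a minimal sketch of fragment dereferencing. It is a toy, not the DID Resolution algorithm: the `documents` mapping and the `dereference` helper are hypothetical stand-ins for real, method-specific resolution. Note how the same string is used both to fetch a document and, inside that document, as the `id` of the subject.

```python
# Illustrative sketch only: a toy dereferencer, not the DID Resolution algorithm.

def dereference(did_url, documents):
    """Return the node that a DID URL points to.

    `documents` is a hypothetical in-memory mapping from DID to DID document.
    """
    did, _, fragment = did_url.partition("#")
    document = documents[did]  # here the DID is used to retrieve a document...
    if not fragment:
        return document
    # Look for a nested node whose "id" equals the full DID URL.
    for value in document.values():
        candidates = value if isinstance(value, list) else [value]
        for node in candidates:
            if isinstance(node, dict) and node.get("id") == did_url:
                return node
    raise KeyError(did_url)


documents = {
    "did:ex:1234": {
        "id": "did:ex:1234",  # ...while here the DID names the DID subject
        "authentication": [
            {"id": "did:ex:1234#keys-1", "controller": "did:ex:1234"}
        ],
    }
}

key = dereference("did:ex:1234#keys-1", documents)
```

Dropping either assumption would change this picture: with a separate document identifier, the lookup key and the subject `id` would no longer be the same string.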

iherman commented 4 years ago

Looking at the first pattern of @peacekeeper, with a little additional JSON-LD trick it can be turned into a semantically perfectly sound structure. I have turned example 1 into finished JSON-LD with an additional statement in the context:

{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    {
      "didSubject": "@graph"
    }
  ],
  "type": "DidDocument",
  "created": "2019-11-26",
  "didSubject": {
    "id": "did:ex:1234",
     "authentication": [
        "did:example:123456789abcdefghi#keys-1",
        {
           "id": "did:example:123456789abcdefghi#keys-2",
           "controller": "did:example:123456789abcdefghi",
           "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
        }
     ]
  }
}

Which translates in a set of TriG statements as follows:

_:b0 
     dcterms:created "2019-11-26"^^xsd:dateTime ;
     a <https://json-ld.org/playground/DidDocument> .

_:b0 {
    <did:ex:1234> did:authenticationMethod
         <did:example:123456789abcdefghi#keys-1> , 
         <did:example:123456789abcdefghi#keys-2> .
    <did:example:123456789abcdefghi#keys-2> 
        did:controller <did:example:123456789abcdefghi> ;
        did:publicKeyBase58 "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV" .   
}

(see JSON-LD Playground to experiment with this further.)

I have not looked at example 2 but, at first glance, that seems semantically a bit less clear.

msporny commented 4 years ago

Thanks to @peacekeeper for the examples, building on what he has said above.

Example 1 is sort of how we dealt with this topic with Verifiable Credentials. Example 2 is sort of how we dealt with this topic with the proof property.

Both are valid ways of expressing metadata about information, but here's the real issue:

We made a mistake by calling something a "DID Document". There is no such thing. There is a DID, that identifies a resource, and when you dereference it, you get a representation of that resource. It's information at that point in time... and that's all it is... and calling it a DID Document is confusing people.

There is information, and metadata about information.

Sometimes you serialize that information, and some people call that serialization "a document"... but it isn't. It isn't a unique parchment of which there is only one copy in the entire universe. It's this ephemeral thing, and sometimes you need to say things about that ephemeral thing.

We got this right with Verifiable Credentials. The outermost thing was metadata about the information (metadata about the credential), and the innermost thing was the information itself (the subject(s) of the credential).

I really worry about both Examples, I think they're both wrong.

Example 1 is wrong because it breaks all blockchain-based mechanisms. Submitting Example 1 to Veres One would mean that the DID subject would be setting the created and updated dates, and they have no right to do that. It's the consensus algorithm that decides when entries in the ledger are created and updated.

Example 2 is wrong for the same reason. The DID subject has no right to set the created/updated dates except in the fringe case where they actually control that information (like for did:web).

So, I think the correct solution is this (Example 3):

{
    "@context": "...",
    "type": "DidResolutionResponse",
    "created": "...", // when the DID Resolution was created
    "didCreated": "...", // when the DID identifier was created
    "didUpdated": "...", // when the DID Document was updated
    "didSubject": { // this is what we traditionally call the DID Document
        "id": "did:ex:1234",
        "authentication": [ ... ],
        "service": [ ... ]
    },
    "proof": [ ... ] // proof from the resolver
}

The proposal above (Example 3) is nuanced in its difference from Example 1. It works for did:web and did:v1/did:btcr/did:ethr where Example 1 is very problematic in the latter use cases. Here's how it could work: the did:web Method would state that any file written to a web server MUST be a DID Resolution response. This means that a resolver will hit a did:web method and pull a raw resolution response (that contains a didSubject) from the web server. If a developer just wants the "DID Document", they pull the didSubject field out and give it back to the developer. This creates the proper separation of concerns and doesn't require us to rearchitect what a DID Document is (and frankly, I don't think that would be the right thing to do at this stage anyway).
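The "pull the didSubject field out" step could look roughly like the sketch below. It assumes the Example 3 layout; the payload values and the `split_resolution_response` helper are invented for illustration and are not part of any spec.

```python
import json

# Hypothetical payload following the Example 3 layout; values are placeholders.
raw_response = json.dumps({
    "type": "DidResolutionResponse",
    "didCreated": "2019-11-26T00:00:00Z",
    "didSubject": {
        "id": "did:ex:1234",
        "authentication": [],
        "service": [],
    },
})


def split_resolution_response(raw):
    """Separate the 'DID Document' (didSubject) from the metadata around it."""
    response = json.loads(raw)
    did_document = response.pop("didSubject")
    return did_document, response  # whatever remains is metadata


did_document, metadata = split_resolution_response(raw_response)
```

A developer who only wants the "DID Document" would use `did_document` and ignore `metadata`.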

We do have the authority in the Working Group to specify a data model for "where metadata about the DID Document should go". The trick is doing this w/o opening a massive can of worms that is DID Resolution. So, we have a few options going forward:

State that metadata about a DID Document is out of scope for the DID WG, and it should go in the DID Resolution spec. This one is easy and keeps our scope limited while providing an answer for the did:web folks.
State that the data model for specifying metadata about the DID Document is in scope, but the resolution protocol is out of scope. This one is a slippery slope.

jonnycrunch commented 4 years ago

this puts a lot of power in the resolver.

jonnycrunch commented 4 years ago

Also, just to highlight the self-sovereign aspect: with my cryptographic signature, I, as the author of the DID data, assert the times created and updated; it is not merely that it is convenient for them to be there.

jandrieu commented 4 years ago

@jonnycrunch IMO, you shouldn't trust a resolver you aren't running any more than a bitcoin node you aren't running, for the same reasons. DLT-based resolution generally requires a full node under the hood. The point of meta-data about DID Document resolution is for a given resolver to provide some level of assurance (mechanism TBD and per-method) for making a trust decision about that result.

The kind of meta-data we are talking about could include just about anything, including identifying information about the resolver, so that one could rely on specific resolvers (either pseudonymous with some notion of reputation or bound to legal entities and their reputations). Another kind of "meta-data" could include the block height of the tip (for BTCR) or even a merkle bloom filter that could be used elsewhere to prove existence on chain of the root of the DID Document. I'm just speculating about these cryptographic assurances, but they are definitely part of the "stack" for deciding whether or not to rely on the result from a given resolver.

msporny commented 4 years ago

The discussion on the WG call today was all over the place, and I think the root cause was that no one, including me, has defined what "created" means. At least six definitions popped up during the discussion today:

1. The time that the DID subject is asserting they created the DID.
2. The time that the resolver is asserting that the DID was created.
3. The time that the ledger consensus algorithm is asserting that the DID was created.
4. The time that the DID subject is asserting they created the DID Document.
5. The time that the resolver is asserting that the DID Document was created.
6. The time that the ledger consensus algorithm is asserting that the DID Document was created.

I think @dmitrizagidulin was talking about either 1 or 4, I was talking about 3 or 6, and I'm not sure which one @peacekeeper was talking about.
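One way to see why there are exactly six: the list is every combination of who is asserting (three parties) and what is said to have been created (two things). The sketch below only enumerates that grid; the constant names are made up for illustration.

```python
from itertools import product

ASSERTERS = ("DID subject", "resolver", "ledger consensus algorithm")
TARGETS = ("DID", "DID Document")

# Enumerate in the same order as the list above: every asserter for the DID,
# then every asserter for the DID Document.
definitions = {
    number: (asserter, target)
    for number, (target, asserter) in enumerate(product(TARGETS, ASSERTERS), start=1)
}

for number, (asserter, target) in definitions.items():
    print(f"{number}. {asserter} asserts when the {target} was created")
```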

Let's go at this from the other direction and get very specific about the items being discussed. I don't think having the conversation in the abstract is helping us. Let's just focus on "created" and all make sure we're talking about the same definition before we start talking about where the data should be stored.

peacekeeper commented 4 years ago

We made a mistake by calling something a "DID Document". There is no such thing. There is a DID, that identifies a resource, and when you dereference it, you get a representation of that resource.

As an httpRange-14 nerd, I would say: What is that resource that the DID identifies? The DID subject, right? Well that's not an "information resource", therefore it has no representation that can be retrieved, has no media type, and there is no way to dereference fragments like did:ex:123#keys-1. From an RDF semantics perspective, we treat DIDs like identifiers for the DID subject, but from a URI dereferencing perspective, we treat DIDs like identifiers for the DID document.

I think this is the reason why originally we didn't really mind having properties like "created", "updated", "services", "authentication" side by side without distinction.

dmitrizagidulin commented 4 years ago

@msporny - The point I (and @peacekeeper) was trying to make is not that there are multiple definitions of 'created'. It's that there are multiple timestamps that need to be tracked. Which may include:

1. The time that the DID subject is asserting they created the DID Document. (And if that particular DID method uses a proof section in DID documents, this assertion will be signed by the creator.)
2. The time that the resolver has retrieved the document (currently tracked in resolverMetadata.retrieved property of the resolution result).
3. The time that the registry (ledger or other mechanism) asserts that the DID was registered. (This is method-specific, and would go into the methodMetadata section of the resolution result.)

2 and 3 already have mechanisms in the (DID Resolution) data model. And what we're arguing is that item 1, the self-asserted creation date of the document, belongs in the DID document.

(We were not talking about the timestamp that the DID was created (as a separate entity from the DID Document), because it's not really possible to record or keep track of.)
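A rough sketch of how the three timestamps could sit side by side in one resolution result. The `resolverMetadata.retrieved` and `methodMetadata` names come from the discussion above; the `didDocument` and `registered` key names and all timestamp values are assumptions made up for this illustration.

```python
# Sketch only: key names other than resolverMetadata.retrieved and
# methodMetadata are hypothetical, and the timestamps are invented.
resolution_result = {
    "didDocument": {
        "id": "did:ex:1234",
        "created": "2019-11-20T10:00:00Z",     # 1. self-asserted by the creator
    },
    "resolverMetadata": {
        "retrieved": "2019-11-26T18:30:00Z",   # 2. set by the resolver
    },
    "methodMetadata": {
        "registered": "2019-11-20T10:07:00Z",  # 3. asserted by the registry
    },
}


def timestamps(result):
    """Collect the three timestamps, labelled by who asserts each one."""
    return {
        "self-asserted creation": result["didDocument"].get("created"),
        "retrieved by resolver": result["resolverMetadata"].get("retrieved"),
        "registered by method": result["methodMetadata"].get("registered"),
    }


summary = timestamps(resolution_result)
```

The three values can legitimately differ, which is the argument for tracking them as separate properties rather than one `created`.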

dmitrizagidulin commented 4 years ago

@msporny I agree with you, btw, that the current property, created, is ambiguous, and should be changed to something like docCreated, to indicate which of the timestamps it refers to.

dmitrizagidulin commented 4 years ago

The other thing that's important to note is that although we want to provide mechanisms to track multiple timestamps, their relevance will differ for each individual DID method.

For example:

For Veres One non-cryptonym type DIDs (did:v1:uuid), a DID document is first created, and then (a nontrivial amount of time later) is registered with the ledger. So, for Veres One:

Time DID Document created - relevant, but not tracked (is not part of the Veres One DID Method data model)
Time DID Document was registered on the ledger - relevant, tracked. Will be returned in the methodMetadata portion of the DID Resolution result.
resolverMetadata.retrieved - relevant, will be provided by the resolver.

For did:key method, the situation is slightly different:

Time DID Document was created - relevant, but not tracked (nowhere to record it, since a did:key document is immutable / deterministically derived from the DID URL).
Time DID Document was registered - not relevant / not applicable, since there's no registration.
resolverMetadata.retrieved - relevant but trivial (since the document is immutable / unchanging)

For did:web method:

Time DID Document was created - relevant, and can actually be cryptographically signed (using proof mechanism) by the creator.
Time DID Document was "registered" (for example, written to a particular web server) - possibly relevant? (It will at least be recorded on the file system or in the database, for example.) But whether that information is accessible during resolution definitely differs for each web server.
resolverMetadata.retrieved - relevant, will be provided by the resolver.

Other methods may or may not want to have finer-grained notions of what "registered" means -- perhaps they track the timestamp for when a DID Document was first submitted to a node separately from when the overall ledger comes to consensus. In that case, all of those method-specific fine-grained timestamps will belong in the methodMetadata section of the DID Resolution Result (if the method is able to track them).

peacekeeper commented 4 years ago

the did:web Method would state that any file written to a web server MUST be a DID Resolution response. This means that a resolver will hit a did:web method and pull a raw resolution response (that contains a didSubject) from the web server.

My initial reaction is that this feels problematic, since a DID resolution result is constructed on the fly once a DID is resolved, so it doesn't make much sense to author this in advance and store it on a server. But I think I understand your reasoning behind this...

So, we have a few options going forward:

State that metadata about a DID Document is out of scope for the DID WG, and it should go in the DID Resolution spec. This one is easy and keeps our scope limited

I actually like this. As @dmitrizagidulin said, we've been envisioning a methodMetadata property in the DID Resolution spec, which could potentially contain this kind of data.

peacekeeper commented 4 years ago

Regarding the different types of timestamps, I think I have a much simpler approach to this:

The created (or docCreated) property contains the time when the DID's "create" operation was executed, as defined by the DID method.
The updated (or docUpdated) property contains the time when the DID's "update" operation was last executed, as defined by the DID method.

All the other details (asserted and signed by subject, asserted by ledger, created but not yet registered, registered but not yet propagated, etc.) are all method-specific and should not bother us on the DID core spec level. All DID methods need to clearly specify how "create" and "update" work, and based on that the corresponding timestamps should be set in the DID document (or DID resolution result). And if it's not possible to get that timestamp data (e.g. in did:key), then that's fine too.
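As a sketch of that rule, here is a toy in-memory "method" whose create and update operations set the two timestamps. `ToyMethod` and its injectable clock are hypothetical and not modelled on any real DID method; a real method would define where the time comes from (ledger, signer, server).

```python
from datetime import datetime, timezone


class ToyMethod:
    """A hypothetical in-memory DID method, for illustration only."""

    def __init__(self, clock=None):
        # The clock is injectable so a method can define its own time source.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._documents = {}

    def create(self, did, document):
        # "created" is set when the method's create operation executes.
        stored = {**document, "id": did, "created": self._clock().isoformat()}
        self._documents[did] = stored
        return stored

    def update(self, did, changes):
        current = self._documents[did]
        updated = {**current, **changes}
        updated["id"] = did
        updated["created"] = current["created"]          # never rewritten
        updated["updated"] = self._clock().isoformat()   # last update operation
        self._documents[did] = updated
        return updated


method = ToyMethod()
doc = method.create("did:ex:1234", {"authentication": []})
doc = method.update("did:ex:1234", {"service": []})
```

A method such as did:key, which has no way to obtain these timestamps, would simply omit both properties.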

dmitrizagidulin commented 4 years ago

@peacekeeper I like that approach, nice and elegant.

However, I disagree in one important detail. The self-asserted creation date of the DID Document, signed by the subject, is not method dependent. It's a concept that is the same for all DID methods (though some don't sign their DID Docs, relying on the ledger to do it, and some do). I think that timestamp needs a first-class property to mark it.

dmitrizagidulin commented 4 years ago

That said, although I do think that the self-asserted docCreated timestamp is a concept that means the same thing across all DID methods, if the group consensus is that it's not useful for all methods, I'm content to push it out to each individual DID method spec.

iherman commented 4 years ago

This issue was discussed in a meeting.

No actions or resolutions
View the transcript
Brent Zundel: https://github.com/w3c/did-core/issues/65
Brent Zundel: wants to frame the discussion as finding answers to 2 questions:
… 1) what is DID document metadata
… 2) Where does it go in the DID document?
Dmitri Zagidulin: proposes a third question: if it does belong in the DID document, how do we represent it?
… does it belong in its own subgraph?
… does it go in its own envelope?
Manu Sporny: https://github.com/w3c/did-core/issues/65#issuecomment-558274770
Markus Sabadello: this is a deep topic, which takes us deep into the RDF discussions
… there is some discussion about DID resolution results, and how to control them
… Markus feels those should be controlled by resolution parameters
… but that is not what we are talking about in this issue
… rather we are talking about metadata about the DID, DID subject, etc.
… second point was that there was conflation of identifiers for the DID subject and the DID document
… and that is important from an RDF and semantic web context
… if the DID is an identifier for the DID subject, then how is the DID document identified?
… for example if a fragment is added to the DID, it resolves inside the DID document
… thus the DID document is a Web resource
… this results in conflation of the DID and DID document
… some would call it conflation and some would call it simplification
Manu Sporny: Markus provides two examples here – https://github.com/w3c/did-core/issues/65#issuecomment-558274770
Manu Sporny: Analysis of Markus’ examples and another example – https://github.com/w3c/did-core/issues/65#issuecomment-558838589
Manu Sporny: Markus’ two examples are very helpful
… Manu followed with his analysis
… if folks haven’t read that, we won’t get far on this call today
… first point: we have confused everyone by calling it a DID document
… fundamentally it’s not an RDF “thing” or not. It’s an informational thing.
… the DID identifies a DID subject, and then you can say things about it
… so there is information, and then there is metadata about the information
… so he disagrees with Markus that there is conflation
… Manu believes the DID always identifies the DID subject
… thus if we want metadata about the DID information, we need to make it clear
… the three potential approaches are
… 1) put the metadata on the outside (this was done with verifiable credentials)
… 2) add a meta field (done for proofs in verifiable credentials)
… Manu thinks that’s the wrong approach
… 3) put the metadata in the DID resolution result
… Manu believes that’s where most of the metadata belongs
… it may sound like this is a fairly simple discussion, but it suffers from definitions
Orie Steele: did resolution aligns with document loader concept in json-ld…
Ted Thibodeau Jr: DID identifies the entity which we’ve been calling the DID subject.
Dereferencing the DID gets a description/representation of that entity (the DID subject), which we’ve been calling the DID document.
Metadata about the description might say when that description was updated.
Metadata about the DID document which conveys that description might say when that representation was updated/generated.
Electronic documents (and their provenance, destiny, etc.) are harder to track than paper documents, but they are no less analogous or important to be describable because of that difficulty.
Drummond Reed: I agree this is a challenging subject.
Manu Sporny: +1 to drummond!
Manu Sporny: yes, exactly
Brent Zundel: +1
Daniel Buchner: jonathan: I addressed your comment on GH Issue 70 just now
Drummond Reed: Principle 1. The did identifies the did subject. Period.
… Principle 2. I like the term did doc. The way we describe it can be muddy. We need to be clear in the spec.
… It is not covered in the community draft.
… From a web standpoint, does it need an identifier or not?
Manu Sporny: I’m agreeing w/ drummond, what he’s saying makes sense to me.
Dmitri Zagidulin: I wanted to clarify what manu was saying.
… The discussion is very much NOT about metadata about the resolution.
Dmitri Zagidulin: https://w3c-ccg.github.io/did-resolution/#example
Dmitri Zagidulin: In example 3, we have a separate resolution structure which separates the meta that is about the document and resolution.
… this example from the DID resolution spec clearly separates the metadata about the resolution process vs. metadata about the DID document that is method-independent
… we keep retreading the same ground around questions like the date of the DID document. Some say it should be the ledger, but some ledgers don’t keep time
Jonathan Holt: I think that the datecreated is there for convenience. It could be interpolated and self-asserted.
… Some DID methods self-assert the time, whereas others are verifying it via the DID registry
… so in some cases it’s self-asserted and in some it is interpreted
… the self-assertion is for convenience, but they can also be verified via resolution
Orie Steele: I’m in favor of not mutating the didDocument type with metadata annotations… mostly because it will create integrity / authentication issues with signed documents…
Markus Sabadello: Disagrees with Manu and agrees with Dmitri that metadata that describes the DID document belong inside the DID document
… an example is proofs included within the DID document under the BTCR method
Tobias Looker: Does the metadata conversation need to separate along the lines of who created the metadata? E.g. metadata created by the ledger vs. metadata that is asserted by the entity who updated the DID?
Dmitri Zagidulin: Orie: absolutely agreed. But we’re only talking about metadata that comes into existence the moment the DID Document is created. That is, it’s metadata that is signed over, it’s protected by the signature
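A minimal sketch of the integrity concern Orie raises and the distinction Dmitri draws. It assumes a SHA-256 digest over sorted-key JSON as a stand-in for a real signature and canonicalization scheme; the `digest` helper, the `retrieved` field, and all values are hypothetical and not taken from any DID method.

```python
import hashlib
import json


def digest(document: dict) -> str:
    """Hash a deterministic JSON serialization of the document.

    Stand-in for a signature: a real DID method would sign a canonical
    form with the controller's key instead of using a bare hash.
    """
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Metadata present when the controller produced the document is covered.
did_document = {"id": "did:example:123", "created": "2019-10-01T00:00:00Z"}
signed_over = digest(did_document)

# A resolver that later injects its own field changes the protected bytes.
mutated = dict(did_document, retrieved="2019-10-08T12:00:00Z")

unchanged_ok = digest(did_document) == signed_over  # creation-time metadata still verifies
mutated_ok = digest(mutated) == signed_over  # injected metadata breaks the check
```

Under these assumptions, only metadata that exists at creation time can live inside the protected document; anything added afterwards has to travel outside it.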
Markus Sabadello: different resolvers may give you different resolution metadata, but the proof metadata should always be the same inside the DID document
… RE the question about whether the DID document has a Web URI, he again points to fragment resolution
… but he’s in favor of not changing anything
Jonathan Holt: tplooker “Turtles all the way down”
Manu Sporny: URIs are opaque, full stop
… the fact that a DID URL starts with the DID subject identifier has nothing to do with being able to point to info inside the DID document
… the point is that you have an identifier that you can use to point inside a graph of information
… Manu is also hearing argument that “it’s convenient”
… arguments for putting in the metadata are “convenient” for methods that don’t have another way to do it
… but Manu believes that there is a cleaner way to add that metadata
… a specific example is “created” or “updated” dates. Do they describe the DID subject or the DID document?
… if the data model is not clear, then implementers will use the same value for different purposes
… let’s not do this. Let’s clearly separate statements about the DID subject, and statements about how the DID document was produced
Orie Steele: +1 to manu’s point… messing with the didDocument alters its type… end up getting a type that is part data about subject / part meta data about identifier…
Manu Sporny: to put it the other way: anything that puts the metadata on a DID ledger, the DID document author cannot tell the ledger about the metadata; the ledger is authoritative.
Dmitri Zagidulin: it’s more than convenient though…
Dmitri Zagidulin: has direct implications to security.
Ted Thibodeau Jr: metadata (DID document modification date) about data (contents of DID document) about entity (DID subject)
Dmitri Zagidulin: wait.. but the resolver document /does/ have that information already.
Markus Sabadello: +1 to continuing this conversation
Samuel Smith: Having a difficult time with this conversation.
… the original purpose of the DID spec is to convey trust over the Internet
… it’s more important than anything else we might do
… so when we start discussing things that make that harder, it confuses us. One step forward, 2 steps back
Manu Sporny: yes, but I don’t think anyone is saying “don’t express created” – we’re discussing where created should go.
Samuel Smith: there are a few things done with DID documents, and those should be very strongly supported cryptographically
… so the test should be can a DID document establish a verifiable root of trust (a “control authority”) and everything else should be secondary
… if we lose sight of that fact, then we will stray from the main purpose of DID documents
Markus Sabadello: Agrees with what Sam said.
… responding to Manu, although URIs are opaque, we can’t ignore URI rules
… the rules are that to dereference a fragment, you first resolve the primary resource, then you resolve the fragment to a secondary resource
… by that logic, if a fragment identifies a secondary resource in a DID document, then the DID document must be the primary resource
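One way to picture the two-step rule Markus describes is the sketch below. It is illustrative only: the `dereference` helper, the in-memory `documents` registry, and the use of a `publicKey` list as the place where secondary resources live are assumptions, and it does not settle the disagreement about opacity.

```python
def dereference(did_url: str, resolve):
    """Dereference a DID URL in two steps, following generic URI fragment rules."""
    did, _, fragment = did_url.partition("#")
    primary = resolve(did)  # step 1: the primary resource (here, the DID document)
    if not fragment:
        return primary
    # Step 2: locate the secondary resource inside the primary resource.
    for entry in primary.get("publicKey", []):
        if entry.get("id") == did_url:
            return entry
    return None


# Hypothetical in-memory "registry" standing in for a real DID method.
documents = {
    "did:example:123": {
        "id": "did:example:123",
        "publicKey": [{"id": "did:example:123#key-1", "type": "ExampleKey"}],
    }
}

key = dereference("did:example:123#key-1", documents.__getitem__)
whole = dereference("did:example:123", documents.__getitem__)
```

Note that the lookup matches the full identifier string and never interprets the fragment's characters, which is compatible with treating the URI as opaque.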
Dmitri Zagidulin: manu: we’re discussing where 3 different ‘created’ timestamps go. 1) when the did document was created, 2) when it was registered on the ledger/whatever persistence mechanism, and 3) when it was resolved. We already have a data model for expressing 2 & 3. So we’re arguing about 1.
Ted Thibodeau Jr: they remain opaque. the lexical characters that comprise the fragment, as well as the rest of the URI, have no inherent meaning.
Manu Sporny: +1 TallTed
Dmitri Zagidulin: Believes that Manu is suggesting that resolution metadata be injected into a DID document.
… there are 3 different timestamps involved: 1) creation, 2) registration, 3) resolution
… we had a place for created-date, but this whole discussion was kicked off by the PR to remove that
… so clarified that we are talking about the creation date of the DID document itself
Michael Jones: Most important is to keep it simple
… the most important thing is to be clear about which properties describe what
… so some claims can describe the subject, but others can describe other things
… an example is the “audience” claim in JWT, which describes who the claim is for
Samuel Smith: The problem is the mental model. Well suited mental models lend themselves to straightforward answers. Poorly suited mental models have convoluted difficult to understand answers. From a cryptographic standpoint 1) We need to establish the current control authority for the DID and DID Doc. 2) Given that control authority we then need to establish if the DID Doc was provided under the current control authority.
Answering this second question requires linking the DID Document with some identifier for the current control authority. This naturally lends itself to some type of version identifier.
Ted Thibodeau Jr: It’s hard to keep it simple because there are a lot of layers here. To keep it precise, we must have some complexity
… prefers the term “Decentralized Identifier” because it’s clearer
… the statements that describe the DID subject can go anywhere
Manu Sporny: YES, +1000 to TallTed !
Ted Thibodeau Jr: when we dereference the DID, we get back some manifestation of those statements
… we need to be able to say:
… 1) they were made at some time
… 2) they were posted at some time
… 3) they were retrieved at some time
… 4) they were forwarded at some time
… what would really help is a picture that sketches out the layers and what goes where
… this might reveal 17 “created-dates”, but they can all be precise
Samuel Smith: So cryptographically how do we establish that a given DID doc was created by the current control authority? The easy approach is that it’s signed non-repudiably by the control authority. This means that some identifier within the DID Doc is included in the signature in order to make the linkage non-repudiable. Putting that linkage anywhere else is bad crypto. It doesn’t matter whether or not it fits an RDF model. Bad crypto is bad crypto.
Dmitri Zagidulin: there isn’t 17 different created stamps though. There’s only 3.
Samuel Smith: The complexity is we have a poor mental model of what we are trying to accomplish with a DID Doc.
Michael Jones: +1 to what Sam said
Dmitri Zagidulin: +1 to what SamSmith is saying, about a signed non-repudiable ‘didDocumentCreated’ timestamp, as opposed to ‘didDocumentRegisteredOnLedger’ timestamp, or ‘didDocumentWasRetrievedByResolver’ timestamp.
Tobias Looker: Agrees with Manu that we don’t want confusion for implementers.
… agrees with dmitriz that there is utility in certain types of metadata
… and that there are layers of metadata involved with different parts of the DID document creation, registration, and resolution process
Markus Sabadello: no DID method should allow “created” and “updated” to be written arbitrarily by the DID subject
Samuel Smith: Is the data needed inside the DID Doc? If it is it should not be excluded because we can’t classify it as not metadata
Kim Duffy: What Mike Jones said really resonated with me, and I hope we can aspirationally get to the state he described
Brent Zundel: apologies to the rest of the queue as we are out of time
… encourages everyone to continue working in github and to add comments

msporny commented 4 years ago

@dmitrizagidulin and @peacekeeper -- YASS! Now we are getting somewhere...

The point I (and @peacekeeper) was trying to make is not that there are multiple definitions of 'created'. It's that there are multiple timestamps that need to be tracked.

That means there are multiple concepts, and each concept needs a definition. You have outlined at least 3 below:

The time that the DID subject is asserting they created the DID Document. (And if that particular DID method uses a proof section in DID documents, this assertion will be signed by the creator.)

Note that this definition doesn't match the definition of created in the spec right now, and as you said above, the current definition in the document is thoroughly ambiguous.

The time that the resolver has retrieved the document (currently tracked in resolverMetadata.retrieved property of the resolution result).

Ok, so this is out of scope if we're talking about resolver metadata... and we're agreeing not to conflate this value with the one above.

The time that the registry (ledger or other mechanism) asserts that the DID was registered. (This is method-specific, and would go into the methodMetadata section of the resolution result.)

This is also out of scope, since it also has to do with resolver metadata... and again, we're not going to conflate it with the other two concepts.
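The three concepts above could be kept apart along these lines. This is only a sketch: `resolverMetadata.retrieved` and `methodMetadata` are the names quoted above, while the `registered` property, the `timestamps` helper, and all values are hypothetical.

```python
resolution_result = {
    "didDocument": {
        "id": "did:example:123",
        # Concept 1: asserted by the document's creator, covered by any proof.
        "created": "2019-10-01T00:00:00Z",
    },
    "resolverMetadata": {
        # Concept 2: when this resolver retrieved the document.
        "retrieved": "2019-10-08T12:00:00Z",
    },
    "methodMetadata": {
        # Concept 3: when the registry says the DID was registered.
        # "registered" is a hypothetical property name.
        "registered": "2019-10-01T00:05:00Z",
    },
}


def timestamps(result: dict) -> dict:
    """Return each timestamp keyed by who asserts it, so they are never conflated."""
    return {
        "creator": result["didDocument"].get("created"),
        "resolver": result["resolverMetadata"].get("retrieved"),
        "registry": result["methodMetadata"].get("registered"),
    }


by_asserter = timestamps(resolution_result)
```

Keying each value by the party that asserts it makes it hard to reuse one timestamp for another purpose by accident.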

So, now things get much simpler... we're only talking about a concept whose definition is this:

unnamed created concept - The time that the DID subject is asserting they created the DID Document.

I do agree that that concept could belong in the DID Document and if we're going to define it, it should be the same for all DID Documents. I question whether anyone would want to depend on it (based on @SmithSamuelM and @selfissued's comments during the call), but that's easily discussed.

If that is what you and @peacekeeper are talking about, then the debate on 'created' and 'updated' is this:

unnamed created concept - The time that the DID subject is asserting they created the DID Document.

unnamed updated concept - The time that the DID subject is asserting they updated the DID Document.

... and we should get everyone on the same page that that's what we're debating. Let's stop talking about "metadata" in general, we'll eventually get to a design pattern on that. Let's just focus on these two items and then we can apply whatever lessons learned from discussing these things to other things that loosely fall into the category of metadata.

selfissued commented 4 years ago

An unnamed created concept is useless, as I see it. What would be useful to implementers would be concrete created claims, each with well-defined semantics. These examples may be all wrong, but I could imagine claims like whenDIDCreated, whenDIDRegistered, whenDIDResolved, etc. Those could all have clear, actionable meanings. That's the kind of direction we should go, rather than trying to over-abstract concrete things that can be simple.

TallTed commented 4 years ago

@selfissued - Yes, of course, "An unnamed created concept is useless".

I do not believe that anyone including @msporny is suggesting that we continue working on either the unnamed created concept or the unnamed updated concept as such, but rather that we (1) confirm whether these things which we can describe but have not yet named are the things we are discussing, (2) give them suitable names (one of which might be whenDIDCreated), and (3) proceed...

"Metadata" is a subset of "data". It's only "meta" because it's not the focus at the moment. "Metadata" about a Word .doc is not usually in focus, because it's the content of the .doc that's important -- until you need to arrange several very similar .doc files in order of revision -- and then that "metadata" becomes the (temporary) focus, and thus "data".

"Meta" is not a permanent condition; it is not a defining attribute.

This is why I suggested an illustrative sketch, starting with the kernel which we all agree is in focus (the DID Subject, identified by the DID); then identifying all the data that goes into the description of the DID Subject which is contained in the thing we've been calling the DID Document (i.e., the means of identifying and interacting with the DID Subject); then identifying all the data that describes the DID Document (which might include whether or not a given DID Document is entirely ephemeral, is concretized in some fashion, etc.) and/or describes the statements which are found in the DID Document, and so on.

Useful names for these descriptors may or may not be obvious or unanimously agreed upon -- but problematic names (whether ambiguous as with created, or in disagreement with what they're labeling, or otherwise) will usually quickly become obvious as such.

msporny commented 4 years ago

I do not believe that anyone including @msporny is suggesting that we continue working on either the unnamed created concept or the unnamed updated concept as such, but rather that we (1) confirm whether these things which we can describe but have not yet named are the things we are discussing, (2) give them suitable names (one of which might be whenDIDCreated), and (3) proceed...

Yes, exactly, what @TallTed said - let's get the definitions of the concepts we're discussing locked down... once we do that, naming becomes easy... and once we do that enough times, a design pattern will emerge.

iherman commented 4 years ago

This issue was discussed in a meeting.

No actions or resolutions
meaning of “Created”
Brent Zundel: https://github.com/w3c/did-core/pull/28
Drummond Reed: @selfissued RE the authority component, we have full rights to add syntax there in the DID URI scheme, but not in query parameters
Manu Sporny: https://github.com/w3c/did-core/issues/65
Manu Sporny: wants to try to focus the discussion on what we should talk about in this WG
… there are 3 aspects. The first one is when the DID document was created and updated.
… that topic is being debated in the resolution subgroup that @markus_sabadello is leading
… so the other options are: 1) an assertion on the ledger itself about when the DID document was created/issued
Dmitri Zagidulin: there is already a field for ‘the ledger is asserting the creation date’, separate from ‘created’
Manu Sporny: and 2) the DID subject’s assertion of when the DID document was created/updated
Dmitri Zagidulin: so we’re only talking about 2
Manu Sporny: manu believes that it is the latter
Dave Longley: wouldn’t it be when the DID subject was created?
Oliver Terbu: We should also consider that some did methods do not have an easy way to assert created/updated time
Dmitri Zagidulin: we are not talking about the ‘ledger is asserting the DID was created’ date, because that’s a different parameter
… so the DID document contains: 1) the DID, 2) DID metadata, 3) resolution metadata
Jonathan Holt: For specific DID methods, you must refer to the ledger, so the only option is the author’s assertion
Markus Sabadello: Notes that “created” and “updated” correspond to two operations required by a DID method
… so it’s okay that the DID document contains the author-asserted timestamps and resolution produces the DID registry-based timestamps
Joe Andrieu: wants to comment on terminology
Oliver Terbu: some did methods are not ledger-based but there are some did methods such as did:ethr that are ledger-based but don’t require a ledger to be created. so, no information can be put into the did document for the creation time.
Dave Longley: agree with Joe … a DID document is controlled by a DID controller and says things about the DID subject
Joe Andrieu: it’s not the “DID subject” that sets the date, it’s the “DID controller”
Markus Sabadello: +1 to JoeAndrieu , we should have all said controller instead of subject
Joe Andrieu: wants to make sure we’re using the right language
Brent Zundel: seeing wide agreement that the datestamps in the DID document are asserted by the DID controller
Jonathan Holt: probably even more specific would be the DID controller software
Manu Sporny: agrees with Joe on terminology
… so the question is: security issues around using datestamps asserted by the DID controller
… so Manu would like to understand why self-asserted datestamps are important
Brent Zundel: so this moves us to the topic that if dates are DID-controller-asserted, are they needed?
Dave Longley: If the timestamps are in a DID document, they should be authoritative.
Ivan Herman: +1 to dlongley
Orie Steele: +1 to dlongley
Dave Longley: so they should be describing the DID subject, not the DID document
Oliver Terbu: -1
Dave Longley: “DID subject createdDocument date” would make more sense
Dave Longley: but still messy.
Dmitri Zagidulin: the name of the property is ambiguous, so the proposal is to change it to didDocumentCreated and didDocumentUpdated
Orie Steele: why?
Brent Zundel: so the question is now whether this is needed
Oliver Terbu: -1 because some ppl don’t create the did document at the created time
Orie Steele: what is the use case for knowing when the document was created, by looking at the document?
Oliver Terbu: some did methods don’t store the did doc on a ledger
Dmitri Zagidulin: oliver - I’m not sure I understand
Dmitri Zagidulin: oliver - the timestamp is not about the ledger. it’s only about when the document was created (the ‘created on the ledger’ or whatever is a separate field)
Jonathan Holt: asks if PGP keys in the web-of-trust are self-asserted with regard to datestamps?
Brent Zundel: is anyone a PGP key expert that can weigh in?
Orie Steele: I’m not sure anything PGP does regarding embedding meta data like this is good.
Brent Zundel: brent has only been to one PGP key party
Jonathan Holt: looking at his own PGP keys, he attests to their datestamps, so he wants to get the same info about others
Manu Sporny: PGP does allow you to assert datestamps for keys, but that does not make for good security
Orie Steele: +1 to manu, there is no solid use case.
Oliver Terbu: dmitriz - let’s have this discussion in a github issue
Daniel Buchner: Seems like added complexity for little benefit
Samuel Smith: q
Yancy Ribbens: +1 to manu
Manu Sporny: but we are not hearing a solid use case on why we would need “createdDocument” and “updatedDocument” as asserted by the DID controller
Daniel Buchner: one can make an argument for throwing tons of things in the docs, I think we should be super stingy about it
Jonathan Holt: can look at his own keys and the timestamps associated with each
… even though it is self-asserted timestamp, it is still a way of keeping order
Daniel Buchner: why can’t you save them with an added stamp post resolution, when you save it to your machine
Manu Sporny: agree with daniel – we should be very careful about the data we standardize in DID Document.
Dave Longley: agree with daniel/manu so far.
Samuel Smith: from a security perspective, the only way the DID controller can make an authoritative assertion about the DID document is to link it to a hash of the event that existed at that point in time
… any other ordering can be self-asserted, forged, etc.
… so that would be the cryptographic way of verifying the authoritative ordering
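A rough sketch of the kind of hash-linked ordering Sam describes, assuming each event commits to the SHA-256 hash of the previous one. The event shape and the helper names `event_hash`, `append_event`, and `order_is_verifiable` are hypothetical, and a real method would also sign each event.

```python
import hashlib
import json


def event_hash(event: dict) -> str:
    """SHA-256 over a deterministic JSON serialization of an event."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_event(log: list, payload: dict) -> None:
    """Append an event that commits to the hash of the previous event."""
    previous = event_hash(log[-1]) if log else None
    log.append({"previous": previous, "payload": payload})


def order_is_verifiable(log: list) -> bool:
    """Check that every event links to the hash of the one before it."""
    if log and log[0]["previous"] is not None:
        return False
    return all(
        later["previous"] == event_hash(earlier)
        for earlier, later in zip(log, log[1:])
    )


log = []
append_event(log, {"op": "create", "id": "did:example:123"})
append_event(log, {"op": "update", "id": "did:example:123"})
intact = order_is_verifiable(log)

# Reordering the events breaks the links, unlike a self-asserted timestamp
# that could simply be rewritten.
reordered_ok = order_is_verifiable([log[1], log[0]])
```

The ordering here comes from the links themselves, so no timestamp needs to be trusted to establish which event came first.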
Yancy Ribbens: blockheight can determine time ordering?
Dave Longley: what is “authoritative” is up to the DID method, but agree we should be using crypto here to extend trust
Dave Longley: (when we can)
Manu Sporny: agrees with Sam. If a createdDocument property is in the DID document, it needs to be cryptographically verifiable
Orie Steele: you can always just embed a PGP key in your did document which contains this (and other) meta data.
Dave Longley: +1 to Orie
Joe Andrieu: @orie that would only cover the key, not the entire document
Dmitri Zagidulin: I’m totally fine with pushing down the ‘documentCreated’ metadata field down to method-specific land.
Manu Sporny: the case of just self-assertion of metadata about a DID document for the author can be handled in other ways
Yancy Ribbens: +1 to metadata not going in diddoc
Manu Sporny: but all of those ways are outside of a DID document
Daniel Buchner: or use Epoc seconds as your IDs in the values
Manu Sporny: so Manu is not hearing a strong use case for having these properties in a DID document
Daniel Buchner: epoch*
Manu Sporny: and believes we should take them out of the spec until there is a strong use case
Dave Longley: if you want to express when you registered a DID with some registry – you can add that to your DID doc perhaps under a “registration” property with more information about what you registered … i would think if you have anchored your DID to other blockchains you’d want something like that anyway for more than just one ledger/blockchain
Manu Sporny: daniel, yes, that would be a fine thing to do… epoch in id itself, and hopefully, asserted by the ledger.
Dmitri Zagidulin: dlongley: there’s a separate field for when it was registered, tho
Joe Andrieu: believes that it’s too early to call for lack of use cases when the conversation is just getting started
Dave Longley: dmitriz: this is about the DID controller saying something
Joe Andrieu: there are lots of examples of claims that are not directly tied to the subject
Daniel Buchner: The DID controller saying something is not substrate-authoritative
Joe Andrieu: this is the only place where the DID controller can assert these properties for purposes of redundancy and sniff tests
Orie Steele: SamSmith this is an attempt to formalize inception event
Dave Longley: so it seems the cases here are entirely around self-assertion, in which case such an “anchor” should have
Daniel Buchner: it’s speculative data, not something remotely falsifiable
Dave Longley: semantics that make it clear that its self-asserted info from the DID controller
Joe Andrieu: +1 to more specific property
Manu Sporny: +1 to daniel
Irene Adamski: @manu: unrelated to current discussion - who can I contact about more infos on the f2f in Amsterdam?
Justin Richer: the fundamental question is whether the DID document should be internally self-describing
Daniel Buchner: If it’s just an assertion by a party external to the trust-minimizing anchoring system, then there’s 0 reason we can’t make that a separate witness statement outside of the doc
Dmitri Zagidulin: great way to put it, Justin
Justin Richer: there are others seeing the DID document will be in a context that has other metadata about it
Daniel Buchner: if you want it bound to a doc revision, we could just allow a single prop for witness data that can be included via hash reference
Justin Richer: so these two views need to be reconciled
Dmitri Zagidulin: agrees with Justin_R
Joe Andrieu: @daniel, it’s a statement by the controller, just like everything else in the DID Document. All of which lack falsifiability.
Daniel Buchner: That way you can stuff a freaking DVD worth of witness data in there if you want, just via hash link
Dmitri Zagidulin: Dmitri believes that DID documents need to be self-describing
… but if it is self-asserted, then it can be kicked down to DID method specs
Daniel Buchner: Can someone argue against the hash-link approach?
Dmitri Zagidulin: (also standalone, yes! has implications on portability / exportability of the DID)
Jonathan Holt: conflating the need for a DID registry to add the security around the created or updated timestamp is a different issue
Dave Longley: a DID document has information about the DID subject … if there will be “self-describing” information, then the DID controller should state that the DID subject has a DID Document with properties X, Y, Z.
Jonathan Holt: but for a DID document to be able to stand alone and be self-describing is still important
… there is probably room for a compromise, where the property is optional
Daniel Buchner: { witness: HASH_OF_N_OTHER_FIELDS/DATA }
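Daniel's hash-link idea might look roughly like the sketch below, assuming the `witness` property holds a SHA-256 hex digest of externally stored data. The `hash_link` and `witness_matches` helpers, the shape of the witness data, and all values are hypothetical.

```python
import hashlib
import json


def hash_link(data: dict) -> str:
    """SHA-256 over a deterministic JSON serialization of the witness data."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Witness data lives outside the DID document and can be arbitrarily large.
witness_data = {"documentCreated": "2019-10-01T00:00:00Z", "note": "self-asserted"}

# The document carries only a fixed-size commitment to that data.
did_document = {"id": "did:example:123", "witness": hash_link(witness_data)}


def witness_matches(document: dict, data: dict) -> bool:
    """Check that the supplied data is what the document committed to."""
    return document.get("witness") == hash_link(data)


matches = witness_matches(did_document, witness_data)
altered = witness_matches(did_document, dict(witness_data, note="edited"))
```

The trade-off in this sketch is that the document binds to the witness data without carrying it, so the document is no longer self-describing on its own.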
Jonathan Holt: what jonathan currently does is using Open Timestamps
… so thinks there is room to compromise
Samuel Smith: +1 to self describing. The DID Controller needs to be able to make authoritative statements in the DID Doc. But I think that it may be method specific
Dmitri Zagidulin: +1 to manu’s proposal.
Joe Andrieu: -1
Dave Longley: we should remember that the root subject of the DID document is the DID subject
Manu Sporny: to speak to that compromise, he proposes to take it out of the core DID spec, and let method specs decide about how to use it
Yancy Ribbens: +1
Daniel Buchner: +42
Dmitri Zagidulin: daniel - that’s + too many :)
Manu Sporny: however there is already dissent, so no action on that proposal yet
Dave Longley: so something must say “DID subject has a DID document”, and introduce a new object with properties about that DID document.
Brent Zundel: closes the call with thanks to the scribes
Dave Longley: (so there’s no confusion over whether the properties apply to the DID subject or the DID doc)
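Dave's suggestion of a separate object for statements about the DID document could be pictured as below. The `hasDocument` property name, the helper functions, and the values are hypothetical; nothing in the discussion names the property.

```python
DOCUMENT_KEY = "hasDocument"  # hypothetical property name

did_document = {
    # Statements about the DID subject sit at the root.
    "id": "did:example:123",
    "publicKey": [{"id": "did:example:123#key-1", "type": "ExampleKey"}],
    # "The DID subject has a DID document": statements about that document
    # are nested in their own object.
    DOCUMENT_KEY: {
        "created": "2019-10-01T00:00:00Z",
        "updated": "2019-10-05T00:00:00Z",
    },
}


def about_subject(document: dict) -> dict:
    """Properties that describe the DID subject."""
    return {k: v for k, v in document.items() if k != DOCUMENT_KEY}


def about_document(document: dict) -> dict:
    """Properties that describe the DID document itself."""
    return document.get(DOCUMENT_KEY, {})


subject_view = about_subject(did_document)
document_view = about_document(did_document)
```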

peacekeeper commented 4 years ago

At the Amsterdam F2F meeting in January 2020, @gannan08 ran a session on this topic (see slides).

We then started a document to collect (meta-)data items related to DIDs and DID documents.

The next steps are:

Propose more items in that document (please everybody add items you think are missing!)
Then decide what "buckets" or "types" of (meta-)data we will have.
Then decide where they will go (e.g. DID document, DID resolution result, new to-be-invented data structure, etc.).

burnburn commented 4 years ago

Chairs set a 2 week deadline on the document from today after which we can move to the next step.

jricher commented 4 years ago