w3c / did-core

W3C Decentralized Identifier Specification v1.0
https://www.w3.org/TR/did-core/

DID Doc Encoding: Abstract Data Model in JSON #128

Closed SmithSamuelM closed 4 years ago

SmithSamuelM commented 4 years ago

DID Doc Encoding: Abstract Data Model in JSON

This is a proposal to simplify DID-Docs by defining a simple abstract data model in JSON and then permitting other encodings such as JSON-LD, CBOR, etc. This would eliminate an explicit dependency on the RDF data model.
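
To make the shape of the proposal concrete, here is a minimal sketch of a plain-JSON DID Doc rendered as a Python dictionary. The property names follow the draft spec's conventions; every value is hypothetical.

```python
import json

# A minimal, hypothetical DID Document expressed as plain JSON.
did_doc = {
    "id": "did:example:123456789abcdefghi",
    "publicKey": [{
        "id": "did:example:123456789abcdefghi#keys-1",
        "type": "Ed25519VerificationKey2018",
        "controller": "did:example:123456789abcdefghi",
        "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
    }],
    "service": [{
        "id": "did:example:123456789abcdefghi#vcs",
        "type": "VerifiableCredentialService",
        "serviceEndpoint": "https://example.com/vc/"
    }]
}

# Ordinary JSON tooling is all that is required to produce or consume it.
print(json.dumps(did_doc, indent=2))
```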

Universal Adoptability

For universal interoperability, DIDs and DID-Docs need to follow standard representations. One goal of the DID specification is to achieve universal adoption. Broad adoption is fostered by using familiar representations or encodings for the DID and DID Doc. The DID syntax itself is derived from the widely adopted and highly familiar URI/URL identifier syntax. This takes advantage not only of familiarity but also of the tooling built up around that syntax. Likewise, greater adoption is fostered to the degree that the DID Doc representation or encoding uses a familiar, widely adopted representation with extant tooling.

The only reason not to use a highly familiar representation is if the requirements for representation demand or greatly benefit from a less familiar representation. The appendix at the end of this document provides some detail about the main purposes of a DID Doc. This shows that a complex representation is not required and may not be beneficial.

In addition, having only a single representation or encoding, albeit highly familiar and widely adopted, may be insufficient to achieve universal adoption. It may require multiple representations or encodings.

Multiple encodings require a standard base encoding, in other words a least common denominator, from which the other encodings may be derived.

One way to accomplish this is to use an abstract data model as the standard encoding and then allow for other encodings. This was proposed in the following issue: https://github.com/w3c/did-core/issues/103#issuecomment-553532359

The problem with an abstract data model is that its syntax is expressed in some abstract modeling language, typically a kind of pseudo-code. Pseudo-code is usually less familiar than real code. This means that even for the most common case, the spec is written in an unfamiliar language. This runs counter to fostering broader adoption. A solution to this problem is to pick a real language encoding for the abstract data model, one that serves both as the abstract standard encoding from which other encodings can more easily be derived and as the lowest-common-denominator standard encoding.

Clearly given the web roots of the DID syntax itself as a derivation of URL syntax, JSON's web roots would make it the ideal candidate for an abstract data model language. Of any encoding available, JSON is the closest to a universally adopted encoding. JSON is simple but has sufficient expressive power to model the important data elements needed. It is therefore a sufficient encoding. Annotated JSON could be used to model additional data types such as an ordered mapping (in the event that they are needed). Many of the related standards popular among implementors such as the JWT standards are based on JSON. Casual conversations with many others in the community seem to suggest that a super majority of implementors would support JSON as the standard encoding for the combined abstract data model and default encoding.

Given JSON's rampant familiarity, it should not pose a barrier to implementors of other optional encodings such as JSON-LD or CBOR. Compared to pseudo-code, it should be just as easy, if not easier, to translate JSON to another encoding.

The Elephant in the Room

The result of this proposal would be to make JSON the standard encoding for the DID Doc specification and demote JSON-LD to be an optional encoding. The current DID spec uses JSON-LD as the preferred encoding but does not prevent the use of naive JSON as an encoding. However the DID spec mandates JSON-LD elements that show up as artifacts when using JSON that a JSON implementer must handle specially. Moreover, the semantics of JSON-LD are much more restrictive than JSON. This results in a lot of time being expended unproductively in community meetings discussing the often highly arcane and non-obvious details of JSON-LD syntax and semantics. The community is largely unfamiliar with JSON-LD. It is clear that JSON is sufficient to accomplish the main purposes of the DID Doc. Although JSON-LD may provide some advantages in some cases, its extra complexity runs counter to the goal of fostering more universal adoption. This proposal does not exclude JSON-LD but would encapsulate and isolate discussion about the esoteric syntax and semantics of JSON-LD to that subset of the community that really wants JSON-LD. Each optional encoding including JSON-LD would have a companion specification to the DID spec that defines how to implement that encoding. This structure will make it easier to implement other encodings in the future because JSON is much closer to a lowest common denominator data model than JSON-LD.

The relevant questions up for decision are:

The purpose of this proposal is not to debate the general good and bad of JSON-LD and RDF. There is much good in JSON-LD for many applications. But, relevant here is that JSON-LD is not as well aligned as JSON with the goal of fostering universal adoption. More specifically the RDF model employed by JSON-LD complicates the implementation of other encodings that do not share the RDF data model and RDF semantics. JSON does not suffer from this complication. This complication has the deleterious effect of slowing adoption.

Appendix

Purpose of DID-Doc

The current DID specification includes a specification for a DID Document (DID-Doc). The main purpose of the DID-Doc is to provide information needed to use the associated DID in an authoritative way.

A distinguishing feature of a DID (Decentralized Identifier) is that the controller (entity) of the DID obtains and maintains its control authority over that DID using a decentralized root of trust. Typically this is self-derived from the entropy in a random number (expressed as collision resistance) that is then used to create a cryptographic public/private key pair. When the identifier is universally uniquely derived from this entropy then the identifier has the property of self-certifiability. Another somewhat less decentralized root of trust for an identifier is a public ledger or registry with decentralized governance.

In any event, a more-or-less decentralized root of trust only has value if other entities recognize and respect that root of trust. Hence portable interoperable decentralized identifiers must be based on an interoperable standard representation. Hence the DID standard.

In contrast, "administrative" identifiers obtain and maintain their control authority from a centralized administrative entity. This control authority is not derived from the entropy in a random number. This statement may be confusing to some because administrative identifiers often use cryptographic public/private key pairs. To explain, PKI with public/private key pairs and cryptographic digital signatures enables the conveyance of control authority via signed non-repudiable attestations. But the source of that control authority may or may not be decentralized. Thus an administrative entity may convey trust via PKI (public/private keys pairs) but does not derive its control authority therein. Whereas a decentralized entity may derive its control authority over a DID solely from the entropy in the random seed used to generate the private key in a PKI public/private key pair.

A key technology underpinning DIDs is cryptographic signatures, by which the control authority over the associated DID and affiliated resources may be verified by any user of the DID. In contrast, an administrative identifier always has, as a last recourse, appeal to the authority of the administrative entity, by whatever means that authority is established.

Indeed, given the foregoing explanation, the most important task facing a user of a DID is to cryptographically verify control authority over the DID so that the user may then further cryptographically verify any attestations of the controller (entity) about the DID itself and/or affiliated resources. The verifications must be cryptographic because, with a decentralized root of trust, the original control authority was established cryptographically and the conveyance of that control authority may only be verified cryptographically. With DIDs it's cryptographic verification all the way down.

From this insight we can recognize that a DID-Doc should support a primary purpose and a secondary purpose as follows:

If the user cannot determine the current control authority over the DID then the information in the DID Doc cannot be authoritatively cryptographically verified. Consequently, absent verified control authority, any use of the DID Doc for any purpose whatsoever is at best problematic.

Process Model for Establishing Cryptographic Control Authority

As mentioned above a fully decentralized identifier is self-certifiable. Other partially decentralized identifiers may be created on a ledger or registry with decentralized governance. The first case is the most important from a process model point of view. The second case is less informative.

The root of trust in a self-certifying identifier is the entropy used to create a universally unique random number or seed. Sufficient entropy ensures that the random seed is unpredictable (collision resistant) to a degree that exceeds the computational capability of any potential exploiter for some significant amount of time. Currently 128 bits of entropy is considered sufficient.

That random seed is then converted to a private key for a given cryptographic digital signature scheme. Through a one-way function, that private key is used to produce a public key. The simplest form of self-certifying identifier includes that public key in the identifier itself. Often the identifier syntax enables it to become a self-certifying name-space where the public key is used as a prefix to a family of identifiers. Any attestation signed with the private key may be verified with the public key. Because of its universal collision resistance no other identifier may be associated with a verifiable attestation. This makes the identifier self-certifying.
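
A minimal sketch of that derivation, assuming an Ed25519 signature scheme via the PyNaCl library; the `did:example:` prefix and hex encoding are illustrative choices, not part of any method spec.

```python
import secrets
from nacl.signing import SigningKey  # pip install pynacl (illustrative choice of library)

# 1. Root of trust: a random seed with at least 128 bits of entropy.
seed = secrets.token_bytes(32)

# 2. One-way derivation of a key pair from the seed.
signing_key = SigningKey(seed)
verify_key = signing_key.verify_key

# 3. Simplest self-certifying identifier: embed the public key in the identifier.
did = "did:example:" + verify_key.encode().hex()

# Any attestation signed with the private key is verifiable against the public
# key carried in the identifier itself, which is what makes it self-certifying.
signed = signing_key.sign(b"attestation affiliated with " + did.encode())
verify_key.verify(signed)  # raises nacl.exceptions.BadSignatureError if tampered with
```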

Furthermore, instead of the public key itself, the identifier may include a fingerprint of the public key. In order to preserve the cryptographic strength of the root of trust in the random seed, the fingerprint must have collision resistance comparable to that of the original random seed. Further one-way functions can be applied successively to produce successive derived fingerprints. This is similar to how hierarchically deterministic key chains are generated. To restate, a one-way function may be applied to the public key to produce a derived fingerprint, then another to that fingerprint, and so on. The collision resistance must be maintained across each application of a one-way function.

Instead of merely deriving a simple fingerprint, one could take the public key and use it as a public seed that when combined with some other data may be transformed with a one-way function (such as a hash) to produce yet another fingerprint. As long as the process of creation of any derived fingerprint may be ascribed universally uniquely to the originating public/private key pair, the resultant derived identifier may be uniquely associated with attestations signed with the private key and verifiable with the public key. This makes the eventually derived identifier also self-certifiable.
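
A sketch of the fingerprint construction, assuming SHA-256 as the one-way function (which preserves roughly 128 bits of collision resistance); the library choice and identifier prefix are illustrative.

```python
import hashlib
from nacl.signing import SigningKey

def fingerprint(data: bytes) -> bytes:
    # SHA-256 keeps collision resistance comparable to a 128-bit random seed.
    return hashlib.sha256(data).digest()

public_key = SigningKey.generate().verify_key.encode()

fp1 = fingerprint(public_key)                        # fingerprint of the public key
fp2 = fingerprint(fp1)                               # a further derived fingerprint, and so on
fp3 = fingerprint(public_key + b"some other data")   # public key used as a public seed

# Any of these may serve as the identifier, provided the derivation path back to
# the originating key pair is unambiguous; the result remains self-certifiable.
did_from_fingerprint = "did:example:" + fp1.hex()
```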

Rotation

The problem is that over time any public/private key pair used to sign attestations becomes weakened due to exposure via that usage. In addition, a given digital signature scheme may become weak due to a combination of increased compute power and better exploit algorithms. Thus to preserve cryptographic control of the identifier in the face of exposure, the originating public/private key may need to be rotated to a new key pair. In this case the identifier is not changed, only the public/private key pair that is authoritative for the identifier is changed. This provides continuity of the identifier under changes in control of the identifier. This poses a problem for verification because there is no longer any apparent connection between the newly authoritative public/private key pair and the identifier. That connection must be established by a rotation operation that is signed by the previously authoritative private key. The signed attestation that is the signed rotation operation transfers authoritative control from one key pair to another. Each successive rotation operation performs a transfer of control.
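
A hedged sketch of such a rotation operation: the event structure, field names, and signing scheme below are illustrative only, but they show the essential point that the transfer of control is signed by the previously authoritative key.

```python
import json
from nacl.signing import SigningKey
from nacl.encoding import Base64Encoder

current_key = SigningKey.generate()   # key pair that is authoritative today
next_key = SigningKey.generate()      # key pair control is being transferred to

# Hypothetical rotation event: the identifier stays the same; only the
# authoritative key pair changes.
rotation_event = {
    "did": "did:example:123456789abcdefghi",
    "operation": "rotate",
    "sequence": 1,
    "newKey": next_key.verify_key.encode(encoder=Base64Encoder).decode()
}
payload = json.dumps(rotation_event, sort_keys=True).encode()

# The transfer of control is attested by the *previously* authoritative key.
signed_rotation = current_key.sign(payload)
current_key.verify_key.verify(signed_rotation)   # how a verifier would check it
```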

State Machine Model of Control Authority

To summarize, control authority over a decentralized identifier is originally established though a self-certification process that uniquely associates an identifier with a public/private key pair. Successive signed rotation operations may be then used to transfer that control authority to a sequence of public/private key pairs. The current control authority at any time may be established by starting at the originating key pair and then applying the successive rotation operations in order. Each operation is verified via its cryptographic signature.

The process and data model for this is a state machine. In a state machine there is a current state, an input event, and a resultant next state determined by state transition rules. Given an initial state and a set of state transition rules, replaying a sequence of events will always result in the same terminal or current state. This is a simple, unambiguous process model. The data model is also simple: it must describe the state and the input events. There is no other data needed. The state is unambiguously and completely determined by the initial state, the transition rules, and the events. No other context or inference is needed. A simple representation will suffice.
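
A sketch of that replay, assuming rotation events shaped like the one in the previous sketch (all names here are illustrative, not spec-defined).

```python
import json
from nacl.signing import VerifyKey
from nacl.encoding import Base64Encoder

def current_authoritative_key(inception_key_b64: str, signed_events: list) -> VerifyKey:
    """Replay signed rotation events in order to determine the current key.

    inception_key_b64: public key bound to the identifier at creation.
    signed_events: SignedMessage objects produced as in the rotation sketch above.
    """
    state = VerifyKey(inception_key_b64.encode(), encoder=Base64Encoder)
    for signed in signed_events:
        # Each event must verify against the key that was authoritative before it.
        payload = state.verify(signed)          # raises BadSignatureError otherwise
        event = json.loads(payload)
        state = VerifyKey(event["newKey"].encode(), encoder=Base64Encoder)
    return state                                # terminal state = current key pair
```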

Once the current control authority for a DID has been established to be a given key pair (or key pairs) then any other information affiliated with that DID may be cryptographically verified via a signed attestation using the current key pair(s). The important information needed to establish the authoritative stature of any additional information such as encryption keys or service endpoints is the current authoritative signing key pair(s) for the identifier and that the version of the information in the DID Doc is sourced from the controlling entity of the current key pair(s). This means the DID Doc may benefit from an internal identifier that corresponds to the latest rotation event that establishes the current key pair(s) or some other identifier that associates the DID Doc with specific signing key pair(s). This process of first establishing authoritative key pair(s) greatly simplifies the cryptographic establishment of all the other data.

There are various mechanisms that may be employed to maintain the state and associated event sequence. These could be as simple as a set of servers with immutable logs for the events/states that also run the code for the state transition logic. A more complex setup might rely on a distributed consensus ledger to maintain the state.

The DID Doc in and of itself, however, is insufficient to fully establish the current authoritative key pair(s). Other infrastructure is required. Merely including a set of rotation events in a DID Doc only establishes control authority up to the latest included rotation event. But other rotation events may have happened since that version of the DID Doc was created. Consequently a DID Doc's main role in this respect is to help a user discover the mechanisms used to establish current control authority. This must be done with some care because in a sense the DID Doc is bootstrapping discovery of the authority by which one may trust the discovery provided in the DID Doc. Nonetheless, in order to be authoritative, the other information in the DID Doc that is not part of discovering authoritative control does not need an event history but merely a version identifier linking it to the authoritative key pair(s) and an attached authoritative signature from the current authoritative key pair(s).

In other words the DID Doc is used to bootstrap discovery of the current authoritative controlling keys and then to provide authoritative versioned discovery of affiliated information.

RDF Complications

The RDF model uses triples to canonicalize a directed graph. This graph may be used to make inferences about data. This model attaches a context to a given DID Doc that must itself be verified as authoritative. This expansion complicates the process of producing an authoritative versioned discovery document or an evented state machine. Clearly a clever implementation of a cyclical directed graph could be used to implement versioned discovery documents or evented state machines. Many implementations of RDF, however, use directed acyclic graphs, making the implementation of evented state machines at best problematic and versioned discovery documents more cumbersome. This forces a particular, potentially unnecessarily complex, methodology for implementing versioned discovery documents or evented state machines, rather than whatever is easiest or most convenient for the implementer.

talltree commented 4 years ago

@SmithSamuelM Thanks for posting such an exhaustive case for an abstract data model. This was the issue I raised in #103, and I think this largely supersedes that thread.

Although I originally proposed defining the abstract data model using a modeling language like UML, I am persuaded by your argument that doing it in a simple, universal encoding like JSON will make it more approachable to developers and thus better for adoption.

I fully understand that this is ripping off the bandaid on the tension between JSON and JSON-LD for DID documents. Given how low DIDs and DID documents are in the trust infrastructure stack, I am heavily in favor of "the simplest thing that could possibly work"—above all because of the need for this layer to be as rock-solid as possible from a security standpoint.

dhh1128 commented 4 years ago

I am in favor of this proposal. While I recognize that JSON-LD provides some expressive power that ordinary JSON does not, I think the cost-vs-benefit for that expressive power is not a good tradeoff at the DID level. DID docs should be simple; they are a foundation that many things build on, and should not introduce onerous dependencies. Developers shouldn't have to learn JSON-LD to process DID docs.

I think the case for the expressive power of semantic-web-style constructs like RDF/JSON-LD is stronger at the VC level than at the DID doc level.

SmithSamuelM commented 4 years ago

From a requirements perspective, the simplest necessary and sufficient representation should be preferred over any unnecessary but sufficient representation, especially if the latter is more complex than the former. This proposal does not forbid the latter but merely enables the former.

OR13 commented 4 years ago

I'm very concerned that we don't have good examples of DID Methods that don't use JSON-LD at all. So people who don't understand JSON-LD just kind of hack around it, which leads to weakening JSON-LD...

I'm happy to help clarify this issue.

I think we need to provide some clear examples for how to use DID Core spec without JSON-LD and with it, and how to not muddy the waters, and improve the security understanding for either decision.

To be clear, I'm actually a huge fan of JSON-LD, and intend to keep using it with did:elem... I just want to be able to explain better to those who don't want to use it, how to do so in a way that protects JSON-LD and the implementers.

...to actually address this issue proposal directly

I'm in favor of this proposal, and I would like to see JSON-Schema used to help provide better clarity on what is and isn't allowed.

iherman commented 4 years ago

My question is: do we intend to define a DID document exhaustively, i.e., will we define all keys (terms) that can be used in a DID document, or do we envisage that other actors (methods, applications, controller, whatever) may add keys to a DID document that are not defined in this spec?

The power of JSON-LD comes if we allow for the latter. On the other hand, if we want to define all possible keys a DID document may contain then the advantages of using JSON-LD becomes a question.

SmithSamuelM commented 4 years ago

@iherman I don't believe that we need to exhaustively define all keys up front. JSON is an extensible, self-documenting data format that supports hierarchical mapping constructs. This makes it possible to discover extended content. The NoSQL database world is filled with examples of document-oriented databases where this is a standard practice. The RDF construct imbues a specific semantic that many find useful, especially if one is building a graph database, but a graph database is not necessary to provide extensibility, especially at the low level where DID Docs operate. Verifiable Credentials, on the other hand, are a different story. But my concern is that RDF has become a greedy paradigm that, at least for the DID spec, has resulted in unwarranted complexity and, moreover, due to its unfamiliarity causes unproductive confusion. This proposal does not preclude a JSON-LD implementation; it merely facilitates a specification that does not have the RDF data model as a dependency, in order to better foster universal adoption.

SmithSamuelM commented 4 years ago

@OR13

+1 Exactly. I think this is the next step. In many previous attempts to do this we have become bogged down by the complications of the "right way" to do this in RDF as opposed to not using RDF as the mental model. IMHO given the primary purposes of a DID Doc outlined above, the cryptographic considerations are paramount.

dlongley commented 4 years ago

@iherman,

My question is: do we intend to define a DID document exhaustively, i.e., will we define all keys (terms) that can be used in a DID document, or do we envisage that other actors (methods, applications, controller, whatever) may add keys to a DID document that are not defined in this spec?

The power of JSON-LD comes if we allow for the latter. On the other hand, if we want to define all possible keys a DID document may contain then the advantages of using JSON-LD becomes a question.

I think we absolutely intend to do the latter. This is self-sovereign technology with an aim at decentralized extensibility. I disagree with the premise that there is a "lot of time being expended unproductively in community meetings" on this subject. I would also argue that we will spend significantly more time rewriting/reinventing the parts of the JSON-LD standard that we're using here to accomplish the same goals. Either that, or we will have to head in an entirely different direction, and start assuming we know everything about how things should work and close off innovation at the edges. In other words, while I think this proposal is well intentioned, I suspect, if we were to adopt it, the outcome would be a need to duplicate significant complexity into our own spec instead of relying upon the work others have already put in (and that has already been standardized). All of this would also come at the cost of interoperability.

I don't think people realize all of the benefits we're getting from piggybacking on top of JSON-LD (e.g., SS/decentralized extensibility, generic data model that can be understood by tools (already) written once, ability to reference objects in the data model by ID using an existing standard, hash resolution rules, and more would come to light as we painfully discover what we've lost....). Taking any other approach will be necessarily closed world or a reinvention of the wheel. Furthermore, I think our spec already insufficiently expresses all of the things we're assuming work a certain way and we're working hard to improve this. To cut out the layers it depends on would only increase this burden as the benefits we assumed we had slip away.

dlongley commented 4 years ago

@SmithSamuelM,

I don't believe that we need to exhaustively define all keys up front. JSON is an extensible self-documenting data format that supports hierarchical mapping constructs. This makes it possible to discover extended content. The NoSql database world is filled with examples of document oriented databases where this is a standard practice.

This is all siloed data that cannot be combined with anything else. This is exactly what we want to avoid and exactly why having a more generic data model that expresses relationships is useful for decentralized extensibility.

SmithSamuelM commented 4 years ago

@dlongley

There is much value in what has already been done. This need not and should not be discarded. The problem is that the full syntax and semantics of the RDF model are not replicable in other encodings, at least not without major effort. Consequently we want just the good stuff. The essential constructs that are both valuable and universally applicable. An abstract data model does this and what is proposed is that this abstract data model be expressed in JSON. It certainly can have the "right" semantics that may be essentially the same as JSON-LD without requiring all that JSON-LD requires. This makes it not siloed. Siloing is not the same as not using JSON-LD. Any standard representation with agreed upon syntax and semantics is not siloed. An extensible hierarchical mapping data construct is perfectly adequate for expressing interoperable semantics. The process of defining those semantics is important. This allows for extensibility over time. Attempting to canonicalize a universal data graph up front is a difficult if not impossible task and is one reason not to be drawn into an RDF approach.

jandrieu commented 4 years ago

@iherman hits the vital point. If we want a DID Document to be extensible without namespace conflicts, we need JSON-LD (or its equivalent). If we want to define a concise and limited set of specific properties that define a DID Document, JSON alone is fine.

There may be other JSON-LD features we'd lose (I seem to recall something about language-specific things like character order), but it is the extensibility that appears to be the most significant.

One thing I keep seeing as a point of confusion from advocates of JSON is that UNLESS someone exercises extensibility, JSON-LD is JSON. So all of the tools and practices for a fixed-schema JSON work just fine with an un-extended JSON-LD serialization. As long as the context is unchanged and the JSON properties are of the constrained set, then you can treat JSON-LD as JSON. It is only when the document is extended that you need to evaluate the contexts. Which is exactly when JSON alone runs into trouble.
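
As a sketch of what "treating JSON-LD as JSON" can look like for a consumer: the context URL and the all-or-nothing check here are assumptions for illustration, not spec text.

```python
import json

EXPECTED_CONTEXT = "https://w3id.org/did/v1"   # placeholder for whatever fixed context the spec mandates

def read_as_plain_json(did_doc_text: str) -> dict:
    """Consume an un-extended JSON-LD DID Document with ordinary JSON tooling."""
    doc = json.loads(did_doc_text)
    if doc.get("@context") != EXPECTED_CONTEXT:
        # The document has been extended; only now would JSON-LD processing matter.
        raise ValueError("unexpected @context: JSON-LD evaluation required")
    return doc   # the constrained set of properties can be read as plain JSON
```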

That makes the real question the one I started with. Is extensibility important?

@SmithSamuelM's last comment came in as I wrote this and I'm not sure how to interpret his comments on extensibility. No one is proposing a universal data graph up front. Certainly not the JSON-LD advocates. The point of advocacy is an open world data model where extensibility is afforded from the start. "The process of defining those semantics" sounds like you mean that a DID v2 could extend the specification. Yes, that's true, but you could only do so through testing non-compliant implementations unless you start out with an extensible serialization. It is that limited definition of properties, implied by JSON-only, that I believe @dlongley means by siloed.

dlongley commented 4 years ago

@SmithSamuelM,

The problem is that the full syntax and semantics of the RDF model are not replicable in other encodings, at least not without major effort. Consequently we want just the good stuff.

You can encode the RDF model in JSON (this is what JSON-LD is) -- and the argument here is to use JSON. JSON-LD is JSON. Could you provide a concrete example of the problem you're highlighting?

An abstract data model does this and what is proposed is that this abstract data model be expressed in JSON. It certainly can have the "right" semantics that may be essentially the same as JSON-LD without requiring all that JSON-LD requires.

My reading of this is exactly what we want and already have... but it translates to: use JSON-LD and keep the core simple for JSON-only consumers. Someone treating JSON-LD as any other JSON (unless they want to use the extensibility features) shouldn't notice any difference. This is the same approach we took with VC with success.

Attempting to canonicalize a universal data graph up front is a difficult if not impossible task and is one reason not to be drawn into an RDF approach.

There are libraries to do this and specs in the works for future standardization (Note: I don't think we say anywhere that you must do this anyway). I don't think this is a strong reason to avoid the approach, especially given the other benefits we get from it. But, again, I feel like we are already where we need to be with respect to getting extensibility from JSON-LD/RDF and simplicity from JSON.

peacekeeper commented 4 years ago

I see this more from a philosophical perspective than from a practical one. I don't think it's super hard to process JSON-LD if you only have plain JSON tools and knowledge, and vice versa I don't think JSON-LD provides that much extra needed functionality for DID documents that can't also be done with plain JSON. So in terms of how hard, or secure, or extensible it is, I think it doesn't matter that much.

For me the main purpose of DIDs is to try and model digital identity in a way that approximates as much as possible how identity works in the physical world. This is why I'm a big fan of @SmithSamuelM 's KERI work, where the root of trust is entropy alone which is available to everyone without dependency on anything else.

This also means that for me, the question of data format of the DID document is primarily about describing who you are in the digital world, and how to interact with you. This is also why it's important to talk about metadata about the DID subject vs metadata about the DID document (https://github.com/w3c/did-core/issues/65), about httpRange-14, and similar very theoretical topics.

From this perspective, I believe a description of (the core of) my physical identity in the digital world can be more appropriately done with a semantic RDF graph model, than with a plain JSON object tree of keys and values. So I like JSON-LD DID documents better than plain JSON DID documents. I believe getting these conceptual foundations right is more important than mass adoption.

I'm also in favor of describing the data model in an abstract way and then allowing different formats such as JSON-LD, plain JSON, CBOR, XML (https://github.com/w3c/did-core/issues/103). But the abstract description should be in UML or in English, not in JSON, because that wouldn't be abstract anymore.

dmitrizagidulin commented 4 years ago

But the abstract description should be in UML or in English, not in JSON, because that wouldn't be abstract anymore.

+1 to that. A JSON serialization is a very concrete one, not abstract.

jandrieu commented 4 years ago

@peacekeeper Unfortunately, I disagree in the strongest terms with these two statements:

the DID document is primarily about describing who you are in the digital world

and

a description of (the core of) my physical identity ...

This is the wrong mental framing.

If you see DID Documents as about the Subject, you are creating a privacy nightmare. Period.

DIDs and DID Documents only present ways for securely interacting with Subjects. They MUST say nothing about the Subject except these are alleged means of secure interaction.

DIDs are NOT your physical identity--online or off. They are a means to communicate with a counter party how to bootstrap secure interactions. I give you a DID and, in theory, I'm giving you a way to interact with the Subject. That's it. FULL STOP.

DIDs should never be tied to a specific person, because that can change. Yes. If you didn't get that, you need to understand that a given DID's Subject can change from one physical person to another. If that's outside the scope of what you have imagined so far, simply consider a DID with the Subject of the King of England. Not only has that Subject changed from time to time, it actually doesn't refer to any specific person at this moment in November 2019. Sometime in the next decade or two, it almost certainly will. And that is completely independent of whatever might be recorded in a DID Document.

Similarly, DID Documents should never contain information about a specific person other than that which enables specific secure interaction. I've made this argument already. Imagining the DID Document as about the Subject, without filter, will absolutely create privacy harms. Real ones. And when we achieve the scope of ambition we have for these identifiers, those harms will escalate to loss of liberty and even life. Don't imagine for a minute that privacy leaking DID Documents won't eventually kill someone.

This is EXACTLY why many definitions of "persistence" as a goal for DIDs are flat out wrong.

I've voiced this before and I'll voice it until my dying breath.

DIDs are intentionally, and should always be, a fundamental separation of concerns between the physical and the digital. Framing it any other way paves the path for exceptional abuses of this technology.

peacekeeper commented 4 years ago

@jandrieu

DIDs and DID Documents only present ways for securely interacting with Subjects. They MUST say nothing about the Subject except these are alleged means of secure interaction.

Agreed, but those means of secure interaction can still be considered statements about the person, semantically not so different from saying what your name or address is. I am not saying DID documents should contain any more than the minimum amount of information for secure interaction, but semantically, DIDs are still identifiers for the DID subject. They are more than just something like an IP address for reaching the DID subject.

At least that's my own personal perspective, I won't insist on it strongly. I can also understand the arguments for simple, constrained, robust, plain JSON documents that are similar to DNS records, and that fulfill their well-defined purpose on a lower, separate layer than the actual "identity layer" that establishes your digital self.

DIDs should never be tied to a specific person, because that can change. [..] simply consider a DID with the Subject of the King of England. Not only has that Subject changed from time to time,

Are you suggesting we drop the "persistence" principle of DIDs? How would you be able to cryptographically prove that control of the DID has been transferred to the new King? The traditional thinking has been that in this case, the new King would have a different DID than the old King. The old King's Verifiable Credential would get revoked, and a new Verifiable Credential would get issued to the new King.

Don't imagine for a minute that privacy leaking DID Documents won't eventually kill someone.

Agreed that it's super important to avoid privacy leaking DID documents. I think I could also argue that if the DID is only seen as a lookup key for some technical metadata, not as the root of your digital existence, then that wouldn't fully set you free and make you "self-sovereign". But I can also understand your view, see above. I am not sure if there's any contradiction here, does the question of the DID document format have anything to do with the goal of avoiding privacy leaking data in those documents?

dhh1128 commented 4 years ago

I can also understand the arguments for simple, constrained, robust, plain JSON documents that are similar to DNS records, and that fulfill their well-defined purpose on a lower, separate layer than the actual "identity layer" that establishes your digital self.

I just wanted to chime in to say that I agree with this limited conception of DID Documents, which I think is close in spirit to the one Joe is arguing for. I do not agree with a richer conception that overloads them with lots of meaning and infinite extensibility. I think other resources, accessed through service endpoints, are where that belongs.

Simpler is better, at the relatively primitive communication-enabling level where DIDs belong.

dhh1128 commented 4 years ago

DIDs should never be tied to a specific person, because that can change. Yes. If you didn't get that, you need to understand that a given DID's Subject can change from one physical person to another. If that's outside the scope of what you have imagined so far, simply consider a DID with the Subject of the King of England. Not only has that Subject changed from time to time, it actually doesn't refer to any specific person at this moment in November 2019. Sometime in the next decade or two, it almost certainly will. And that is completely independent of whatever might be recorded in a DID Document.

Can we get more precise? If I parse this statement very, very carefully, I don't disagree with it--but a lighter reading gives what I consider a faulty impression.

Here is what the DID spec currently says about persistence:

4.9 Persistence A DID is expected to be persistent and immutable, i.e., bound exclusively and permanently to its one and only subject. Even after a DID has been deactivated, it is intended that it never be repurposed.

Now, Joe's statement doesn't say that the DID subject can change; it says that the person associated with the DID subject can change. I agree with that. If a DID's subject is "King of England", then the DID's subject hasn't changed when the person playing the role of "King of England" changes. The subject is stable; the person associated with that subject is what changed. This is more or less how we expect organizational DIDs to work. The staff of a company evolves over time, but the DID's subject--the company--remains constant.

But this is not how DIDs for ordinary people are expected to work. For ordinary people, the people are the subject. And a DID like this can't be an identifier for Alice today, and Bob tomorrow. So when the subject of a DID is a person instead of a role, the person in question is immutable.

Agreed?

dhh1128 commented 4 years ago

Markus: But the abstract description should be in UML or in English, not in JSON, because that wouldn't be abstract anymore.

Dmitri: +1 to that. A JSON serialization is a very concrete one, not abstract.

JSON is a notation. Hence the 'N' in its name.

It is true that it can also be a serialization format--but we do not have to view it that way for the purposes of writing a spec. JSON as a notation is terser, clearer, and easier to work with in text than UML or fuzzy human language. Expressing the hierarchy and sequences of a data model with {...} and [...] makes much better sense to me than deliberately picking something clunky and less precise. As long as we say that the notation can be rendered in various serializations (including JSON-as-serialization, CBOR, etc), I think it's an optimal choice.

SmithSamuelM commented 4 years ago

@dlongley

A couple of historical complications of JSON-LD: 1) Namespace collisions with openschema.org. We have changed top-level block names to avoid collisions with openschema.org; this adds a dependency that complicates things at no value to those who do not use openschema.org.

2) An unused @context is an invitation to a malicious injection attack on a DID method resolver.

Both of these suffer from the complication of making external, unexpressed dependencies part of the DID Doc. At least in an abstract data model we can make all dependencies internal. Implementers of an optional JSON-LD encoding could expand their dependency space at their leisure without encumbering the spec for everyone else.

3) From a cryptographic perspective, in order to establish and verify authoritative attestations, data is needed about those attestations. What is of primary importance is whether the attestation can be cryptographically verified as emanating from the current set of controlling keys. Whether or not the data refers to a subject (is it the DID, or the controlling entity of the DID), or whether or not it is meta-data with respect to JSON-LD/RDF, is of secondary importance. We are led to make poor crypto choices out of a desire to achieve JSON-LD/RDF purity.

For example if a user sees two different versions of a DID Doc that are both signed with the same key pair(s), how does the user know which one to trust or which one is the most recent? There are many mechanisms for helping the user make this determination such as a sequence number, a hash in a chained set of hashes, a date time stamp, a version number etc. That information needs to be inside the signature. It needs to be unique in the document. But these sorts of questions often take a long time to answer when encumbered by JSON-LD semantics and syntax.
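
A sketch of that point, showing ordering information carried inside the signed bytes; the field names and signing scheme are illustrative only.

```python
import json
from nacl.signing import SigningKey

controller_key = SigningKey.generate()

# Hypothetical DID Doc carrying its own ordering information. Because the
# sequence number (or hash-chain link, timestamp, version, etc.) is part of
# the signed payload, a verifier can tell which of two validly signed
# versions is the most recent.
did_doc = {
    "id": "did:example:123456789abcdefghi",
    "sequence": 7,
    "service": []
}
signed_doc = controller_key.sign(json.dumps(did_doc, sort_keys=True).encode())

# Anything carried outside signed_doc (e.g. a detached version header) would
# not be protected by the signature and could not be trusted for ordering.
```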

SmithSamuelM commented 4 years ago

@dlongley

We are using two different definitions of extensible. What I mean by extensible is that the document has a core set of defined contents and may be extended by adding additional contents. What appears to me is that the JSON-LD folks mean extensible to mean that a DID-Doc is extended by an external world model. In other words, a DID-Doc is an intensive part of an extensive data model.

The latter definition is the problem. It explodes the dependency space. It makes discussion difficult. We need to discover the authoritative keys for the DID. Once we discover those we need to discover a few other things, like how to access service endpoints that provide other functions or resources, but in a cryptographically verifiable way. That discovery needs to be authoritative. There are a few core things we need to know to make that discovery authoritative. Once we have made it authoritative, there are a few other common things we need to discover, like service endpoints and how to talk to them. We can define these in JSON and then add others as they become important over time (extend the core contents of the document to allow that). Extending the document to include a world data model is mixing the larger question of identity with the smaller questions of how to do discovery of the authoritative keys and services. I frankly am having a hard time appreciating why a DID Doc has become the source of this greater problem. It makes doing the simple tasks harder. It is mixing concerns. A DID Doc is meta-data to bootstrap authoritative verification of attestations made by the controller of a DID. All these extended world model usages could much more appropriately be included in a verifiable credential about the controlling entity. Let's just have a bootstrap to a service endpoint that provides the verifiable credential. The verifiable credential then has the extensible world model available to it. This is what I call paradigm greed: trying to apply the schema-centric approach of an extended world model to the bootstrap needed to credibly verify a document (verifiable credential) describing the intensive part of an extensive world. Not every computational task is best described via an extensive world data model. We need clean separation of concerns to do secure cryptographic bootstrapping to a state where a verifiable credential can then provide the world model. Many of the things I see being suggested for the DID Doc could be put in a verifiable credential. Let's do that and keep the DID Doc simple.

Indeed, I propose this criterion: any information about the subject entity of the DID that could be provided via a verifiable credential obtained from a service endpoint should not be in the DID Doc. The only things that should be in a DID Doc are those items needed to first bootstrap the control authority needed to bootstrap secure communication to such an endpoint and validation of said verifiable credentials.

Verifiable credentials are wonderful things. Let's have more of them. But not disguised as a DID Doc.

dhh1128 commented 4 years ago

DID Docs should be "extensible" in the same way and to about the same extent as HTTP headers are extensible: you can add extra stuff without breaking anything, and if the entity you're communicating with groks that extra stuff, fine. Otherwise, it has no effect. We do not need "extensibility" if it means namespacing, a complex resolution/processing model, @contexts, etc.

Those additional complexities are very reasonable when you need a true semantic graph (as with VCs)--but the power of DIDs is tied more strongly to their simplicity than to the semantic power of DID docs. If you want semantic power, use services at endpoints, not the DID doc itself.

SmithSamuelM commented 4 years ago

The more I think about the above suggestions from @SamuelSmithM and @dhh1128, the more I am convinced that they cut to the root of the issue. Do not put anything in a DID Doc that can be provided by a verifiable credential at a service endpoint. Only put in the DID Doc what is essential to access the verifiable credential. With that filter we will have very little left to put in the DID Doc merely the bare essentials and these will hardly need an extensive semantic model. Because if they did then they could be provided by a verifiable credential. We just need the minimum to bootstrap.

SmithSamuelM commented 4 years ago

It might help crystallize the mental model to change from DID-Doc to DID Discovery Doc or simply DID Discovery Data.

ewelton commented 4 years ago

The problem, as I see it, is did:peer - that should not be a DID method. In the context of pairwise communications the semantic issues are vastly different from those in a 1:N communication about an identifier - where we desperately need JSON-LD. In pairwise communications we do not need machine processable semantics; the semantics should be determined as part of the communication protocol - but in terms of VCs and 1:N DIDs, general purpose, extensible semantics are critical.

ewelton commented 4 years ago

I need to understand a use-case where two communicating parties, or a small group of parties, are communicating and need to appeal to some global semantic mechanism. I just do not see it - and with that, issues like service endpoints, fragment processing, persistent reference get in the way. If I have a DID assigned to a specific pairwise communication, or to a specific credential, the need to discover how to communicate is unclear - it is like calling someone on their phone and then asking them for their phone number. If you can call someone on their phone you do not need a zero-knowledge disclosure process for discerning and validating their phone number - you have that already.

On the other hand, when trying to discover and communicate with people, organizations, and things - when you are looking them up in a registry or confirming their claimed identity, then you definitely need all the other mechanics of "fat DID documents" - and when you deal with fat DID documents you need machine processable semantics - which is what JSON-LD provides. Opting for a 2nd layer of meta-configuration about the semantic milieu adds enormous and unwarranted complexity - avoiding JSON-LD in order to re-create semantic negotiation adds tremendous complexity and inhibits adoption.

We need to separate out PIDs (Peer/Pairwise DIDs) and DIDs (public decentralized identifiers) - if a ledger is involved, it is a public DID - why else go to the trouble of anchoring it to some global oracle of authoritative state? PIDs are critical - but they are so dramatically different in their use-case domain that trying to get a one-size-fits-all DID document leads to exactly the sort of confusion we are struggling with.

I want to see DID:peer -> PIDs and I want to see pairwise DIDs removed from our lexicon - let JSON-LD rule the landscape of interoperable, multi-system, multi-platform identification. They can share some root utilities - like KERI, but these are apples and oranges.

dhh1128 commented 4 years ago

@ewelton : That's a fascinating take. Initially I hated it, but now I'm stepping back and trying to evaluate more thoughtfully.

I'm curious about the broad claim that "when trying to discover and communicate with people, organizations, and things--when you are looking them up in a registry or confirming their claimed identity, then you definitely need all the other mechanics of 'fat DID documents.'" This seems doubtful to me, because we have systems like this today (only not as decentralized), and I don't see them as needing what you claim. But say more about that; maybe you can lead me along...

(Possibly we should drop off this comment thread, though, if we veer too much into a tangent from Sam's original intent for this issue...)

ewelton commented 4 years ago

I agree with this statement:

With that filter we will have very little left to put in the DID Doc merely the bare essentials and these will hardly need an extensive semantic model. Because if they did then they could be provided by a verifiable credential. We just need the minimum to bootstrap.

and @dhh1128 - I believe, strongly, in your vision of pushing computation to the edge. I remember attending your presentation in Basel, and that picture with client-server and client-blockchain stuck in my head. I think you are right.

but I also believe strongly in JSON-LD - at a recent meeting with a government I was pitching DIDs as an alternative to centralized governmental certificate authorities. One of the selling points was that "you don't have to be bound to semantics, @context gives you control"

With Sam's proposal, I lose the flexibility that, just last week, I used to try to sell DIDs to a government in lieu of a centralized authority.

What it comes down to is "why are you resolving a DID" - the reason is that you want to engage it - either to perform authentication, or to open a communication channel. That is completely reasonable in the context of people, organizations, and things who participate in a society using DIDs. Participation in a society, especially when that crosses borders requires semantic negotiation - and I think that JSON-LD is about the best offering on that front in the last few decades.

On the other hand - there is absolutely no need for that level of semantic capacity when in a "micro-society" - one of the whole points of a private communication is the benefit of a shared semantic. This is what drives "inside jokes between friends"

In fact - I've been working (since Basel) on an actual mathematical result around this - it should be possible to exceed the naive Shannon information capacity of a channel through "inside jokes" - between friends you can benefit from a form of steganography, so that communication remains secure even if the raw crypto is cracked. This is because the sender/receiver have a semantically tuned system - pairwise communications should not just be about syntax, they should be semantically pairwise - and that means that JSON-LD is useless overhead.

talltree commented 4 years ago

I understand what @ewelton is saying, however I strongly disagree, both about peer DIDs needing separate treatment and about the requirement to have an extensible semantic graph model at the DID level of decentralized infrastructure.

Ironically peer DIDs prove the point that @SmithSamuelM is making: all DID-to-DID communications require bootstrapping a cryptographically secure connection—whether the connection is peer-to-peer or one-to-many. The same underlying mechanisms—persistent identifiers, public keys and service endpoints—are needed in both cases. Sam's argument is that this is all that is needed, and that adoption will be easier and security (and privacy) will be stronger if this is all that is included in the data model (in other words, follow the dictum of the simplest thing that could possibly work.)

On the other hand, when trying to discover and communicate with people, organizations, and things - when you are looking them up in a registry or confirming their claimed identity, then you definitely need all the other mechanics of "fat DID documents".

While I can understand someone coming from this POV, let me make sure it is clear why Sam and I and others on this thread have been arguing the exact opposite: if what you're trying to solve is a generalized discovery problem, then you not only need tools like a semantic graph model, you also need name services, directory services, search protocols, etc. That's a whole different problem space. And there are tools and technologies that already work very well for that problem space. All those tools and technologies need to do is add DIDs to become even more useful for discovery.

If OTOH the problem you are trying to solve is the Decentralized Identifier problem space: entity-controlled persistent decentralized identification and bootstrapping of cryptographically secure communications, you neither need nor want any of those other features.

Think of it like the difference between DNS and Web searching. The former uses a highly constrained type of identifier and very simple flat record format to solve one specific problem very well at scale. The latter uses a rich set of identifier schemes (URIs) and highly extensible markup languages. The latter is where you need a semantic graph model, not the former.

talltree commented 4 years ago

One more point for everyone on this thread: allowing a JSON encoding to be defined in pure JSON without reference to a semantic graph model does not prevent those who want to use a semantic graph model from defining an encoding in JSON-LD, or N-Triples, or N-Quads, or Turtle.

Nor does it prevent the CBOR community from defining an encoding in CBOR.

ewelton commented 4 years ago

@talltree I need to apologize for not quite expressing my point well - i'd had a document open in vi for a while, but the conversation is happening at internet speed!

The issue to me is not one of 1:1 or 1:N, it is about the context of the conversation. Context and multiplicity are orthogonal - the semantic is a property of the community, not of the individual. Once upon a time there was a famous poet named Pootie Tang, and, according to his acolytes he was too cool for normal words - so, even though you never knew what he was saying, you always knew what he meant.

That's why I can say "sine your pity on the runny kine" and communicate as much as when I say "sah-dey-tah" - and while these utterances are sometimes joyful, they always seem to land me in secondary at the border crossing en route to IIW. No matter how many times I say "sah-dey-tah" it always takes an hour or more before I am released. I feel this is a "failure to communicate" - and that this FTC is semantically driven.

It is always about context - and that context defines the semantic in play.

If identity is contextual, then so is communication about an identity - and it is the communicative context that picks the semantic. The need for JSON-LD is in communication negotiation.

I don't want to denigrate this - negotiating the context of communication is key - and when unknown parties are bootstrapping communication, semantics are not well defined. If I have agreed to have N parties in a conversation, it seems that picking "we will speak the Queen's English (and use JWT)" is natural.

Ironically peer DIDs prove the point that @SmithSamuelM is making: all DID-to-DID communications require bootstrapping a cryptographically secure connection—whether the connection is peer-to-peer or one-to-many. The same underlying mechanisms—persistent identifiers, public keys and service endpoints—are needed in both cases.

I agree that 'bootstrapping' a conversation requires much of the same structure - KERI for example. On the other hand, service_endpoints and persistent identifiers - I am less convinced. If I have a throw-away pairwise communication DID that is intended for one conversation, why do I need persistence? If I am opening negotiations with myself, USCIS/CBP, and a bevy of lawyers - do we need to argue about "what do you mean by 'name'"?

I definitely do not want to see a world where I have a public-DID resolution that requires two levels of semantic processing - one to determine which semantic layer is in play, and a second to determine what the document means in the context of the previously selected semantic milieu. The idea of "just get rid of semantics, and stick with syntax" is equally abhorrent to me.

I think that JSON-LD provides a nice middle ground - what does JSON-LD fail to provide that CBOR or N-Quads or jada-jada does?

iherman commented 4 years ago

@dlongley said in https://github.com/w3c/did-core/issues/128#issuecomment-559199898:

My reading of this is exactly what we want and already have... but it translates to: use JSON-LD and keep the core simple for JSON-only consumers. Someone treating JSON-LD as any other JSON (unless they want to use the extensibility features) shouldn't notice any difference. This is the same approach we took with VC with success.

I think it is very important to understand this point as it may have been lost in the discussion. There are a number of specifications that have been developed with a similar philosophy; to quote two of these that I was involved with (beyond VC mentioned by @dlongley): Web Annotations or Publication Manifest, but that could also be said of the way search engines relying on schema.org operate. What it means is:

All this requires a little bit more care for the Working Group in defining the underlying JSON, but with no adverse consequence for users or implementers.

As an example, there is a specific section in the upcoming Publication Manifest document that defines how the processing of a manifest should be done by an agent (in that case, e.g., an audiobook reader) and that processing is defined without any reference to linked data, RDF, etc. At the same time, the same manifest may be used, if so required, as part of a larger linked data cloud, combining the content with vocabularies defined by very different communities out there.

This approach has always been one of the driving forces for the development of JSON-LD.

I am not taking sides on whether the DID document should be "pure" JSON or JSON-LD. But we should take this decision understanding what the usage of JSON-LD really means...

dhh1128 commented 4 years ago

@iherman:

The specification defines a set of JSON (note: I say JSON and not JSON-LD!) terms with well specified meaning and a processing model

I believe that the construct that you're calling a "term" here is a JSON-LD-ism, not a JSON-ism, so your note doesn't compute. And I think that is the beginning of the dissonance. To even understand our process of spec-writing, or to read the spec itself, we are requiring people to understand technical definitions that are rooted in JSON-LD. And the notion that we need to "define a processing model" as part of the spec compounds this impression; the processing model for JSON is plenty clear without elaboration in a new spec, if we are not aiming for fancy constructs that JSON-LD needs.

If, instead of defining terms and a processing model, our spec limited itself to constructs so simple and primitive that they required no explanation, and if we knew these could be mapped onto a set of terms and a processing model for those that wanted to do so, that would be different.

The tax on spec developers and (more importantly) spec readers/implementers for JSON-LD as a foundation is not zero. When I asked why our spec demanded that all values of id be fully qualified, I was told that it was because of demands from JSON-LD's processing model. There's a big debate happening about key formats--part of the dissonance relates to JSON-LD's opinion about use of id versus JWK's use of kid, and the semantic mismatches between them. I could cite other examples.

I don't think the viability of starting from JSON-LD but letting JSON aficionados ignore it is the relevant question. Clearly it can be done. The question is whether the juice is worth the squeeze. What features of JSON-LD do we actually need? I would be very interested in concrete answers to that question, rather than theoretical ones. I think that, not discussions about process or theory or precedent, ought to push this issue one way or the other.

Based on lack of good examples so far, I am suspecting that such features, if they exist, may turn out to be uncompelling or just plain wrong-headed. This is based on your own observation, restated by Joe:

If we want a DID Document to be extensible without namespace conflicts, we need JSON-LD (or its equivalent). If we want to define a concise and limited set of specific properties that define a DID Document, JSON alone is fine.

And also Sam's point:

Verifiable credential are wonderful things. Let’s have more of them. But not disguised as a DID Doc.

awoie commented 4 years ago

I too am in favour of a simple JSON-based DID core specification.

JSON is fully sufficient to describe the abstract data model. JSON can avoid namespace conflicts by registering new vocabulary (which might be needed for new features) with IANA. While I appreciate some of JSON-LD's extensibility features, it does not solve that issue as a whole. Implementers will still have to implement the interpretation of these features to achieve interoperability. On the other hand, it would always be possible to define additional specs that describe how to use these new features/vocabulary. The spec authors would then be in charge of registering new vocabulary with IANA, or choosing terms that are collision-resistant. I agree that JSON-LD introduces unnecessary overhead for implementers who just want to use the features described in this issue.
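As a hedged illustration of the collision-resistant-naming option (the property name and its contents below are hypothetical, not registered or spec-defined terms), an extension could be namespaced by convention rather than via an @context:

```json
{
  "id": "did:example:123",
  "com.example.keyRecovery": {
    "type": "social-recovery",
    "threshold": 2
  }
}
```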

iherman commented 4 years ago

@dhh1128

I believe that the construct that you're calling a "term" here is a JSON-LD-ism, not a JSON-ism, so your note doesn't compute.

This is not JSON-LD-ism, but Herman-ism... You are right, the official terminology on JSON is "name". However, I hear the word "key", "name", "term" all around me among JSON users; b.t.w., the JSON-LD spec does not use a different terminology either. Blame it on me.

And the notion that we need to "define a processing model" as part of the spec compounds this impression; the processing model for JSON is plenty clear without elaboration in a new spec, if we are not aiming for fancy constructs that JSON-LD needs.

Again, it may be some terminology mismatch, but I respectfully disagree. The original text in this issue, as written by @SmithSamuelM, describes, in abstract terms, a process model that defines, in effect, what should happen with the, ahem, names of a JSON representation (if this is the representation we choose for the data model). I do not think that a spec "just" listing names without any specification of what those names should be used for would be o.k.

As I said, I am not taking sides on whether we should use JSON-LD or simply JSON. My only goal was to help make things clearer for everyone. And yes, actually, I do find:

Do not put anything in a DID Doc that can be provided by a verifiable credential at a service endpoint. Only put in the DID Doc what is essential to access the verifiable credential.

(from @SmithSamuelM) compelling.

pelle commented 4 years ago

I would be very happy with this change. It would make things much simpler for developers.

It also seems to me much more secure for a document that specifies public keys, etc., to have no external dependencies such as contexts.

peacekeeper commented 4 years ago

Just some additional (intended to be neutral) observations:

  1. Switching to plain JSON will make it impossible to use LD proofs and signatures. This could be considered a good thing since it makes everything even simpler, and JWS could be used instead. But in the DID world, proofs have been used that are much more diverse than what JWS offers, e.g. a "Satoshi audit trail" or Sovrin state proofs. It has been argued in this thread that the core features of DID documents are so simple that they don't need RDF semantics or namespaces, but when we get into proofs (and verification methods), we may see a much greater need for extensibility and open world semantics in the future. Note that there is a discussion whether such metadata even belongs into the DID document or into a separate "DID resolution result" data structure (but would that then be in plain JSON too?)

  2. We would have to define additional rules for the "id" property (or would we change it to "sub", to be compatible with JOSE?). In JSON-LD, the "id" is a built-in construct, it contains the identifier of the RDF subject that is being described. One thing that's nice about the current spec is that "id" is used on the top level for the DID subject, and it's also used for services and public keys. We would have to consider how JWK's use of "kid" would fit in.

  3. We would have to define additional rules for fragments in DID URLs. The media type application/ld+json defines how fragments work (they are directly based on the "id" fields, see above). The media type application/json does NOT define how fragments work, so we would have to pick something. The obvious choice (but not the only one) would then be JSON Pointer. RFC 6901 talks about the use of JSON Pointer as a URI fragment. This would mean that DID URLs like did:ex:123#keys-1, which we are using today, would change to something more complex (see the sketch after this list).

  4. DIDs would probably lose their potential compatibility with WebID and Solid. On one hand you could argue that this doesn't matter for the applications most people in this community are working on (OIDC, DIDComm, etc.), since those applications only require simple mechanics for discovering keys and services. But WebID and Solid should still be considered important theoretical work on aligning traditional web architecture and philosophy with digital identity, and it may be desirable for DIDs to be compatible with that architecture.

  5. If the decision is to go with plain JSON, then perhaps we should just extend JRD instead of inventing a new "DID document" format.
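To make point 3 concrete, here is a hedged sketch (the DID, key identifier, and key value are illustrative, not drawn from any spec text) of a minimal DID document containing one key:

```json
{
  "id": "did:ex:123",
  "publicKey": [
    {
      "id": "did:ex:123#keys-1",
      "type": "Ed25519VerificationKey2018",
      "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
    }
  ]
}
```

Under application/ld+json the fragment resolves against the embedded "id" values, so did:ex:123#keys-1 names the key object above. Under application/json with RFC 6901 JSON Pointer fragments, the same key would instead be addressed positionally, e.g. something like did:ex:123#/publicKey/0.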

ewelton commented 4 years ago

I like the specificity of the list of impacts provided by @peacekeeper, and I like the simple clarity of the original questions posed by @SmithSamuelM:

  • Is JSON a sufficient encoding for the purpose of DID Docs ?
  • Would JSON foster greater adoption than some other encoding such as JSON-LD ?
  • Do a majority of implementors prefer JSON as the default encoding over some other encoding such as JSON-LD ?

It definitely seems that JSON is up to the task, subject to some changes like those described by @peacekeeper

I also very much like the idea of a DID-Doc being as simple as possible and focused entirely on bootstrapping communication - this is required in all DID use cases. I also very much like the sentiment that we should let VCs do the work of supporting everything beyond communication bootstrapping.

This is where I see a distinction between advertised, subject-oriented, global, persistent, published, discoverable DIDs and DIDs which are used only to secure communication or to tag a device or datum. I see a difference between a DID that represents a persistent, publicly discoverable communication endpoint - like my business or myself - and a DID that is tagging a pencil.

In the former case, when we resolve a DID we immediately need to ask questions such as "how do I communicate with you" or we want to use the DID as part of the persistent address of a resource - probably via fragments. In DNS this was dealt with by adding more record types and more complex sub-structures in TXT records. I see the extensibility of the DID-Doc being analogous - it provides a structured pathway for a DID-controller to publish information about the DID and to link that DID into the global data ecology. Moreover, that ability to publish is directly tied to the DID itself - allowing the DID-Doc to be a single, authoritative, source of truth expressing "this is me" - or, more correctly, expressing the mechanics of learning about me.

The alternative model - a simplified DID-Doc - forces a multi-step process - I can't use a DID to point to a resource - i am required to use a far heavier process that involves new communication protocols and a suite of toolkits. The simplification of the DID-Document results in substantially higher complexity. If you are focused on pairwise or small-group communication, then you do not need anything beyond the simplest DID-doc - but if you are advertising a persistent DID that you expect to share generally - e.g. a DNS-like model - then you need more. This is why I think that there is a rough difference along the lines of did:peer and did: - private, pairwise, anti-correlable DIDs are subject to a different set of pressures than public, correlation-positive DIDs - and public, correlation-positive DIDs seem to me to beg for a systematic semantic extensibility.

This is why I don't quite understand the argument that JSON simplifies anything - and, in particular I am not clear about this:

This expansion complicates the process of producing an authoritative versioned discovery document or an evented state machine. Clearly a clever implementation of a cyclical directed graph could be used to implement versioned discovery documents or evented state machines. Many implementations of RDF, however, use directed acyclical graphs making the implementation of evented state machines at best problematic and versioned discovery documents more cumbersome.

What I keep looking for is a concrete example of how JSON-LD gets in the way. What is an example of the simplification that occurs? If I am going to throw away the simplicity of an authoritative and expressive, semantically extensible, public, correlation-positive DID-Document - if I am going to give up my ability to make clear, discoverable, public statements and give up the ability to clearly and unambiguously point to persistent resources - I will expect some very substantial and very compelling improvements.

@awoie has suggested that a centralized, authoritative registry define and restrict how I can use my DID-Doc. @pelle says

It also seems to me to be much more secure to have a document for specifying public keys etc to not have external dependencies such as contexts etc.

The solution of using JSON simply replaces a formal, explicit, decentralized model (JSON-LD) with a centralized, implicit, and restrictive model. There is no requirement that you actively look up the @context URIs against a network. The external semantic dependency always exists, it is a question of where and how this is done - is it extensible and decentralized, or is it fixed and restrictive.

Before we introduce a hard-coded, dictatorial, centralized, restrictive semantic and remove our ability to use DIDs as persistent resource bases, I think it is worthwhile to really see some of this simplification in action. If possible, can someone who feels JSON simplifies things provide a short list of specific examples, like @peacekeeper's list - showing a very concrete example of the processing improvement?

To be clear - I am sympathetic to the sensibility of simplifying the DID-doc to only express a DPKI registration - however, I think we lose a tremendous amount of value in the context of published, correlation-positive DIDs. We owe it to ourselves to be very clear about the cost/benefit tradeoff before we jettison those abilities.

ewelton commented 4 years ago

I would also like to respond to @dhh1128 - i think this is a very good question

I don't think the viability of starting from JSON-LD but letting JSON aficionados ignore it is the relevant question. Clearly it can be done. The question is whether the juice is worth the squeeze. What features of JSON-LD do we actually need? I would be very interested in concrete answers to that question, rather than theoretical ones. I think that, not discussions about process or theory or precedent, ought to push this issue one way or the other.

What I want is the ability to say "@context:[the-did-core-bootstrapping-context-uri]" for a DID that has the minimal data required to bootstrap communication. In this case, I can just ignore the JSON-LD and deal with the JSON - hardcoding the semantic. This is particularly useful in pairwise handling - it is particularly useful for anonymity, for transient DIDs, for did:peer dids, and a whole range of use cases. I expect this to be the mainstay.

Alternatively, I might be exploring multi-DID relationships in complex guardianship or custodial relationships. In that case, I can add a context to a specific DID document and include the relevant fields. This can be done by just the subset of DIDs for which it makes sense - and there is no need to try to solve all the possible guardianship or IoT situations and get them into the centrally registered specification.

Perhaps there is other information - versioning information, timestamping, or whatnot that is helpful for a particular DID method - in those cases I can simply add another URI to the @context array. I may want to associate some additional information about a novel form of key-recovery.

That to me is the key value proposition of JSON-LD - the ability to formally declare additional @context elements on a per-DID basis.
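As a hedged sketch of that per-DID extensibility (the guardianship context URI, the "guardian" property, and the service entry are hypothetical, not published vocabularies; context URIs are shown schematically), one DID document might declare only the core context while another adds a context for the extra fields it carries:

```json
{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    "https://example.org/guardianship/v1"
  ],
  "id": "did:example:456",
  "guardian": "did:example:789",
  "service": [
    {
      "id": "did:example:456#agent",
      "type": "AgentService",
      "serviceEndpoint": "https://agents.example.com/456"
    }
  ]
}
```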

It is the case that I can also do all of this with VCs - but then I can't simply use the DID to "point" to a resource - I must always provide a pair - the DID and a 2nd URI for the VC that contains the linked information. And for public statements I want to make about myself, I can always issue myself a VC and find some place to host that for general availability.

There is a certain clarity and elegance to that - but I don't necessarily see it as a simplification - it is, rather, a bit of added complexity.

Perhaps the issue is whether or not DIDs should be thought of as resource anchors - if they are not resource anchors, and are solely for DPKI registration then why not move service_endpoints and perhaps authentication out and make those VCs as well? That does seem a bit extreme - but it is also oddly pure.

Again - that's why I think it comes down to the public vs. private and peer/published distinctions. If I am registering a DID on a global ledger, then I am making a public announcement of a resource base - but if I am making a private DID for secure, correlation-resistant, p2p communication then the sense of a published resource base is gone. I bootstrap communications and exchange VCs, and we never, ever need to touch a ledger - in that case we use the minimal set of DID-Doc fields. However, if I am advertising information on a ledger in a sort of generalized DNS style then I want to use @context to formally declare what information is being published - and I want the author to be in control of that.

I think the use of @context gives us the flexibility to deal with those two differing roles of a DID at a relatively low cost - and the did-core-context can be hardcoded and DID documents can be processed as JSON, completely ignoring the JSON-LD if that helps. I just see the @context as a critical pressure-release valve - it is a minimal intrusion which opens the door to a universe of possibilities - it lets us use DIDs as communication anchors and as resource namespaces.

That could be the core of the problem - because I definitely agree w/ @talltree here

While I can understand someone coming from this POV, let me make sure it is clear why Sam and I and others on this thread have been arguing the exact opposite: if what you're trying to solve is a generalized discovery problem, then you not only need tools like a semantic graph model, you also need name services, directory services, search protocols, etc. That's a whole different problem space. And there are tools and technologies that already work very well for that problem space. All those tools and technologies need to do is add DIDs to become even more useful for discovery.

If a DID is only an anchor supporting DPKI registration then I think there is definite value in getting rid of JSON-LD and going with a single, centralized, authoritative specification. There is definite value in that - but there is also then very little significance to did methods - because it does not matter where the DID-document comes from. did:anything is basically the same everywhere - it is the DID-document and not the DID itself which matters.

This impacts the rubric conversation - because did:facebook is just as valid as did:sov - the method simply does not matter - as long as the identifiers are self-certifying, they can not be "taken away" - they can only be unpublished from directory services. Decentralization does not matter - only the integrity of the DID document.

That is why, to me, if we are talking about decentralized registries of information, with multiple methods - then we expect variation around the DID-document itself. That variation is what gives sense to different did methods. The entire topic of Resolution is based on the concept of a public registry. JSON-LD gives us a means of coping with the variation in DID-Doc around multiple registries and purposes.

If we are not talking about the registry issue, and only talking about DPKI, then a self-certifying document and associated processing toolkits are enough - and JSON-LD is clearly too much.

awoie commented 4 years ago

I see real value in an open world data model for certain purposes, e.g., VC data, but I question whether it is needed for features in the DID spec when we consider that we already have registries for DID methods, revocation status, LD Signature Suites, and potentially more to come. These registries are authoritative for implementers in the same way as IANA registries. And that is not necessarily a bad thing, because otherwise we won't achieve interoperability.

If, however, JSON-LD is needed to include certain parts of the DID community, then I won't be opposed to it. However, I also believe that if we cannot express the data model in JSON, then we will have failed. If it is good enough for resolver implementations or the application logic to decide whether to support or require a certain data model (de)serialisation format, then we should consider this. I want to second what @jricher said here https://github.com/w3c/did-core/issues/103#issuecomment-559190860.

ewelton commented 4 years ago

TBH, I am not sure what the value is in making a DID-document so dramatically multi-format - that seems like additional overhead of dubious value - especially if the data is a fixed set of fields. You'd have to link in dozens of additional libraries to any system just to handle all the possible formats you might get back - but that is only if you want to allow did:* to work everywhere. However, it is more likely that different methods will gravitate to different environments and use-cases and languages.

Being able to represent a DID-Doc in format A, B, and C is different from saying DID-processing software and methods must be able to handle format A, B, and C interchangeably. The fact that some industry uses XML and another uses JSON does not tell me that all DID-processing MUST support XML and JSON formats interchangeably. It seems like a bigger ask to force a lightweight TypeScript application to include the full XML processing stack just in case I got a DID document in XML.

In fact, in #103 - dhuseby references a rant from an implementer's perspective where he says that JSON-LD is about how and not what should be in the document. It is that 'what' that we want to be extensible - and JSON-LD provides a mechanism for that, although it is apparently painful - and perhaps it is because JSON-LD is too expansive for the little bit of assistance we need in extending "what" is in a document?

For example: consider the descriptions of adding certificates in #69 and #103 - it seems like this might be a prime candidate for a method-specific variation and/or a DID specific variation - e.g. DID-Doc A says "I have standard core material and support cert-type-A and cert-type-B" while DID-Doc B says "I have only standard core material"

This kind of thing could be handled w/o JSON-LD - but the question remains - how do I, the controller of a DID document, extend the DID-document if it makes sense for my use case? I want to somehow, systematically, flag "I have a set of additional attributes with the following meaning and metadata" and I want a way to say "does this DID-doc support X, Y, or Z"

Again - if we restrict the role of DIDs to be DPKI registration alone and block their use as the root of a resource domain, then we have a very tight, encoding-agnostic model. If we want fluid extensibility of DID-doc feature sets, what alternatives do we have to JSON-LD? Would simply replacing "@context" with "@features" and staying encoding-agnostic solve it?
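A hedged sketch of what such a hypothetical, encoding-agnostic "features" declaration might look like; none of these names exist in any spec, and the feature URIs and certificate entries are purely illustrative:

```json
{
  "id": "did:example:abc",
  "features": [
    "https://example.com/did-features/cert-type-A",
    "https://example.com/did-features/cert-type-B"
  ],
  "certificates": [
    {
      "id": "did:example:abc#cert-1",
      "type": "cert-type-A",
      "value": "base64-encoded-certificate-bytes"
    }
  ]
}
```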

darrellodonnell commented 4 years ago

I just wanted to chime in to say that I agree with this limited conception of DID Documents, which I think is close in spirit to the one Joe is arguing for. I do not agree with a richer conception that overloads them with lots of meaning and infinite extensibility. I think other resources, accessed through service endpoints, is where that belongs.

Simpler is better, at the relatively primitive communication-enabling level where DIDs belong.

+1

I believe our goal is to foster broad adoption. Any complexity or extensibility that can turn a DID-Doc into some beast it was never intended to be is not a great idea.

As an example I point to a different domain. OASIS has a standard called Common Alerting Protocol. It is used for Amber alerts, weather warnings, and many more emergency and non-emergency alerts. It allows for an AREA on the planet to be specified - as a circle or polygon.

In the creation of a vessel tracking system a developer decided that, since vessels are best represented by points, a circle with zero radius would suffice. Large volumes of vessel tracks were created. That was fine while they stood alone in a single system.

But then someone tried to share them "using a standard" (that had been subverted) and caused unknown levels of pain as systems tried to understand the hacked alerts. Two big problems - they weren't alerts and they weren't really using an Area.

Now - this "hack" caused some thinking in the community that a zero-radius circle (i.e. a point) was actually valuable so the community adjusted over a long period of time. However, the confusion created caused system crashes and the reality was that OASIS CAP is not for tracking vessels... There were other standards needed for that type of data.

My point here is that bounding things to keep them simple helps interoperability. Allowing or expecting infinite extensibility may make interoperability far harder than it needs to be.

msporny commented 4 years ago

This issue is talking about too many things, at too abstract of a level of discussion. There are also a LOT of misconceptions in the "Let's use JSON as an abstract data model" proposal and I can't see how it actually translates into a working specification given the ecosystem that has sprung up around Verifiable Credentials.

I'll answer the questions raised by the issue, but am concerned that the answers won't really get us toward a concrete set of text that we can discuss.

Is JSON a sufficient encoding for the purpose of DID Docs ?

No, it isn't, because JSON only deals with local information, whereas DID Docs (and Verifiable Credentials) deal with global information (aka global semantics). JSON-LD was invented to address global semantics while enabling the folks working with JSON tooling to continue to use that tooling.

Going to JSON would be a step backwards that would see us reinventing parts of JSON-LD that took years to come to consensus and standardization on. We'd have to do that work again in this group with a very questionable return on investment on the table.
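To illustrate the point about global semantics, a hedged sketch (the expanded IRIs are schematic, not quoted from any published vocabulary): in JSON-LD the @context maps short JSON names onto globally unambiguous IRIs, so two documents using the same name can be known to mean the same thing, while plain JSON consumers can keep treating the names as local strings:

```json
{
  "@context": {
    "service": "https://www.w3.org/ns/did#service",
    "serviceEndpoint": "https://www.w3.org/ns/did#serviceEndpoint"
  },
  "service": [
    {
      "serviceEndpoint": "https://example.com/agent"
    }
  ]
}
```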

Would JSON foster greater adoption than some other encoding such as JSON-LD ?

This question is impossible to answer with any amount of certainty, which will lead to us debating using anecdotal evidence. The website design community asked the same question years ago wrt schema.org's use of JSON-LD and publishing machine-readable information... and today 27.5% of websites use JSON-LD.

Do a majority of implementors prefer JSON as the default encoding over some other encoding such as JSON-LD ?

For what feature set? Again, the question is so vague that it could be interpreted in a variety of different ways by different people given how the question is posed. It's probably also a very bad idea to make this a popularity contest (by asking random web developers what they'd like -- might as well ask them which cryptographic algorithms they'd like to use). There is a set of features that are important for the spec to have due to the requirements; the current spec meets those requirements, and moving to a less capable data model and syntax is going to force the group to re-open work that was already done (years ago).

This issue needs a PR so we can talk about things concretely... let's not get into a philosophical debate, let's see a concrete proposal -- that'll be easier to analyze than what this issue is turning into.

SmithSamuelM commented 4 years ago

Refocusing the Discussion

This discussion has been highly informative and I very much appreciate the detailed thoughtful comments. They have been illuminating and insightful. In this comment, however, I hope to refocus the discussion to the core questions posed by this proposal and re-frame the questions in light of the discussion.

Original Questions

As stated above the original questions are as follows:

1) Is JSON a sufficient encoding for the purpose of DID Docs ?

2) Would JSON foster greater adoption than some other encoding such as JSON-LD ?

3) Do a majority of implementors prefer JSON as the default encoding over some other encoding such as JSON-LD ?

Stated Assumptions

These questions were based on some stated assumptions in the introduction leading up to the questions. These may be summarized as follows:

A) That fostering universal adoption is a goal of this community. This means not just giving lip service to the idea of universal adoption but actually designing and expressing the DID-Doc spec in a way that conveniently facilitates broader adoption.

B) Universal adoption requires simplicity in the data model and the corresponding baseline encoding. This maximizes the adoptability of the baseline encoding and also maximizes the ease of translating to other encodings, and hence the adoptability potential of other encodings. This combination best fosters more universal adoption.

Given that the community agrees to A) and B) then the next stated assumption may be summarized as:

C) The best approach to fostering universal adoption is to represent the DID-Doc specification as a simple abstract data model that is directly expressible in a baseline encoding that may be conveniently translated to other encodings.

That means an encoding other than JSON-LD, or naive JSON with JSON-LD syntactic artifacts, as the baseline encoding. As proposed this means clean JSON. This makes CBOR trivial, and makes other encodings like PDF easier.

Given the community agrees with A) B) and C) then questions 1) 2) and 3) become relevant.

What I see above are many arguments that JSON-LD/RDF with an open world model is either superior or necessary as the encoding for a DID-Doc.

Every argument that says JSON-LD is necessary is an argument against one or all of A) B) and C). Any argument for the unique capabilities of JSON-LD is an argument against A) B) and C)

Issues to Argue

So let's all be fair and argue the issues.

If one thinks that JSON-LD is necessary, then state that and accept the ensuing consequence that encoding DID-Docs in other encodings will be more difficult if not problematic, and will likely become more difficult over time as more and more stuff gets put into the DID-Doc spec. As the DID Spec itself states, the open world data model on a semantic graph is more complex. So let's not pretend it isn't.

On the other hand, if JSON-LD is not necessary then you accept A) B) and C). So begin your argument there. If you accept A) B) and C) but not 1) 2) 3) then argue that. If one doesn't want JSON as the language for the abstract data model then argue that point and stop arguing for JSON-LD. Argue instead for the best language to represent the abstract data model. UML is a good alternative candidate. I would love to have a discussion on the relative merits of UML vs JSON rather than endlessly arguing about why one wants to encode in JSON-LD.

Nowhere is it proposed that one can't use JSON-LD as an optional encoding. The question is not whether or not JSON-LD is good as an encoding for those that want to use it but whether or not universal adoption is critically important and if so whether or not some other encoding would better serve that purpose as the baseline spec encoding.

Focusing on the questions

Because I accept A) B) and C) then I am led to find criteria for deciding 1) 2) and 3).

That means understanding the essential purposes of a DID-Doc. Nothing more. Nothing less.

There is a vast difference between serving the essential purposes and merely serving useful purposes. Arguments for useful purposes are only material if there is no other way to serve those useful purposes.

An optional encoding in JSON-LD will allow all the useful purposes of DID-Doc that JSON-LD provides without encumbering the essential purposes that every encoding must provide.

So arguments for features of JSON-LD are immaterial if they are not essential to every encoding.

The reason for the detailed exposition of purpose of the DID-Doc in the Appendix above was to frame the discussion of what are the essential purposes of a DID-Doc so that we could agree on those as a precursor to deciding if JSON or some other encoding best provides the necessary and sufficient functions of the baseline encoding.

Based on the discussion above it seems that expanding the discussion of essential purposes will be helpful.

DID Subject

One frequent issue in DID WG discussions is what is the DID Subject.

The DID spec https://www.w3.org/TR/did-core/ defines it as follows.

5.2 DID Subject

The DID subject is denoted with the id property. This is the entity that the DID document is about, i.e., it is the entity identified by the DID and described by the DID document.

DID documents MUST include the id property.

id
The value of id MUST be a single valid DID.

This is much too ambiguous a definition and is a source of confusion even in this thread, as each reader imbues the definition of subject with a different meaning for their own usage of the DID-Doc. Indeed one discussion was about whether the DID-Doc itself could or should be the subject of the DID-Doc.

The dictionary definition of ambiguity is as follows:

So what do I mean by my contention that this definition is too ambiguous? It's too ambiguous if there is a tighter definition that better serves the essential purpose of a DID-Doc.

The only entity that is essential with respect to the primary essential purpose of a DID-Doc is the controller of the private key or keys that are authoritative for the DID.

This entity is the only one we need to care about when determining the authoritative set of keys.

Whether that needs to be expressed, or is better expressed, as the subject id of a DID-Doc is a good question. It certainly can be implied. There is always an implied authoritative controller entity of the DID-Doc. This entity is authoritative in a cryptographic sense by virtue of its control over the associated key pair or pairs.

Only the controller can make verifiable authoritative nonrepudiable statements about the DID or resources affiliated with the DID. These include statements about the controller itself.

Not recognizing this fact is where the discussion goes off the tracks. A DID Controller may make authoritative statements about anything it chooses, including other entities such as the DID-Doc itself and any resource affiliated with the DID. The DID Controller may also choose to make statements about itself. But most importantly the DID Controller may choose to make authoritative statements about the key pair or pairs by which those authoritative statements may be verified in the future.

What matters foremost is validating if a given statement is authoritative w.r.t. the DID Controller.

A DID-Doc's primary purpose is to bootstrap the user of the DID-Doc to the point where that validation may be made.

Any other purposes are secondary and may be provided in other ways.

A useful, and potentially essential, secondary purpose is to bootstrap the discovery of affiliated resources such as those that may be obtained from service endpoints. This secondary purpose could be so practically valuable as to be viewed as essential to the meaningful use of a DID.

Given that these two purposes, in order of importance, are provided by a DID-Doc, any other purpose may be met by resources external to the DID-Doc in the form of verifiably authoritative statements made by the DID Controller.

This conclusion does not preclude putting additional resources in the DID-Doc as useful things but they are not essential things and are therefore optional things.

If the unique value of JSON-LD lies in its ability to do the useful things, but it is not unique in doing the essential things, then JSON-LD, due to its complexity, is not a good candidate for the essential or baseline encoding. It is best used as an optional encoding.

DID-Doc as an Ersatz Root VC for the DID Subject

It seems clear that the community is divided over this discussion. What appears to me as the underlying but unstated source of that division is the mental model the two sides have for a DID-Doc.

One mental model treats the DID-Doc as a kind of root credential describing the DID Subject; the other treats it as the minimal bootstrap needed to validate the DID Controller's authoritative statements. The former mental model must still establish the purpose of the latter but does not recognize it as such. It co-mingles both in the DID-Doc. It does not separate the concerns. This complicates things. The former is dependent on the latter. Therefore they are not equivalent models. This is the cause of the division.

A combination of these two mental models that separately embeds and encapsulates the associated two purposes may be provided in one DID-Doc. The first is to provide the essential bootstrap data needed to validate authoritative statements by the DID controller. Once that is provided the DID-Doc may optionally also provide ersatz VCs issued by the DID-Controller. These could include self issued VCs where the DID Controller is the subject.

Given this clean separation of concerns and associated encapsulation of data by purpose (essential and optional), other encodings would not need to be encumbered with the optional embedded ersatz VCs but could merely provide the bootstrap data in the DID-Doc and provide the ersatz VCs in another way such as via a service endpoint.

Thought Experiment

To clarify my own thinking I often conduct thought experiments about meaningful edge cases.

In my thought experiment I wondered what would be a maximally adoptable encoding for DID-Docs. The answer is a QR code. In order to fit a DID-Doc in a QR code I would have to limit the data to the bare essentials. Could I do that and still enable someone to practically use the DID? The answer is yes, as long as additional information that may be needed to use the DID could be provided via a service endpoint, and this other data could include a self-issued root verifiable credential (VC). So I would only need enough information to validate the inception event for the originating key-pair for the DID, and a service endpoint to get everything else.
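A hedged sketch of what such a bare-essentials document might contain (field names are loosely borrowed from the current spec drafts; the key value, the "CredentialService" type, and the endpoint are hypothetical): just enough to verify the originating key and find a service endpoint for everything else.

```json
{
  "id": "did:example:minimal123",
  "publicKey": [
    {
      "id": "did:example:minimal123#key-0",
      "type": "Ed25519VerificationKey2018",
      "publicKeyBase58": "2pTHjHHTcZnCFfBMQ3Au8eZPorfGBBRyKGoKpaJJNT7w"
    }
  ],
  "service": [
    {
      "id": "did:example:minimal123#vc",
      "type": "CredentialService",
      "serviceEndpoint": "https://example.com/credentials/minimal123"
    }
  ]
}
```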

Now I have a path to truly universal adoption. This path does not preclude someone from using an encoding such as JSON-LD that embeds the ersatz root VC in the DID-Doc. They may go to town with that and benefit from all the goodness that JSON-LD brings. But all other encodings have a simple direct path to full functionality.

So maybe the abstract data model baseline encoding should be a QR Code =) (just joking)

ewelton commented 4 years ago

@SmithSamuelM I really think that was tremendously clear - thank you.

I guess, for me, I don't strongly agree with B - we pay a price for B and I think it is a 'might help' sort of priority. In terms of the two mental models I would say that I am somewhere in the middle - but I might use slightly different adjectives and I would not say that one is about the DID-subject and the other is about the DID-controller - which is critical to pay attention to.

To me the difference is like a doorknob and lock. Separation of concerns would say "always use a deadbolt and separate doorknob" - but for a lot of doors in my life, I like the simple fused lock/doorknob combination. When I come home from the store I almost always have bags in my hands, and so I value the ability to turn the key and the knob at once - it really makes things much easier and simpler. An example of "doorknob extension" is whether or not you wanted the door to automatically lock, or if locking required explicit action - there really are a lot of options with a simple door. I may have different keys for a simple lock - but the subject is clear, it is the portal itself, and it is under the control of people with keys.

It could be that a good compromise is dealing with this in the context of resolution. In other words, a DID-Doc with fixed, non-extensible semantics that are easily mapped through any format which is peered, inherently, with a VC and stored in the same medium as the DID-Doc using the same identifier.

Note - this is not a service_endpoint model. To me the "special VC" has two additional properties that separate it from other VCs - it is intrinsic to the method and medium in which the DID-Doc itself is stored - it is the locus for the DID's communication with the method itself and it has no sense other than about the DID/DID-Doc pair (including the distinction between the controller(s) and subject). Those properties are hard to get if you use the service_endpoint approach - and, most importantly, the relevance of these properties is sensitive to the public/private role of the DID in the ecosystem.

Also, just thinking in terms of performance engineering - consider the evolution of HTTP/3 - where minimizing the number of discrete connections and round trips is a primary engineering concern - and what we have with HTTP/3 is a structured fusion of concerns. In terms of scanning blockchains and calculating the docs from transaction chains, or establishing the trust relationships through the resolver chain - that overhead is only done once - I effectively get to ask the resolver for either a "core report" or a "full report about the DID" - i either get (DID-Doc,null) or (DID-Doc,Method-Bound-VC) - and I can do likewise with updates and revocations (which impact both elements of the pair in - ideally - an atomic operation). This is another area where there are different pressures for public/private DIDs - largely because there is only one method for pairwise DIDs that I know of, meaning that if you are focused on peer dids you will not feel the tension in the same degree.

My apologies if I've driven this conversation off track - but I think @SmithSamuelM really captured the friction well - much better than I did. I do think this is all a coherent thread though - it is the cost/benefit analysis of B relative to (x) distinction between DID-subject/DID-controller and (y) the different mental models behind the doc.

darrellodonnell commented 4 years ago

Similar to Eric I push back on B. The base assumption that N encodings are needed doesn't make a ton of sense to me, especially if it is simple.

I have two points related to this that both relate to the evolution of a specification/standard.

ADVICE POINT 1. Start Simple but Plan for Evolution.

Specifications get started with the best of intentions. Technology areas with broad goals are attacked, ambitions are stated and rich depth and breadth are discovered.

Then things start to get a little wonky. Edge cases and corner cases start to be built before they are truly understood.

Successful specifications start simple and allow for evolution. They do the bare minimum - often to the detriment of performance early on. Consider the earliest versions of http, smtp, pop, ftp, amqp, etc. They generally started out crudely but allowed the job to be done. The community used them, learned, and extended (hacked) them where they didn't quite work. Those learnings and hacks were examined, and where appropriate, built into subsequent versions of the specifications. Often the specifications evolved quite quickly.

I suggest we do our best to keep things as simple as possible and allow optional extensions. After we see more adoption we will likely learn where the broad community needs something (build that into the next version) or where a specific need is unique to a community (leave that in another specification or in the optional extension).

Supporting multiple encodings doesn't fit the simple category. I think that the JSON-LD portions could easily be supported in an extension area. Over time the parts that make the cut can be built into subsequent versions of the specification.

ADVICE POINT 2. Multiple encodings slow things down - and may make future evolution impossible.

I'll throw in OASIS CAP (again) as a warning. CAP 1.0 was dead simple and has an XML encoding. CAP 1.1 learned from 1.0 and the "1.0 hacks" that were used in the wild to compensate for flaws and things that were missing. At this point, the ITU was brought in as an additional SDO that would add gravitas to the standard. That worked from an adoption perspective but required an additional encoding (ASN.1). This meant we had 2 encodings: XML and ASN.1.

Later, CAP 1.2 was created but hasn't changed much - partly due to the technical limitations of ASN.1 and a lack of political will to evolve. This has limited the movement of CAP and, in my opinion, stunted its possible utility. Efforts to move to a v2.0 have been squashed, partly due to the multi-encoding.

This pattern has repeated under numerous OASIS standards, especially with the downplay of XML/XML Schema in favour of JSON renderings. This has already become a problem for several areas - especially as JSON Schema limitations have been found (and thus it has begun to look like a shadow/mirror of XML Schema).

Compatibility becomes a problem with multiple encodings as well. Imagine "equivalent" JSON and JSON-LD representations of exactly the same data. To answer the question "are they really equivalent?" is actually quite hard. Once you get into the (ahem) semantics of the two representations at some point we end up with data that are explicit in one area, and implicit in another. How do you definitively say that the two are functionally equivalent? The only way that I have seen this work - and it was extremely expensive - was to create testing suites that could be run by third-party labs to accredit/certify results. Not ideal at all.

I believe that a dead simple JSON specification that can be cast into whatever formats are needed in a specific domain will allow the broadest use.

Well, that's my Sunday afternoon rant/thinking about this. I believe in keeping things simple until complexity is truly warranted.

Have an awesome remainder of the weekend folks!

cheers,

Darrell


selfissued commented 4 years ago

I totally agree with @darrellodonnell’s statement “I believe that a dead simple JSON specification that can be cast into whatever formats are needed in a specific domain will allow the broadest use.” He hits the nail on the head. Let’s quickly get consensus to build this and get to work on it!

Next, @dhh1128’s statement on extensibility is exactly right: “DID Docs should be "extensible" in the same way and to about the same extent as HTTP headers are extensible: you can add extra stuff without breaking anything, and if the entity you're communicating with groks that extra stuff, fine. Otherwise, it has no effect.” On the other hand, the arguments I’m hearing from others that begin with “JSON isn’t extensible”, appear to be starting from a false premise, which doesn’t help achieve consensus.

I suggest that we adopt tried and true extensibility principles by including language like this in the specification: “All JSON members not defined by this specification MUST be ignored when not understood.” Allowing new fields to be added without breaking existing implementations enables the JSON to be extended over time. A way to ensure that this extensibility is actually implemented is to add undefined fields to the JSON in the conformance test suite (like "Don1tRejectThis!":true) and test that implementations ignore inputs they don’t understand. Let’s please also do that.
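A hedged sketch of what such a conformance-test input might look like (the extra member is the example from the paragraph above; the DID, key identifier, and key value are illustrative): a conforming implementation would be required to process this document exactly as if the unknown member were absent.

```json
{
  "id": "did:example:123",
  "publicKey": [
    {
      "id": "did:example:123#keys-1",
      "type": "Ed25519VerificationKey2018",
      "publicKeyBase58": "B12NYF8RrR3h41TDCTJojY59usg3mbtbjnFs7Eud1Y6u"
    }
  ],
  "Don1tRejectThis!": true
}
```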

OR13 commented 4 years ago

Thanks for all the thoughts on this thread.

If it's possible to provide a clear spec with just JSON, I think we have (had?) an obligation to do that before we add(ed?) a layer on with JSON-LD.

It's taken me a while to become semi-proficient with JSON-LD; almost all of the pain I experienced was the result of using tools that comprehended JSON-LD with data from people who didn't... (myself included).

https://w3c.github.io/did-core/#contexts https://w3c.github.io/vc-data-model/#contexts

DID documents MUST include the @context property.

This means that with high probability, DID Documents that are constructed by people who do not understand JSON-LD will not work with software that understands JSON-LD...

When someone who does understand JSON-LD tries to sign such a DID Document, they will most likely receive the following error:

The property "ethereumAddress" in the input was not defined in the context.

This is in direct contradiction of the language described above: "All JSON members not defined by this specification MUST be ignored when not understood."

Similarly removing the context property will result in another error:

Error: "@context" property needs to be an array of two or more contexts.

When people don't understand things, they ignore them or remove them... doing either of these things will result in a did document which is not valid JSON-LD, and which directly harms interop with systems that plan to handle JSON-LD...

If you are planning on not supporting JSON-LD, the context definitions, and the human readable documentation, I'd rather not find that out when I try and sign your "looks like JSON-LD" did document / vc... I'd rather you not include an @context...

There are currently no implementations of DID Methods or VCs that do not support JSON-LD (every DID Document and VC has a context per the specs). This means that every DID Method and VC that includes that context but does not contribute to the documentation is kinda making this problem worse, and just has tons of not properly documented extensions... This prevents them from adopting JSON-LD Signatures or other JSON-LD tooling...

Things that look similar but are not are very danger0u5; including an @context and not using it properly is a recipe for security issues. It should not be permitted.
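For reference, a hedged sketch of the kind of fix a JSON-LD-aware author would make for the first error quoted above (the vocabulary URI is hypothetical, and the key entry is illustrative): define the unknown term in the @context rather than ignoring or deleting the context.

```json
{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    {
      "ethereumAddress": "https://example.org/vocab#ethereumAddress"
    }
  ],
  "id": "did:example:eth123",
  "publicKey": [
    {
      "id": "did:example:eth123#owner",
      "type": "Secp256k1VerificationKey2018",
      "ethereumAddress": "0xb9c5714089478a327f09197987f16f9e5d936e8a"
    }
  ]
}
```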

We seem to have 2 options:

  1. Keep JSON-LD and make comprehension and support of it a requirement (because not doing so creates security issues).

  2. Remove JSON-LD as a requirement and make it super clear that certain DID Document do not support JSON-LD.

I worry that (2) is actually not achievable, and instead we will be seeing @id, @type and Ed25519VerificationKey2018 in DID documents / VCs that have no idea what JSON-LD is :(

These ^ are security issues, they muddy the waters, and they weaken the standards (JSON-LD, VC & DID). They are resolved by understanding why you should not use those names... that involves understanding JSON-LD today, and with medium-high probability forever.

If we are serious about (2), someone should sit down with JOSE and build a fully documented DID Method that supports JWS/JWE, show us why it's better, and how it won't just make things more confusing.

msporny commented 4 years ago

If we are serious about (2), someone should sit down with JOSE and build a fully documented DID Method that supports JWS/JWE, show us why it's better, and how it won't just make things more confusing.

+1, specifically, demonstrate how you will achieve at least the following things that we are currently depending on JSON-LD for:

We use every one of those features in almost every DID Method.

At the very least, please suggest spec text changes in a PR so we can do a proper analysis on how this impacts implementations. I'm concerned that there is so much miscommunication in this issue that multiple parties are talking right past each other in this thread.

dhh1128 commented 4 years ago

@msporny : I am dubious about the assertion that almost every DID method uses features like internationalization support, metadata annotation, or instance type expression. Did you mean that almost every DID method uses at least one item from your list, instead? Or do you mean something more subtle, like they use the features invisibly, without realizing it? I am only familiar with the impl details of 4 or 5 DID methods--but I can't think of how any of them use any of the items on your list except linking to other data. And we don't need JSON-LD to put URLs into strings...

Part of what some are claiming here is that your list of used JSON-LD features may be troubling rather than insightful; maybe DID methods shouldn't be doing some of these things, because they are confusing the rich problem domain of VCs with the simpler problem domain of DIDs. For instance, do we really need versioning in DID docs? Really? There is elaborate versioning in SOAP, but some of the most popular RESTful APIs on the planet don't version their JSON payloads at all. Instead, devs ignore fields they don't understand, and use the fields they do. This works in production, all day long, every day--in part because versioning has turned out to be an occasional, minor concern in these interfaces. If the interfaces were going to be upgraded many times, with many subtleties in play at every upgrade, maybe it would be different... So, do we expect DID Docs to need lots of versioning sophistication, or to be more like JSON payloads in popular REST APIs?

It seems to me that VC semantics are rich enough to justify the versioning mechanism; DID Docs, not so much. And the same for many other fancy features. If that's the case, then it's hard to credit the assertion, for each item in the list, that A) we desperately need this feature; or B) we will have to reinvent it if we don't get it from JSON-LD. Remember the story about the mechanical engineers who invented sensors to detect empty boxes coming off the assembly line, and the minimum wage worker who just turned on a fan and blew the empty ones off?