w3c / did-core

W3C Decentralized Identifier Specification v1.0
https://www.w3.org/TR/did-core/

DID Doc Encoding: Abstract Data Model in JSON #128

Closed SmithSamuelM closed 4 years ago

SmithSamuelM commented 4 years ago

DID Doc Encoding: Abstract Data Model in JSON

This is a proposal to simplify DID-Docs by defining a simple abstract data model in JSON and then permitting other encodings such as JSON-LD, CBOR, etc. This would eliminate an explicit dependency on the RDF data model.

Universal Adoptability

For universal interoperability, DIDs and DID-Docs need to follow standard representations. One goal of the DID specification is to achieve universal adoption. Broad adoption is fostered by using familiar representations or encodings for the DID and DID Doc. The DID syntax itself is derived from the widely adopted and highly familiar URI/URL identifier syntax. This takes advantage not only of familiarity but also of the tooling built up around that syntax. Likewise, greater adoption is fostered to the degree that the DID Doc representation or encoding uses a familiar, widely adopted representation with extant tooling.

The only reason not to use a highly familiar representation is if the requirements for representation demand or greatly benefit from a less familiar representation. The appendix at the end of this document provides some detail about the main purposes of a DID Doc. This shows that a complex representation is not required and may not be beneficial.

In addition, having only a single representation or encoding, albeit highly familiar and widely adopted, may be insufficient to achieve universal adoption. It may require multiple representations or encodings.

Multiple encodings require a standard base encoding from which they may be derived; in other words, a least common denominator from which other encodings may be derived.

One way to accomplish this is to use an abstract data model as the standard encoding and then allow for other encodings. This was proposed in the following issue: https://github.com/w3c/did-core/issues/103#issuecomment-553532359

The problem with an abstract data model is that the syntax is expressed in some abstract modeling language, typically a kind of pseudo code. Pseudo code is usually less familiar than real code. This means that even in the most common case the spec is written in a language that is unfamiliar. This runs counter to fostering broader adoption. A solution to this problem is to pick a real-language encoding for the abstract data model, one that provides both an abstracted standard encoding from which other encodings can more easily be derived and the lowest common denominator standard encoding.

Clearly, given the web roots of the DID syntax itself as a derivation of URL syntax, JSON's web roots make it the ideal candidate for an abstract data model language. Of the encodings available, JSON is the closest to a universally adopted encoding. JSON is simple but has sufficient expressive power to model the important data elements needed. It is therefore a sufficient encoding. Annotated JSON could be used to model additional data types such as an ordered mapping (in the event that they are needed). Many of the related standards popular among implementors, such as the JWT standards, are based on JSON. Casual conversations with many others in the community suggest that a supermajority of implementors would support JSON as the standard encoding for the combined abstract data model and default encoding.
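
For illustration only, here is a sketch of the kind of plain-JSON DID Document this proposal has in mind, written as a Python dict whose literal syntax mirrors JSON one-to-one. The property names loosely follow the current spec's vocabulary and are not a normative proposal.

```python
import json

# Hypothetical plain-JSON DID Document; property names are illustrative only.
did_doc = {
    "id": "did:example:123456789abcdefghi",
    "publicKey": [{
        "id": "did:example:123456789abcdefghi#keys-1",
        "type": "Ed25519VerificationKey2018",
        "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV",
    }],
    "authentication": ["did:example:123456789abcdefghi#keys-1"],
    "service": [{
        "id": "did:example:123456789abcdefghi#agent",
        "type": "AgentService",
        "serviceEndpoint": "https://agents.example.com/123456789abcdefghi",
    }],
}

# Serializes directly to clean JSON with no JSON-LD artifacts such as a context.
print(json.dumps(did_doc, indent=2))
```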

Given JSON's rampant familiarity, it should not pose a barrier to implementors of other optional encodings such as JSON-LD or CBOR. Compared to pseudo-code, it should be just as easy, if not easier, to translate JSON to another encoding.

The Elephant in the Room

The result of this proposal would be to make JSON the standard encoding for the DID Doc specification and demote JSON-LD to an optional encoding. The current DID spec uses JSON-LD as the preferred encoding but does not prevent the use of naive JSON as an encoding. However, the DID spec mandates JSON-LD elements that show up as artifacts when using JSON, which a JSON implementer must handle specially. Moreover, the semantics of JSON-LD are much more restrictive than those of JSON. This results in a lot of time being expended unproductively in community meetings discussing the often highly arcane and non-obvious details of JSON-LD syntax and semantics. The community is largely unfamiliar with JSON-LD. It is clear that JSON is sufficient to accomplish the main purposes of the DID Doc. Although JSON-LD may provide some advantages in some cases, its extra complexity runs counter to the goal of fostering more universal adoption. This proposal does not exclude JSON-LD but would encapsulate and isolate discussion about the esoteric syntax and semantics of JSON-LD to that subset of the community that really wants JSON-LD. Each optional encoding, including JSON-LD, would have a companion specification to the DID spec that defines how to implement that encoding. This structure will make it easier to implement other encodings in the future because JSON is much closer to a lowest common denominator data model than JSON-LD.

The relevant questions up for decision are:

The purpose of this proposal is not to debate the general good and bad of JSON-LD and RDF. There is much good in JSON-LD for many applications. But, relevant here is that JSON-LD is not as well aligned as JSON with the goal of fostering universal adoption. More specifically the RDF model employed by JSON-LD complicates the implementation of other encodings that do not share the RDF data model and RDF semantics. JSON does not suffer from this complication. This complication has the deleterious effect of slowing adoption.

Appendix

Purpose of DID-Doc

The current DID specification includes a specification for a DID Document (DID-Doc). The main purpose of the DID-Doc is to provide information needed to use the associated DID in an authoritative way.

A distinguishing feature of a DID (Decentralized Identifier) is that the controller (entity) of the DID obtains and maintains its control authority over that DID using a decentralized root of trust. Typically this is self-derived from the entropy in a random number (expressed as collision resistance) that is then used to create a cryptographic public/private key pair. When the identifier is universally uniquely derived from this entropy then the identifier has the property of self-certifiability. Another somewhat less decentralized root of trust for an identifier is a public ledger or registry with decentralized governance.

In any event, a more-or-less decentralized root of trust only has value if other entities recognize and respect that root of trust. Hence portable interoperable decentralized identifiers must be based on an interoperable standard representation. Hence the DID standard.

In contrast, "administrative" identifiers obtain and maintain their control authority from a centralized administrative entity. This control authority is not derived from the entropy in a random number. This statement may be confusing to some because administrative identifiers often use cryptographic public/private key pairs. To explain, PKI with public/private key pairs and cryptographic digital signatures enables the conveyance of control authority via signed non-repudiable attestations. But the source of that control authority may or may not be decentralized. Thus an administrative entity may convey trust via PKI (public/private keys pairs) but does not derive its control authority therein. Whereas a decentralized entity may derive its control authority over a DID solely from the entropy in the random seed used to generate the private key in a PKI public/private key pair.

A key technology underpinning DIDs is cryptographic signatures, by which the control authority over the associated DID and affiliated resources may be verified by any user of the DID. In contrast, an administrative identifier always has, as a last recourse, appeal to the authority of the administrative entity and to whatever means that authority is established by.

Indeed, given the foregoing explanation, the most important task facing a user of a DID is to cryptographically verify control authority over the DID so that the user may then further cryptographically verify any attestations of the controller (entity) about the DID itself and/or affiliated resources. The verifications must be cryptographic because, with a decentralized root of trust, the original control authority was established cryptographically and the conveyance of that control authority may only be verified cryptographically. With DIDs it's cryptographic verification all the way down.

From this insight we can recognize that a DID-Doc should support a primary purpose and a secondary purpose as follows:

If the user cannot determine the current control authority over the DID then the information in the DID Doc cannot be authoritatively cryptographically verified. Consequently, absent verified control authority, any use of the DID Doc for any purpose whatsoever is at best problematic.

Process Model for Establishing Cryptographic Control Authority

As mentioned above a fully decentralized identifier is self-certifiable. Other partially decentralized identifiers may be created on a ledger or registry with decentralized governance. The first case is the most important from a process model point of view. The second case is less informative.

The root of trust in a self-certifying identifier is the entropy used to create a universally unique random number or seed. Sufficient entropy ensures that the random seed is unpredictable (collision resistant) to a degree that exceeds the computational capability of any potential exploiter for some significant amount of time. Currently 128 bits of entropy is considered sufficient.

That random seed is then converted to a private key for a given cryptographic digital signature scheme. Through a one-way function, that private key is used to produce a public key. The simplest form of self-certifying identifier includes that public key in the identifier itself. Often the identifier syntax enables it to become a self-certifying namespace where the public key is used as a prefix to a family of identifiers. Any attestation signed with the private key may be verified with the public key. Because of its universal collision resistance, no other identifier may be associated with a verifiable attestation. This makes the identifier self-certifying.

Furthermore, instead of the public key itself, the identifier may include a fingerprint of the public key. In order to preserve the cryptographic strength of the root of trust in the random seed, the fingerprint must have collision resistance comparable to the original random seed. Further one-way functions can be applied successively to produce successive derived fingerprints. This is similar to how hierarchically deterministic key chains are generated. To restate, a one-way function may be applied to the public key, producing a derived fingerprint, and then another to that fingerprint, and so on. The collision resistance must be maintained across each application of a one-way function.

Instead of merely deriving a simple fingerprint, one could take the public key and use it as a public seed that when combined with some other data may be transformed with a one-way function (such as a hash) to produce yet another fingerprint. As long as the process of creation of any derived fingerprint may be ascribed universally uniquely to the originating public/private key pair, the resultant derived identifier may be uniquely associated with attestations signed with the private key and verifiable with the public key. This makes the eventually derived identifier also self-certifiable.
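
A minimal sketch of that derivation chain, assuming the Python `cryptography` package for Ed25519 and SHA-256 as the one-way function (both are stand-ins; any scheme with comparable collision resistance would serve):

```python
import hashlib
import os

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

seed = os.urandom(32)  # 256 bits of entropy; 128 bits is currently considered sufficient
private_key = Ed25519PrivateKey.from_private_bytes(seed)     # root of trust: the random seed
public_key = private_key.public_key().public_bytes(
    serialization.Encoding.Raw, serialization.PublicFormat.Raw)

# Each application of a one-way function preserves the seed's collision resistance.
fingerprint = hashlib.sha256(public_key).digest()
derived = hashlib.sha256(fingerprint + b"some other data").hexdigest()

# Hypothetical method name; the prefix can also serve as a self-certifying namespace.
did = f"did:example:{derived}"
```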

Rotation

The problem is that over time any public/private key pair used to sign attestations becomes weakened due to exposure via that usage. In addition, a given digital signature scheme may become weak due to a combination of increased compute power and better exploit algorithms. Thus to preserve cryptographic control of the identifier in the face of exposure, the originating public/private key pair may need to be rotated to a new key pair. In this case the identifier is not changed; only the public/private key pair that is authoritative for the identifier is changed. This provides continuity of the identifier under changes in control of the identifier. This poses a problem for verification because there is no longer any apparent connection between the newly authoritative public/private key pair and the identifier. That connection must be established by a rotation operation that is signed by the previously authoritative private key. The signed attestation that is the signed rotation operation transfers authoritative control from one key pair to another. Each successive rotation operation performs a transfer of control.
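
As a sketch of such an operation (the event fields are hypothetical, not drawn from any spec), a rotation event names the incoming key pair and is signed by the outgoing one:

```python
import json

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def raw(public_key):
    return public_key.public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw)

old_key = Ed25519PrivateKey.generate()   # currently authoritative key pair
new_key = Ed25519PrivateKey.generate()   # key pair being rotated in

rotation_event = {
    "did": "did:example:abc123",                  # the identifier itself is unchanged
    "sequence": 1,                                # position in the event history
    "newKey": raw(new_key.public_key()).hex(),    # key that becomes authoritative
}
payload = json.dumps(rotation_event, sort_keys=True).encode()
signature = old_key.sign(payload)                 # signed by the *previously* authoritative key
```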

State Machine Model of Control Authority

To summarize, control authority over a decentralized identifier is originally established through a self-certification process that uniquely associates an identifier with a public/private key pair. Successive signed rotation operations may then be used to transfer that control authority to a sequence of public/private key pairs. The current control authority at any time may be established by starting at the originating key pair and then applying the successive rotation operations in order. Each operation is verified via its cryptographic signature.

The process and data model for this is a state machine. In a state machine there is a current state, an input event, and a resultant next state determined by state transition rules. Given an initial state and a set of state transition rules, replaying a sequence of events will always result in the same terminal or current state. This is a simple unambiguous process model. The data model is also simple. It must describe the state and the input events. There is no other data needed. The state is unambiguously and completely determined by the initial state, the transition rules and the events. No other context or inference is needed. A simple representation will suffice.
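
A sketch of that replay loop under the same assumptions: each event must verify against the key that was authoritative when the event was signed, and each verified event transfers control to the key it names.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def replay(inception_key_bytes: bytes, events) -> bytes:
    """Replay signed rotation events to find the currently authoritative public key.

    `events` is an ordered list of (payload, signature, new_key_bytes) tuples;
    the tuple layout is hypothetical and purely illustrative.
    """
    authoritative = inception_key_bytes
    for payload, signature, new_key_bytes in events:
        verifier = Ed25519PublicKey.from_public_bytes(authoritative)
        verifier.verify(signature, payload)   # raises InvalidSignature on a bad event
        authoritative = new_key_bytes         # state transition: control is transferred
    return authoritative
```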

Once the current control authority for a DID has been established to be a given key pair (or key pairs), then any other information affiliated with that DID may be cryptographically verified via a signed attestation using the current key pair(s). The important information needed to establish the authoritative status of any additional information such as encryption keys or service endpoints is the current authoritative signing key pair(s) for the identifier and that the version of the information in the DID Doc is sourced from the controlling entity of the current key pair(s). This means the DID Doc may benefit from an internal identifier that corresponds to the latest rotation event that establishes the current key pair(s) or some other identifier that associates the DID Doc with specific signing key pair(s). This process of first establishing authoritative key pair(s) greatly simplifies the cryptographic establishment of all the other data.
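
For instance (all field names hypothetical), the non-key material in a DID Doc could carry a digest of the latest rotation event as its version identifier and be signed by the currently authoritative key pair:

```python
import hashlib
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

current_key = Ed25519PrivateKey.generate()                  # stands in for the current authoritative key
latest_rotation_event = b"...serialized rotation event..."  # placeholder bytes

did_doc = {
    "id": "did:example:abc123",
    # Hypothetical field binding this version of the document to the rotation
    # event that established the current authoritative key pair.
    "versionId": hashlib.sha256(latest_rotation_event).hexdigest(),
    "service": [{"id": "did:example:abc123#hub", "type": "Hub",
                 "serviceEndpoint": "https://hub.example.com"}],
}
signature = current_key.sign(json.dumps(did_doc, sort_keys=True).encode())
```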

There are various mechanisms that may be employed to maintain the state and associated event sequence. These could be as simple as a set of servers with immutable logs for the events/states that also run the code for the state transition logic. A more complex setup might rely on a distributed consensus ledger to maintain the state.

The DID Doc in and of itself, however, is insufficient to fully establish the current authoritative key pair(s). Other infrastructure is required. Merely including a set of rotation events in a DID Doc only establishes control authority up to the latest included rotation event. But other rotation events may have happened since that version of the DID Doc was created. Consequently a DID Doc's main role in this respect is to help a user discover the mechanisms used to establish current control authority. This must be done with some care because in a sense the DID Doc is bootstrapping discovery of the authority by which one may trust the discovery provided in the DID Doc. Nonetheless, in order to be authoritative, the other information in the DID Doc that is not part of discovering authoritative control does not need an event history but merely a version identifier linking it to the authoritative key pair(s) and an attached authoritative signature from the current authoritative key pair(s).

In other words the DID Doc is used to bootstrap discovery of the current authoritative controlling keys and then to provide authoritative versioned discovery of affiliated information.

RDF Complications

The RDF model uses triples to canonicalize a directed graph. This graph may be used to make inferences about data. This model attaches a context to a given DID Doc that must be verified as authoritative. This expansion complicates the process of producing an authoritative versioned discovery document or an evented state machine. Clearly a clever implementation of a cyclical directed graph could be used to implement versioned discovery documents or evented state machines. Many implementations of RDF, however, use directed acyclic graphs, making the implementation of evented state machines at best problematic and versioned discovery documents more cumbersome. This forces a particular, potentially unnecessarily complex, methodology for implementing versioned discovery documents or evented state machines rather than whatever might be easiest or most convenient for the implementer.

mikelodder7 commented 4 years ago

To @dhh1128's point, if you look at all the DIDs at https://w3c-ccg.github.io/did-method-registry/, none of them use all of the bullets you state, @msporny, except the Veres One DID, which is yours. All of the participants in this issue have presented well-researched solutions and answers, while I feel yours are opinion based. If you can cite a similar historical issue from the W3C, let us know, because we'd all be interested to read it in the archives.

SmithSamuelM commented 4 years ago

Based on @darrellodonnell's and @selfissued's comments, I am revising B) and C).

Old: B) Universal adoption requires multiple encodings. That means encodings other than JSON-LD or naive JSON with JSON-LD syntactic artifacts. It means clean JSON, CBOR, PDF to name a few.

New: B) Universal adoption requires simplicity in the data model and the corresponding baseline encoding. This maximizes the adoptability of the baseline encoding and also maximizes the ease of translating to other encodings, and hence the adoptability potential of other encodings. This combination best fosters more universal adoption.

Given that the community agrees to A) and B) then the next stated assumption may be summarized as:

C) The best approach to fostering universal adoption is to represent the DID-Doc specification as a simple abstract data model that is directly expressible in a baseline encoding that may be conveniently translated to other encodings.

That means an encoding other than JSON-LD or naive JSON with JSON-LD syntactic artifacts as the baseline encoding. As proposed, this means clean JSON. This makes CBOR trivial and makes other encodings like PDF easier.

dlongley commented 4 years ago

@SmithSamuelM,

Premise "B" simply assumes away the point of contention in this thread, which is a logical fallacy.

mikelodder7 commented 4 years ago

How so @dlongley? Please elaborate.

SmithSamuelM commented 4 years ago

Information Model Agreement

Broken Mental Model

@msporny @dlongley @ewelton et al.

In my attempt to re-focus and re-frame the debate I tried to draw attention to the purposes of a DID-Doc and classify them as either essential or useful. Instead of addressing this foundational issue the responses have been largely to jump directly to lists of features and then claim these features are essential without actually addressing the core purposes of a DID-Doc.

The reason we are at this impasse is that we have a broken mental model of what a DID-Doc must accomplish. There is obvious cognitive dissonance in how the community approaches this problem. This is prima facie evidence of a broken mental model. To be more rigorous, what I have labeled in previous comments in this thread as the "mental model" is better expressed as the "Information Model" as per RFC 3444 https://tools.ietf.org/html/rfc3444, which makes a distinction between an Information Model (IM) and a Data Model (DM).

To quote: "The main purpose of an IM is to model managed objects at a conceptual level, independent of any specific implementations or protocols used to transport the data. The degree of specificity (or detail) of the abstractions defined in the IM depends on the modeling needs of its designers. In order to make the overall design as clear as possible, an IM should hide all protocol and implementation details. Another important characteristic of an IM is that it defines relationships between managed objects."

Until we agree on the information model, aka the purposes and the functional dependencies of those purposes, we will remain trapped in this conflict. Much of the confusion and unproductive discussion in the meetings arises from the inherent greater complexity of an open world mental model. It's less about the syntactical artifacts of JSON-LD and more about the informational artifacts.

Starting with a concrete data model encoding such as JSON-LD as the precursor to the DID-Doc spec is an extreme example of putting the cart before the horse.

I am reminded of the time my wife and I went to the car dealer to buy a new car. We found a model/package that had all the functional features we wanted. We then asked what colors of that model/package the dealer had on the lot. The salesman responded that we could get that model/package in any color we wanted as long as that color was blue. They only had a blue one.

Likewise it seems to me that the responses to my comments agree that I may pick any data model I want as long as it's JSON-LD.

Is universal adoption a goal of this community? I think it is, or at least should be. Fair discussion of this goal starts with considering the truly necessary, essential features, not the optional, merely useful features.

It seems that there is not broad agreement within this working group that universal adoption is of vital importance. My motivation is to create a DID spec that provides truly portable identifiers that foster self-sovereign identity and trust over IP.

In other words, what I want most is a DID spec that acts as an adoption vector to realize greater self-sovereign identity and better trust over the internet. My concern is that some within this community want most to leverage the momentum behind self-sovereign identity and trust over IP as an adoption vector for JSON-LD.

I am not against graph data models. I strongly support graph data models when used appropriately but not universally. The original white paper I wrote on decentralized identity in early 2015 calls out that identity is a recursive composition that is best expressed as an identity graph. It is at the compositional information layer where privacy and selective disclosure play an important role. https://github.com/SmithSamuelM/Papers/blob/master/whitepapers/open-reputation-low-level-whitepaper.pdf

Proposed Information Model

With respect to RFC 3444, agreement on an abstract information model is a precursor to agreement on an abstract data model. As stated above, information models represent relationships between information especially dependencies. These dependencies include functional ordering and layering. If you get the dependencies wrong or mix up dependencies and layers you have a broken information model. It is clear to me based on this discussion that the JSON-LD proponents are largely using a broken information model. Or at the very least are using an information model that is so far incomprehensible to me.

Once we have the correct information model, then and only then should we worry about the data model. The problem I am trying to fix is what I see as a broken information model.

Historically the DID spec WG jumped to a concrete JSON-LD Data Model without first getting agreement on the information model, actually, without even discussing the Information model per se. This is why I spent time deriving the purposes of a DID Doc in the appendix of the original post in this thread. It was my way of capturing the information model so that I had a good foundation for discussing a data model. But the JSON-LD proponent's responses to my original post have largely ignored the information model outlined in the appendix above.

As a result I am going to further focus my discussion on the information model.

The correct information model is a layered relationship with a hard dependency between the layers!

Layer 1: Bootstrap from a root of trust to the authoritative signing key or keys. This is the only functionality necessary to the bootstrap layer. In no way does this layer benefit from any of the unique features of JSON-LD and the open world information model of RDF. Indeed an open-world model violates best practices for informational security. This is a fatal flaw of using RDF in the bootstrap.

Layer 2: Verifiable Attestations using the authoritative Signing Keys from Layer 1. A verifiable attestation is any information signed with the signing keys.

Given agreement on this information model we can then talk about data models. At layer 2, a subset of the class of verifiable attestations includes the class of verifiable messages. A subset of the class of verifiable messages includes verifiable documents. A subset of the class of verifiable documents includes JSON-LD documents that use an open world model. A subset of JSON-LD documents includes the class of verifiable credentials.

That means we shouldn't even discuss JSON-LD until we are way down the information representation hierarchy of data models that fit our information model.

Indeed the name DID-Doc implies an information model that is inaccurate. A document in this context is dependent on a bootstrap function and therefore before we can talk about a DID Doc spec we should talk about a bootstrap to a cryptographically verifiable source of authoritative control spec.

Then and only then should we decide if we want to represent both layers with a single encoding type and if so if we want to include both layers in a single document or to keep them separate. There are lots of ways to bundle two layers in a single message or a single document without breaking, mixing or co-mingling the layers.
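
As one purely illustrative shape for such a bundle (every field name here is hypothetical), the two layers can sit side by side in one document, with layer 1 verified first from the key-event history and layer 2 verified only against the keys that layer 1 establishes:

```python
# Hypothetical single-document envelope; the layers never reference each other
# except that layer 2's proof is checked against the keys layer 1 establishes.
bundled_doc = {
    "layer1": {   # bootstrap: establishes the authoritative signing key(s)
        "id": "did:example:abc123",
        "keyEvents": ["...inception event...", "...rotation event..."],
    },
    "layer2": {   # attestations: optional, possibly JSON-LD, verified via layer-1 keys
        "service": [{"id": "did:example:abc123#hub", "type": "Hub",
                     "serviceEndpoint": "https://hub.example.com"}],
        "proof": {"type": "Ed25519Signature2018", "signatureValue": "..."},
    },
}
```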

Trusted Computing Implicit Identifier Information Model

A closely related and very informative spec that shares the information model proposed above is the following from the Trusted Computing Group. https://trustedcomputinggroup.org

https://trustedcomputinggroup.org/wp-content/uploads/TCG-DICE-Arch-Implicit-Identity-Based-Device-Attestation-v1-rev93.pdf

The trusted computing IM may be simply summarized as follows:

  1. Bootstrap to derived implicit identifiers.
  2. Make verifiable attestations using the bootstrapped implicit identifiers.

Because the information model used in the implicit identifier spec is essentially identical to the two layered IM model proposed above, the proposed information model for DIDs stands on solid best practices security ground. My argument is that this is the most appropriate information model given we want to fix trust over IP.

They use the term implicit identity to refer to identifiers that are self-certifying. (Apparently they are not familiar with the self-certifying work in this space from the 90s, so they have invented a new term.) They use the name Device ID (same acronym as DID but not a W3C DID). The root of trust in their Device ID is the entropy in a random number generator that operates on first power-up of a device, so it is private to the device. From this root identifier other identifiers called aliases may be derived. Once these self-certifying identifiers have been created, the next layer in their IM stack uses those identifiers to make verifiable attestations that will be sent over the network.

The intent of their specification is to lay the groundwork for a future where all network-capable computing devices will have this bootstrap functionality. This will enable trusted computing capability without the need for a heavyweight TPM. Just bootstrap to derived identifier(s) based on a root implicit identifier and then make verifiable attestations thereby.

We need to use the appropriate information model and then use the most appropriate data model for that information model and then pick an appropriate encoding to express that data model.

Given a layered information model, a different data model and encoding may be used at each layer.

Trust over IP is only possible if we use good cryptographic practices in our information model. Thus a layered model. The semantic web is an optional data model for a subset of what one may choose to do in layer 2 but has no place in layer 1. Indeed it is a huge security problem to put anything like an open world model in layer 1.

JSON and CBOR are sufficient for layer 1. Security considerations make JSON-LD a bad candidate for layer 1. Putting the bootstrap in layer 1 simplifies layer 2. This layering makes it easier to support multiple layer 2 encodings.

@dlongley, as prima facie evidence of B): this post establishes B), i.e., layering clarifies dependencies. It removes constraints on layer 2. Some applications may only need to do minimal work at layer 2. All must do layer 1, but layer 1 does not benefit at all from JSON-LD. One of the editors of IETF Remote Attestation Procedures WG told me that using JSON-LD with its open world data model for DIDs makes DIDs DOA for trusted computing. IMHO, DIDs with JSON-LD have little chance of formal adoption by any of the trusted computing standards.

dlongley commented 4 years ago

My concern is the some within this community want most to leverage the momentum behind self-sovereign identity and trust over IP as an adoption vector for JSON-LD.

Nope. You can safely discard this concern. Let's move onto figuring out how to turn this into a productive discussion.

SmithSamuelM commented 4 years ago

@dlongley

Assumption B) is my assumption. You can argue against my assumption. You don't have to accept it. It is clear that many in this community share assumption B). My post above explains from an information model point of view that indeed a simpler encoding is sufficient for layer 1.

Are you making the assertion that an open world model is simpler than a closed one? I assume you agree with the contrary statement in the existing DID spec. So it's not clear what your basis is for arguing against assumption B). It's clear that from your viewpoint JSON-LD is essential to what you want to do. But you have not made the case that it is essential for everyone else.

Layer 1 is essential. In contrast almost everything in layer 2 is optional. It may be practically useful but not essential.

peacekeeper commented 4 years ago

(@SmithSamuelM) Universal adoption requires simplicity in the data model and the corresponding baseline encoding.

While adoption is clearly a goal, I would like to challenge a bit the assumption that simplicity and universal adoption should be absolute objectives above all others. The goal should be to design a new type of identifier that can serve as a basis for a universal, next-generation decentralized identity infrastructure based on fundamental URI and web architecture. If there is a trade-off between universality and extensibility at the cost of added complexity on one hand, and pure simplicity on the other hand, then I don't think the decision is entirely clear on what is more important. While I agree that very constrained JSON documents are "simpler" and probably also more secure, I am a bit worried that some opinions on this thread may be motivated primarily by a desire to make a small set of protocols or products easy to implement, without considering the "bigger picture" of decentralized identifiers as a future building block for potentially many applications.

(@SmithSamuelM) One side has the mental model that the DID-Doc provides intensive semantic knowledge about the DID subject in an extensive world model.

(@SmithSamuelM) The mental model for the other side is that a DID-Doc provides a cryptographically verifiable bootstrap that enables validation of authoritative statements ...

(@SmithSamuelM) The correct information model is a layered relationship with a hard dependency between the layers! Layer 1: Bootstrap from a root of trust to the authoritative signing key or keys. This is the only functionality necessary to the bootstrap layer.

(@jandrieu) DIDs and DID Documents only present ways for securely interacting with Subjects. They MUST say nothing about the Subject except these are alleged means of secure interaction.

I know this may be a minority opinion, but I would also challenge the assumption that the second mental model should be the only one supported by DID documents. Yes, putting too much information into a DID document is a threat to privacy and people's lives. But a dependency on an external service endpoint may also be a threat. In certain scenarios, I believe the open world model is the right model not just for VCs but also for the DID document. Especially for DIDs that identify organizations or things. And I know that @jandrieu and others will strongly disagree now, but in some cases perhaps even for personal DIDs I may want to be able to have open world statements inside the DID document rather than behind a service endpoint.

dlongley commented 4 years ago

@SmithSamuelM,

Any argument that suggests JSON-LD is "too open" and that JSON would therefore be a better choice makes no sense. JSON-LD is JSON with additional constraints. It is a subset of JSON.

These constraints are introduced to encourage extension interoperability by requiring people to extend in a particular way. Loosening these constraints such that extensions can happen any way that you please would not help resolve any concern about information in a DID Document that is not understood by a consumer. It would only harm interoperability -- and, thus, harm "universal adoption".

Anyone not interested in writing extensions need only consider the DID spec's description of the data model and its expression as JSON, nothing more. This would not change if the group were to abandon JSON-LD in favor of JSON; it would only harm extension interoperability, interoperability with the Linked Data ecosystem, and our ability to reuse existing definitions and work. I've yet to be convinced that there's some significant cost being paid for these advantages that makes it not worthwhile.

There are clearly some fundamental misunderstandings that still need to be resolved here. I do want to say that I suspect there's more material being poured into this thread than can be consumed at a reasonable rate by very busy working group members. I think we need to see concrete PR(s) or we'll just keep spinning our wheels.

SmithSamuelM commented 4 years ago

@peacekeeper

A layered model is not the same as a service endpoint model. A layered model allows a service endpoint model but does not require it. This is why I described one option for those that want an all-in-one approach to separate the information via encapsulation. Both layers may be provided in a single document as long as they are separable. This means if you want to use JSON-LD for layer 2 you can choose to also use JSON-LD for layer 1 and take the hit on security. But others may use JSON for layers 1 and 2, or use JSON for layer 1 and JSON-LD for layer 2. And they may also choose to use a service endpoint model.

SmithSamuelM commented 4 years ago

@dlongley

An open world information model is not a subset of a closed world information model. This is the disconnect. That the concrete data model uses syntax that could be classified as an extension of a simpler syntax does not make the expanded information model simpler but makes it more complex. Semantics and syntax are two different types of complexity. This is why the information model agreement is so vital.
@dlongley @peacekeeper Layering enables the best of both worlds. It allows extensibility at layer 2 without encumbering layer 1. Please explain why the layered information model is wrong. Please explain why the bootstrap from the root of trust is not essential to everything else.

This hard functional dependency is exactly why layering is the appropriate model. We can now encapsulate and separate the functionality of the two layers. This removes the most difficult security problems from layer 2.

dlongley commented 4 years ago

@SmithSamuelM,

An open world information model is not a subset of a closed world information model. This is the disconnect.

I didn't argue this -- so I don't think it's the disconnect. I think we'd all like to get to more common understanding. This is another reason for us to try and focus on a concrete PR. We may find there's actual agreement on whatever you put forward -- or, we may find out where the disconnects really are!

That the concrete data model uses syntax that could be classified as an extension of a simpler syntax does not make the expanded information model simpler but makes it more complex.

I agree with this, I'm not sure why you think otherwise. Perhaps you think I agree with the maxim "the simplest thing possible is always the best approach". If so, I don't -- what I think is that complexity trade offs should be worth it. Adding constraints (as JSON-LD does) increases complexity for producers of extensions. However, these constraints are added because they have a net decrease in complexity for consumers of extensions and create an increase in interop and reusability. Of course, this is generally why constraints are added. Saying we won't have any constraints just means that there won't be any interop -- what we've created is "too simple" for it.

The priority of constituencies allows for spec writers and extension authors to take on more complexity such that consumers may take on less. Consumers know that extensions must all abide by those constraints -- and devs can often write applications or tools just once that are able to consume any information that uses the same approach. Often the complexities we must deal with in writing specs and constructing data models need not be understood at all by other parties -- yet they reap benefits from this approach. An alternative approach would be to force anyone who wants to write an interoperable extension to form a WG and go through the standardization process.

Anyway, it's fine to argue that you think we could simplify things to only support the use cases you're interested in. However, a WG is about compromise -- where we attempt to support everyone's use cases to the best of our ability. I think it would be much easier to decide whether certain use cases will be harmed if there's a concrete proposal (a PR) on the table rather than talking about all this in the abstract.

selfissued commented 4 years ago

I'm really impressed by @SmithSamuelM's Information Model Agreement post. It contains a lot of actionable truth that should help us focus the discussion and reach consensus on a simple, secure, privacy-preserving information model to inform the DID specification.

dlongley commented 4 years ago

@SmithSamuelM,

Layer 1: Bootstrap from a root of trust to the authoritative signing key or keys. This is the only functionality necessary to the bootstrap layer.

I think this is insufficient. I believe it is essential to also be able to discover other information about a DID subject at the root level. In fact, some DID subjects may not have any keys or may not have keys that can be used to make assertions, so you go straight from the root of trust to these other pieces of information.

This use case is missing in your analysis and I believe explains at least one of the disconnects in this issue.

ewelton commented 4 years ago

I do think this conversation has been insightful and valuable, and I definitely think @SmithSamuelM brought a lot of very clear, valuable, and excellently expressed insight. I think we all brought positive insights to the table despite the echoes of tooth-grinding. At the end of the day we are not doing this to prove points to one another, or to vie for rightness, it is about advancing the core of the internet forward in a very meaningful way.

I would like to second the efforts to move the discussion into a PR that reflects the restricted use-cases and limited semantics of the proposal. Issue #65 for example, touches on many of the same abstract discussion points - it would be very helpful to be able to evaluate issues like the discussion in #65 against a concrete vision of a semantically restricted spec - we could then clearly evaluate the impact of the proposed spec in the issues of DID-resolution, DID-metadata, the role of service endpoints, and cases where the DID is not referencing an Aries attached layer-2 agent.

I would like to see the reduced, restricted, and simplified model of DIDs in a formal PR - that PR should also clear up the ambiguity introduced by the terminology in the current reference draft.

How can we best move towards that PR?

TallTed commented 4 years ago

One of the editors of IETF Remote Attestation Procedures WG told me that using JSON-LD with its open world data model for DIDs makes DIDs DOA for trusted computing.

I am shocked and rather appalled by this statement, reportedly coming from someone who should be an expert in the areas of which they speak, but who demonstrates with this statement that they understand neither JSON-LD nor the Open World data model. (I won't dig into the logical fallacy of [unverifiably] Appealing to Authority, but that's also worth noting.)

JSON-LD is not a data model, it is a data serialization format, which is a subset of JSON. If JSON is viable for trusted computing, JSON-LD is also viable. If JSON-LD is not viable, neither is JSON. Note: I believe both are viable for such use, depending primarily on the data serialized therein.

The Open World Assumption in this context basically says that "anything that isn't explicitly stated, is unknown" (and that "anyone can say anything about anything", but says nothing about the veracity of those assertions) -- which is a much stronger base for security than the Closed World Assumption, which is basically that "anything that isn't explicitly stated, is not so".

I'm guessing that the speaker described above was referring to the common "anything not explicitly permitted is forbidden" security mantra (which is commonly placed in opposition to "anything not explicitly forbidden is permitted"), which has nothing to do with the Open World Assumption, nor with JSON-LD.

msporny commented 4 years ago

How can we best move towards that PR?

This is easy, someone does the work and puts forward a pull request on the spec in this repo. Any member of the WG (including any employee of any organization that is a member and invited experts) can raise a PR against the spec. If you are not a member of the group, you can always talk to the Chairs/Staff to see if they'll grant you Invited Expert status if you do the work to put together this PR and it looks like it's going somewhere.

talltree commented 4 years ago

I'm sure there are longer W3C issue threads than this one, but it's definitely the longest one I've ever been involved in. I was at the Hyperledger Aries Connect-a-thon in Provo all this week and each night when I tried to catch up with it I could never read it to the end ;-)

However today on my flight back to Seattle I was finally able to finish. So let me share two thoughts.

First, I believe this discussion, as long as it has been, has been valuable to the community, as it has drawn in a wider set of views about the purpose and information architecture of DIDs and DID documents than were present at the Credentials Community Group stage of the spec.

Second, RE next steps, a number of posts have asked for a "concrete PR" so we could stop arguing in the abstract. While of course someone could simply draft a PR redefining the data model in JSON and removing all dependencies and references to the JSON-LD spec, it’s not at all clear to me that’s the right next step. Rather I expect it might simply result in triggering the same discussions all over again and polarize us further.

Instead, I believe this discussion shows there are deeper issues we need to come to agreement on first. But rather than argue those in the abstract, what I would like to suggest is that we break them down into a series of relatively concrete decisions we can discuss and make together. And that will result in steady progress towards consensus on the way forward.

Once we have done that, what should be in an eventual PR (or set of PRs) will likely be far more obvious and far less controversial.

My plane having landed, I am going to grab a Lyft and then start a new issue on the first of those concrete decisions I think we can make together.

OR13 commented 4 years ago

For the sake of argument I created a DID Method based on did:key, but using JOSE, that has no @context ... so it's not valid according to the did spec.

https://github.com/transmute-industries/did-jose

As I note on this issue which is related: https://github.com/w3c-ccg/vc-json-schemas/issues/7

AFAIK there won't be interop without the non-JSON-LD users accepting a context which they ignore. JSON-LD is stricter than JSON, so if you want interop with it, you need at least the @context.

Also noted on that issue is that @context is mandatory in did core and vc spec. This means that pretty much anything that has those 2 things as dependencies should take the same approach IMO.

I think this approach of requiring the @context is the crux of the issue....

without the context, it's just normal JSON and all the features of JSON-LD are lost... how will we maintain interop? what is the extension model? so many different ways we could solve these issues, and each feature that we lose will need to be addressed in some fashion...

with @context it's JSON-LD, and likely invalid JSON-LD if the method implementers are not paying attention to document properties and definitions.

The more I think about trying to solve this by somehow getting rid of JSON-LD and replacing it with more relaxed normal JSON, the more I feel like it's maybe not a good idea... because while it's easy to delete the @context, it's hard to recover all the features we would need to agree on as a community to maintain interop.

sure, not everyone uses all these features, but we get them for the price of an @context, and a requirement to understand it IF you want to interop with JSON-LD.... how much will deleting it really cost?

talltree commented 4 years ago

@OR13 You are going right to the heart of what I believe is at the very center of this debate (and the reason that this thread is so long): there are two different worldviews in conflict here.

One worldview, which I'll call the "JSON-LD worldview" or more generally the "open world semantic graph worldview" believes in the power of semantic graphs and wants DID docs to have all (or most) of the features that @msporny describes here.

The other worldview, which I'll call the "plain JSON worldview" or more generally the "hierarchical deterministic worldview" feels just the opposite. They do not want to deal with semantic graph models and do not want most of those features because in their view those features represent challenges to: a) simplicity, b) security, and c) privacy, all of which make life more complex for developers and threaten to hinder adoption.

In my experience, there are no simple solutions to worldview problems. Almost by definition, both groups are starting not just from different assumptions, but more importantly, from different value models, i.e., views of what is important and what is not.

Again, that's why this discussion has gone so deep and so wide. Each group is trying to convince the other about its entire worldview. That's a hard, hard problem.

The reason I started issue #140 was to start to explore one potential solution which I'll describe briefly here since it's relevant to this issue as well and also to #103 (which started this whole discussion).

The essence of the idea is to stop trying to get the two groups to agree on a worldview before we can move forward. Instead, turn things on their head and do this:

  1. Let the JSON-LD folks proceed as quickly as they can to develop a complete JSON-LD-based data model with all the features they want to support.
  2. At the same time, in parallel, let the plain JSON folks work as quickly as they can to develop the simple JSON-based data model with the minimal features they want to support.

Then, when both groups are done (or far enough along to be ready), get the two groups together and compare/contrast/discuss where they have landed and why.

My guess is that the plain JSON folks will have developed a hierarchical deterministic model that is an easy-to-describe subset of the JSON-LD model.

If so, aligning the two will actually be pretty easy. We'd end out with two encodings—one in plain JSON that's fairly restrictive (but meets the plain JSON folks requirements), and one in JSON-LD that's much richer (and meets all the JSON-LD folks requirements). And both can work!

I'm very curious what you (and others) think of this possible path for moving us forward.

TallTed commented 4 years ago

@OR13 @SmithSamuelM @dhh1128 @ewelton (and others) - Please always wrap @context (and other @-things) in backticks, i.e. --

`@context`

-- except where you are intentionally tagging a github user.

(Optimally, go back and edit your previously posted comments to do the same.)

There is a github user with the handle `context`, and every time an unwrapped @context occurs, they get a notification -- which they don't want from us, as they are not working with us.

SmithSamuelM commented 4 years ago

@talltree. Well stated +1.

When the JSON folks say they want the simplicity of not having an open world extensible model, the JSON-LD folks respond that simplicity comes from that very same extensibility. These are two different types of simplicity and they are based on two different design aesthetics. The JSON folks have a very clear view of what they want to do and how to do it, and they have rationally concluded that they don't need JSON-LD. Likewise the JSON-LD folks have a very clear view of what they want to do and how to do it, and they have rationally concluded that they need JSON-LD. It's like someone telling someone else they are irrational for preferring pizza over ice cream. What is irrational is to believe that the other side is irrational and that one can persuade them to change their aesthetic. It takes more than that; it takes finding a common aesthetic that overrides the conflicting world model aesthetics. So absent that, the practical question is how best to support both aesthetics. And an abstract data model is likely the only approach that could work for both.

iherman commented 4 years ago

I am a bit worried by the approach that you propose https://github.com/w3c/did-core/issues/128#issuecomment-564434490, @talltree; you may underestimate the difficulty of "merging" the two approaches at the end of such a process.

My approach would be a little bit different, namely to do this jointly with some principles in mind.

  1. The goal should be (and I believe already is) that a DID processor should be able to process (whatever that means) a DID Doc without any JSON-LD knowledge
  2. This also means that the problem @OR13 mentioned should not occur: it should be o.k. for a processor to process a (simple? basic?) DID Doc without the presence of a @context. Put it in spec speak: the presence of a @context in a DID Doc should be a SHOULD (or even a MAY?) but not a MUST.
  3. For each key ("name" in JSON speak) that we add to the DID Doc definition we should make it clear under what circumstances the usage of those names would really benefit from the Linked Data aspect and, therefore, would require the author to add @context and consider JSON-LD features. I.e., what it means to put those in a Linked Data context. Authors/methods/etc. can then decide and/or require the use of JSON-LD or not. Generic statements on Linked Data would not be helpful enough and would just make some people shy away.
  4. Yes, we may hit some hurdles along the way when the "worldview" clash, but I do not think it comes that often. I was not part of the CCG discussions but, looking at the document right now, the only place where I can see an issue is the one referred to in #65 (and it seems that a proper compromise has been found there, essentially using the same pattern as for VC).

    In general, using the "non-LD worldview" might keep us in line to get the simplest possible set of concepts even if it requires some compromise on the LD side; using the "LD worldview" might force us to do a cleaner modeling of our data.

Is this a viable design method moving forward?

iherman commented 4 years ago

@SmithSamuelM has posted his comment almost at the same time :-) He said:

And an abstract data model is likely the only approach that could work for both.

and that is perfectly fine and true. But the abstract data model has to be embodied in a syntax, and I am worried that creating too many syntaxes in parallel might backfire on us.

SmithSamuelM commented 4 years ago

@iherman

I think that making @context optional as in MAY versus MUST (as it is now) would go a long way to resolving the conflict. That would also mean, I believe, that any use of @references in a document is a MAY, not a MUST.

The proposal to use JSON as the default encoding would minimize syntaxes and would be compatible with having JSON-LD syntax be a MAY versus a MUST. But that proposal did not seem to go over well. Hence the alternative of an abstract syntax. But I agree that your proposal is a reasonable way to enable the two approaches to the world model to co-exist.

TallTed commented 4 years ago

@SmithSamuelM -

Please note that github user @Drummond (whose human-world name is Valerie Drummond) is not the same as github user @talltree (whose human-world name is Drummond Reed).

Also, github users @context and @references do not need to be notified of your comments here. Please edit your latest to wrap those strings in backticks!

We really need to be more careful in how we refer to entities!

talltree commented 4 years ago

@iherman I see your point and agree what you suggest could be a constructive way for the two groups (representing the two worldviews) to work together on the semantics. I'd like to explore that in more detail as it may be the fastest way forward.

RE the @context statement, in a discussion with @SmithSamuelM and @tplooker at the Hyperledger Aries Connectathon last week, Sam made a point I'd never heard before, and which resonated strongly with me. What he said was that DID document authors need a way to explicitly indicate that no JSON-LD processing should be applied to a DID document. There can be multiple different reasons for this, but the two we discussed were:

  1. The author may want the DID document to be consumed in a security context that does not accept open world document formats (Sam says this is true of the Trusted Computing Group TEE environments).
  2. The author may want to signal to resolvers or other DID document consumers that no vocabulary is used beyond the "plain JSON" vocabulary defined in the DID spec (for speed in high volume applications, ease of processing, etc.)

When I started looking at the @context statement through that lens, I saw it in a whole new light. Rather than it being a MUST or a SHOULD or a MAY, the rules could be:

  1. If a DID document author wants the option of having JSON-LD processing applied to the DID document, the DID document MUST include the @context statement.
  2. If a DID document author does not want JSON-LD processing applied to the DID document, the DID document MUST NOT include the @context statement.

That would be a very clean way for us to have our cake and eat it too, i.e., for all DID documents to share the "simple JSON" syntax and then for DID document authors who want to use the features of JSON-LD to be able to do that with a clear indication of that processing model.
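
Under those rules, a consumer could decide which processing model applies by a trivially testable condition: the presence or absence of `@context`. A minimal, purely illustrative sketch:

```python
import json

def processing_model(did_document_text: str) -> str:
    """Return the processing model implied by the proposed rules (illustrative only)."""
    doc = json.loads(did_document_text)
    if "@context" in doc:
        return "json-ld"      # author opted in to JSON-LD processing
    return "plain-json"       # author signals that no JSON-LD processing should be applied
```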

iherman commented 4 years ago

@talltree,

When I started looking at the @context statement through that lens, I saw it in a whole new light. Rather that it being a MUST or a SHOULD or a MAY, the rules could be:

  • If a DID document author wants the option of having JSON-LD processing applied to the DID document, the DID document MUST include the @context statement.
  • If a DID document author does not want JSON-LD processing applied to the DID document, the DID document MUST NOT include the @context statement.

Looking at it from the point of view of testing (which will become a core issue in the rec process later), what can be tested is the presence of, or the absence of, the @context in the JSON file. A statement like "does not want" is not really testable. That may be an issue with your formulation. In this respect, something which simply states that the presence of the @context makes it possible to use the DID Doc in a Linked Data setting seems to be enough for me; not putting the @context into the file is equivalent to what @SmithSamuelM said, i.e., that the author does not intend this DID Doc to be treated as JSON-LD.

But I must admit I do not have a strong feeling about this, I see it as a stylistic difference. I let the document editor work this out :-)


We should all be careful about the usage of @context, we are sending a series of pings to a guy out there...

msporny commented 4 years ago

@talltree wrote:

explicitly indicate that no JSON-LD processing should be applied to a DID document.

Finally, something we can work with! Thank you @talltree!

The author may want the DID document to be consumed in a security context that does not accept open world document formats (Sam says this is true of the Trusted Computing Group TEE environments).

Good, this is a concrete requirement that enables me to write a concrete PR against the requirement.

The author may want to signal to resolvers or other DID document consumers that no vocabulary is used beyond the "plain JSON" vocabulary defined in the DID spec (for speed in high volume applications, ease of processing, etc.)

Yes! Another good requirement. For the rest of you in this thread, these are the sorts of things that help editors write text that may achieve consensus.

Ok, so I've now spent close to 5 hours reading and re-reading this thread and just spent two hours trying to construct text that I think may have a chance at achieving consensus. Here's a concrete PR that attempts to synthesize this issue into a concrete spec change:

https://github.com/w3c/did-core/pull/142

Please jump to the PR and let's see if we can hammer on the language and get something that achieves consensus (note: I didn't say "makes everyone happy"... everyone in this thread is going to have to start compromising).

TallTed commented 4 years ago

@iherman - re https://github.com/w3c/did-core/issues/128#issuecomment-564963432

We should all be careful about the usage of @context, we are sending a series of pings to a guy out there...

I have to note that the quoted section of your comment includes two unwrapped instances of @context ... which unwrapping appears to have been done by a copy-paste of that section. (Github's browser-based "Quote reply" function retains such wrapping and other Markdown markup.)

OR13 commented 4 years ago

Can we close this? We are trying to support 3 representations using the new https://github.com/w3c/did-core-registry

IMO this issue was resolved at the F2F... and if it was not, we should focus our criticism on the did core registry.

SmithSamuelM commented 4 years ago

I am fine with closing it.

talltree commented 4 years ago

I too am fine with closing it.

msporny commented 4 years ago

I am fine with closing it. I too am fine with closing it.

Thanks, closing because the issue submitter (and concerned parties) are ok with closing it, there was a resolution to specify an abstract data model, the specification has been changed to include an abstract data model section (that is waiting for content, but everyone expects that content to be written soon), and there is now a registry that assumes the existence of an abstract data model.