w3c / did-core

W3C Decentralized Identifier Specification v1.0
https://www.w3.org/TR/did-core/

Should we support multiple encodings of DID docs? #140

Closed talltree closed 4 years ago

talltree commented 4 years ago

Per the comment I just made in #128, I suggest that rather than arguing about abstract data models in the abstract (pun intended), we could start making progress by making a series of more concrete decisions that will help us down the path to consensus.

The first decision I propose we make is a simple yes/no answer to the question: should our spec(s) support multiple encodings of DID documents? Examples include plain JSON, JSON-LD, CBOR, PDF (for legal documents), ISO 20022, health record formats, etc.
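
For concreteness, here is a minimal sketch (not spec text) of what "multiple encodings of the same DID document" could mean in practice: the same in-memory data serialized once as plain JSON and once as CBOR. The did:example identifier and key value are placeholders, and the third-party cbor2 Python library is assumed purely for illustration.

```python
import json

import cbor2  # third-party CBOR library, assumed here purely for illustration

# A deliberately minimal DID document; the identifier and key value are placeholders.
did_doc = {
    "id": "did:example:123456789abcdefghi",
    "publicKey": [{
        "id": "did:example:123456789abcdefghi#keys-1",
        "type": "Ed25519VerificationKey2018",
        "controller": "did:example:123456789abcdefghi",
        "publicKeyBase58": "placeholder-key-value",
    }],
}

# Two encodings of the same underlying data model.
json_bytes = json.dumps(did_doc).encode("utf-8")
cbor_bytes = cbor2.dumps(did_doc)

print(len(json_bytes), "bytes as JSON;", len(cbor_bytes), "bytes as CBOR")
```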

In #128 we heard multiple arguments both for and against, but they were in the context of a much larger/longer discussion. My suggestion here is that folks weigh in just on this more focused question, especially by starting their reply with a simple Yes or No before giving any rationale. That will make it easy to scroll through and get a general sense of the sentiment.

IMPORTANT: The question of which encoding(s) we should specify is a different question from whether we should support more than one. I strongly suggest we focus this issue on just seeing if we can come to consensus about whether our spec(s) should be able to support more than one.

selfissued commented 4 years ago

Yes

To keep things simple both for encoding authors and implementers, I'd advocate that each encoding be in its own self-contained specification. And I'd start with the JSON encoding.

msporny commented 4 years ago

Yes.

This is probably the wrong base question to ask, though. The first question should be about the data model... what things should the data model be capable of doing... versioning, extensibility, open world, etc... starting from syntax starts in the middle of the stack.

I'd rather we put a poll together and see what features implementers are looking for in the data model:

... it's direct questions starting at the foundation and building up that will help us see what the requirements on DID Documents are. Starting at syntax starts in the middle; good architecture starts from the ground up.

peacekeeper commented 4 years ago

Yes

I agree with @msporny that it's hard to argue about specific encodings without first having a shared understanding of the key features and use cases that need to be supported by DIDs, DID URLs, and DID documents.

brentzundel commented 4 years ago

I agree that it's difficult to argue about specific encodings without first having a shared understanding of key features, but this issue isn't talking about specific encodings. This issue is only about whether the WG feels that our spec(s) should support multiple encodings of DID documents.

ewelton commented 4 years ago

No - add them as needed as dependent specs

I agree with @peacekeeper and @msporny - an abstract consideration of the merits of multiple encodings is too open-ended. To @brentzundel's point, I think the concern is that abstractly considering "should we have multiple encodings" misses the context of "why" - what problem does it solve? What is driving the need for multiple encodings?

Without any context, I would say "no" - until there is a demonstrated need for multiple encodings, they simply add to the complexity and confusion. Sticking to KISS principles will let us focus on the use cases and not the mechanics.

In the event that we run into some strong case for alternative encodings, I would suggest derivative specifications for alternative formats/environments - along the lines of @selfissued's comment - these could take on domain-specific knowledge: "PDF was chosen for DID-Docs in the legal context, and here is how they are encoded, and how we establish equivalence". Note that this would be a spec about DIDs in a domain - capable of addressing specific deficiencies relative to a domain.

----- that's the answer, below is supporting argument

I think the focal use-cases are currently heavily skewed towards a class of tasks that fit nicely into Layer 1 bootstrapping - they are all about "agency-capable entities" - where it makes sense to think that there is an active software agent operating on behalf of the DID, and where it is reasonable to think of the DID-Subject as "often, also the controller". In that context, I think the way the ecosystem is shaping up is great. I really do back the approach being taken, specifically around ToIP and the Layer 1/Layer 2 separation of concerns.

However, I am also concerned that we are redefining the problem to fit that specific solution. Towards that end, time permitting, I will suggest some PRs to the use-case document - I want to look at situations where:

Thus, unless there is a compelling reason to have multiple encodings, I would argue that what we have with JSON/JSON-LD is enough, is fit for purpose, and provides a structured approach to thinking about how DIDs & DID-Docs satisfy the use cases that justify them being pitched at the level of URLs and in the context of global public resolvability.

I do not see encodings as hampering adoption as much as confusion about "why use DIDs at all?" - In fact, a plethora of encodings is likely to have a much more deleterious effect.

So I'll put my hat in the ring with a soft "no" - one encoding is fine until we can clearly explain what DIDs do - and if that means reducing the problem scope to fit a specific solution, that will have a much larger impact on adoption. When we run into the problem of "I would use DIDs, except WestLaw only takes files in PDF format" then the WG can whip up a derivative spec to address the issue.

And we need to decide whether or not we want to allow people to use DID-Docs as they see fit, or if we are going to centrally mandate global conformance to a specific vision (how would we police conformance?)

The current solution, using JSON-LD, supports structured growth of DID-Doc usage as a first-class member of the global ecosystem - and I can not think of any alternate encoding that fits the bill. Thus, if we find we really need alternative encodings they will be purchased by a restricted set of use-cases for DIDs - which may be a good thing but will immediately lead to the development of a "new kind of identifier that puts me in control" - a DID+ to wrap around DID-, probably including it as a context.

darrellodonnell commented 4 years ago

I'll put in a strong "NO". I can't conceive of a reason why we would need multiple encodings in this standard.

We can easily have related/derivative encodings - some that link via extension. The "legal-PDF" extension mentioned above could be a good example. A DID-Doc could have a reference to the (richer?) legal-PDF. But the legal-PDF isn't, by itself, a DID-Doc.
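
A rough sketch of what such a reference might look like, assuming the DID document's service construct is used; the service type and URL below are hypothetical illustrations, not proposals.

```python
# Hypothetical illustration of the point above: the DID-Doc stays in the core
# encoding and merely references the richer legal PDF; the PDF itself is not a DID-Doc.
did_doc_with_legal_ref = {
    "id": "did:example:123456789abcdefghi",
    "service": [{
        "id": "did:example:123456789abcdefghi#legal",
        "type": "LegalDocument",  # hypothetical service type, not defined by any spec
        "serviceEndpoint": "https://example.com/did-doc-legal.pdf",
    }],
}
```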

msporny commented 4 years ago

@ewelton wrote:

No - add them as needed as dependent specs

Wait, the question was "should our spec(s) support multiple encodings of DID documents?" ... it wasn't, "should we specify multiple encodings right now?", which is the question that I think you're answering. You seem to contradict yourself... by saying "no, we shouldn't support multiple encodings"... and then you say "we should support multiple encodings in separate specs"... which presumes some sort of general data model. Your response is confusing to me, @ewelton. :)

@darrellodonnell wrote:

I'll put in a strong "NO". I can't conceive of a reason why we would need multiple encodings in this standard.

Some software engineers in the 1990s couldn't conceive of why anyone would want anything other than XML... until JSON came along... and then people couldn't conceive of why anyone would want anything else... until CBOR came along. History is riddled with examples of why basing your specification on a single encoding format guarantees that your standard won't survive one syntax-du-decade cycle. I hope we're thinking longer term than that with DID Documents.

darrellodonnell commented 4 years ago

@msporny wrote:

Some software engineers in the 1990s couldn't conceive of why anyone would want anything other than XML... until JSON came along...

Manu - I have personally been deep into XML Schema, and I think that JSON is far more developer-friendly - though limited. My issue here is not about JSON vs. JSON-LD vs. CBOR vs. XML (dare I add ASN.1 in there?).

My issue is that supporting more than a single encoding adds such burden and indirection/noise that any standard that does so will stunt itself. There is nothing wrong with versions of a particular standard evolving. Picking N encodings is, to my best guess, a certain way to ensure that one or more of those encodings gets thrown into the ditch of history down the road due to lack of adoption.

Once a standard is adopted it tends to reach a level of normalization, where it doesn't need to evolve much further.

If there is a dire need for this extensibility then propose a dead simple specification for the root of a DID-Doc and then other specifications that add that explicit extension, e.g. DID-Doc-LD, DID-Doc-Legal (PDF++). If one or many of those derivatives start to show that the root-level standard isn't whole without something (imagine a particular pattern showing up in each derivative because the root DID-Doc standard was really missing a concept), then you revise it and continue.
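
One way to picture the "dead simple root plus derivative specs" idea, with every name below hypothetical: a bare root document that a derivative specification (say, a DID-Doc-LD) extends without altering what the root defines.

```python
# Hypothetical sketch of a "dead simple" root DID-Doc that derivative specs build on.
root_did_doc = {
    "id": "did:example:123456789abcdefghi",
    "publicKey": [],  # the bare minimum a root spec might require
}

# A derivative spec (e.g. a hypothetical DID-Doc-LD) layers its own additions on top,
# leaving the root definition untouched.
did_doc_ld = {
    **root_did_doc,
    "@context": "https://www.w3.org/ns/did/v1",  # JSON-LD-specific addition
}
```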

ewelton commented 4 years ago

@msporny yes, I was answering the "right now" question, suggesting that we stick w/ one single encoding until the data-model is clear, and then deal with other encodings in adjacent specifications. Specifically specialized specifications speaking succinctly to domain idiosyncrasies. I don't want to close the door to multiple encodings, but I think development of the specification would benefit from semantic focus rather than syntactic focus.

@darrellodonnell

If one or many of those derivatives start to show that the root-level standard isn't whole without something (imagine a particular pattern showing up in each derivative because the root DID-Doc standard was really missing a concept), then you revise it and continue.

To me the attraction of JSON-LD as a base encoding is that we have built-in extension via @context - the specification itself can be simple, the semantic work is done by the contexts - and the "core context", the only required context, is the root doc that speaks to the Layer 1 DPKI concerns.
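
As a sketch of that layering (the health-records context URL is invented for illustration): the required core `@context` entry covers the Layer 1 DPKI concerns, and any further semantics are declared simply by appending extension contexts.

```python
# Only the core context is required; it anchors the Layer 1 DPKI semantics.
core_did_doc = {
    "@context": ["https://www.w3.org/ns/did/v1"],
    "id": "did:example:123456789abcdefghi",
}

# An extension declares additional semantics by adding another context entry.
# The second context URL is invented for illustration.
extended_did_doc = {
    **core_did_doc,
    "@context": [
        "https://www.w3.org/ns/did/v1",
        "https://example.org/contexts/health-records-v1",
    ],
}
```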

combining both....

It is that restricted minimal context which might be useful in a variety of encodings. If there is a specification relevant to, for example, health records, which suggests encoding in a different format - then the adjacent specification "DID-Doc-HealthRecords88" can clearly say, "here is how you provide a DID-Doc (with specific semantics) using encoding GovHealth2112."

saying yes and no at the same time...

My feeling is that we will benefit from accepting JSON-LD, warts and all, until such time as we have a clear and robust understanding of DIDs across a wide, wide range of use cases. Once the specification has settled down and we are comfortable with our data-model, and have wrangled how DIDs, method specifics, resolution, and all of the complexities that emerge from self-certifying identifiers play together - that would be a great time to shift gears and work towards replacing JSON-LD with ASN.1 and XML-Schema! ;)

...... and now the TL;DR....

In the abstract, who could possibly argue with multiple encodings? I thought that JSON-LD provided an extremely clear and technically strong pathway which gave us flexibility and avoided strong commitments such as "this is what a DID-Doc is and you MUST ONLY use it for the following purpose" - In my thinking, JSON-LD gave us a wonderful framework around which a DID-ecosystem could be built - with different capabilities, such as "Layer 1 bootstrapping" or "Suitable for use by lawyers", cleanly captured by the @context array.

What I loved about JSON-LD was that if I wanted a DID-doc to do something novel, then no problem - it is, after all, your doc so the only thing you really need to do, if you want to play nice, is state your intentions to the world in the form of an element of the @context array. In the end, it is a doc shared by you and the method so it is the method that imposes the terms-and-conditions and restrictions - in exchange for providing real world resources to back your DID.

But - I dunno - maybe it is fundamental to my nature - one attraction to working on this spec is the intention of flagrant violation upon release. What is the point of having control over your identifier if someone else is in control of what you can do with it and what it means? Cryptographic trust, ToIP - that's fine - but it is not semantic trust. Leepa-chai tipi-tai my main daime. wa da tah!

I do not want to have DIDs such that I can have any color DID doc, as long as it is blue. If we go with a restricted form of DID document, I will introduce did:noncompliant:X which only returns documents with a non-compliant abstract data model - it should sit on the pile right next to did:gmail: and did:facebook:

What I really loved about JSON-LD for DID-Docs was that the only thing you needed in a DID-Doc was @context - with JSON-LD, everything else is window dressing. @context is a claim, made by the DID-controller, about the document itself - it is the root of semantic trust - the "core DID-doc" - the document described by "a specific context" can be encoded in a billion different ways, seamlessly and without compromising the integrity of DIDs. The PKI dance done by the core context is the root of cryptographic trust - which anchors the root of semantic trust, which is in @context.

But this issue is about "should the spec focus on multiple encodings"...

Answer: no - the specification should focus on semantics and use cases, and it should utilize JSON-LD because it is expedient and well-formed. It should point to adjacent specifications to capture additional encodings - but let us focus on the structure of the house, not the paint on the walls.

The conversation about "supporting multiple encodings" - in the abstract.... sure, why not - but let's get the spec done first using JSON-LD and then make an adjacent specification that says "and here is how you implement a DID-Doc using Aramaic and stone tablets."

So @msporny, yes, I did intend both yes and no, nah? 555 ;)

TallTed commented 4 years ago

@ewelton (and others) - Please always wrap `@context` (and other `@`-things) in backticks, i.e. --

`@context`

-- except where you are intentionally tagging a github user.

(Optimally, go back and edit your previously posted comments to do the same.)

There is a GitHub user with the `context` handle, and every time an unwrapped `@context` occurs, they get a notification -- which they don't want from us, as they are not working with us.

ewelton commented 4 years ago

@TallTed That's a great point - I edited the last post. I'd pasted it from another editor so missed the little user-tagging window that pops up and didn't think about that at all.

talltree commented 4 years ago

As the original poster of this issue, I propose we close this issue on the basis of the "grand compromise" we reached at the Amsterdam F2F that the spec will now support multiple representations—and this change has already been applied to the restructured spec.