w3c / did-core

W3C Decentralized Identifier Specification v1.0
https://www.w3.org/TR/did-core/

Can DID methods be semantically versioned? #715

Closed TimDaub closed 3 years ago

TimDaub commented 3 years ago

Say I have a protocol that issues its own IDs for strings.

did:strings:<md5 hash>

I initially define the "strings" DID method as: Take the DDO, which for simplification purposes is just an arbitrary string, input it in the md5 hash function and use its digest as the did method-specific identifier.
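A minimal sketch of that definition, assuming the DDO is a UTF-8 string. The function name `strings_did` is made up for illustration and is not part of any registered DID method.

```python
import hashlib


def strings_did(ddo: str) -> str:
    """Derive a did:strings identifier from an arbitrary string (hypothetical method)."""
    # The md5 hex digest of the DDO becomes the method-specific identifier.
    digest = hashlib.md5(ddo.encode("utf-8")).hexdigest()
    return f"did:strings:{digest}"


print(strings_did("hello"))
```

The same input always yields the same DID, which is what makes the identifier content-addressed.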

Now a few years go by and md5 gets broken. I'd now like to replace it with another function. But I'd like to stay in my strings universe.

Do DIDs allow semantic versioning out of the box? If so, how would it work? I'd be curious to hear some takes.

csuwildcat commented 3 years ago

You can now do this thanks to the equivalence properties defined in the specification, which allow DID Methods to determine there are multiple representations of the same logical DID:

Without these properties, there would have been no way for an ID itself to ever securely be upgraded/morphed to account for such things, which would be a serious problem when you're talking about IDs that may need to be durable across an entire human lifetime.

kdenhartog commented 3 years ago

This is something that has been done in at least one did method already. I'd suggest taking a look at the did:peer Method Specific Identifier for an example of how this can be done. In particular, the numalgo field is being used to indicate a version of the method.

So the best way to answer this question is that did-core doesn't specify any way that this should normatively be handled, and it's up to the method to define how that might work. Does that answer your question @TimDaub ?
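As a rough illustration of the idea (not the actual did:peer rules), a method could reserve the first character of the method-specific identifier as an algorithm tag. The tag values, the `ALGORITHMS` table, and both function names below are assumptions made up for this sketch.

```python
import hashlib

# Hypothetical registry: one leading character selects the hash algorithm.
ALGORITHMS = {"0": "md5", "1": "sha256"}
PREFIX = "did:strings:"


def make_id(ddo: str, tag: str) -> str:
    """Build a DID whose method-specific identifier starts with the algorithm tag."""
    digest = hashlib.new(ALGORITHMS[tag], ddo.encode("utf-8")).hexdigest()
    return f"{PREFIX}{tag}{digest}"


def verify(did: str, ddo: str) -> bool:
    """Check that the DID matches the DDO under the algorithm named by its tag."""
    if not did.startswith(PREFIX):
        return False
    method_specific_id = did[len(PREFIX):]
    tag, digest = method_specific_id[:1], method_specific_id[1:]
    if tag not in ALGORITHMS:
        return False
    expected = hashlib.new(ALGORITHMS[tag], ddo.encode("utf-8")).hexdigest()
    return digest == expected
```

Note that the two tags produce different DID strings for the same input, so this tells a verifier which algorithm to use but does not by itself link the old and new identifiers.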

csuwildcat commented 3 years ago

@kdenhartog he's talking about fundamentally changing part of the ID string segment, which, without the canonical/equivalent properties, is simply not possible, because any change of that type would be regarded as two different DIDs. Only canonical/equivalent properties, which are defined in DID core, allow you to do things like switch out the hash algo for a DID string segment and still be able to have the resulting DID string be associated with the same logical DID.

csuwildcat commented 3 years ago

@TimDaub I just want to be clear that if you create/use a DID Method that is leveraging the canonical/equivalent properties that are specified in DID Core, you can absolutely do what you are saying. Here are those properties again, for reference:

^ using these properties you can create a DID Method that provides durable, flexible DIDs that can evolve representations, whether it be from a change in hashing function or any other modification to their URI value segments, all while still maintaining the same logical DID across their lifetimes. I am so glad others are realizing there is incredible value in having more durable IDs that can flex to account for things like this.

iherman commented 3 years ago

The issue was discussed in a meeting on 2021-03-30

5.6. Can DID methods be semantically versioned?

See GitHub issue #715 (https://github.com/w3c/did-core/issues/715).

Manu Sporny: The answer is yes; we don't really need to say anything about it. I'll take it.

Kyle Den Hartog: Easy way to answer that: reference the did:peer stuff. They have built in a way to semantically version the different DID stuff. I'm happy to answer that if that would help.

Manu Sporny: Yes, please.

TimDaub commented 3 years ago

@kdenhartog he's talking about fundamentally changing part of the ID string segment, which, without the canonical/equivalent properties, is simply not possible, because any change of that type would be regarded as two different DIDs.

Correct.

However, I'm confused about how

are supposed to work. An example both in the spec and here would be quite helpful. Also: Why are there two ways of doing this? I don't understand the difference between a canonicalId and an equivalentId. There's a section in the spec but it's quite technical:

The canonicalId property is identical to the equivalentId property except: a) it is associated with a single value rather than a set, and b) the DID is defined to be the canonical ID for the DID subject within the scope of the containing DID document.

What does that mean? How can equivalentId be a "set"? Or rather, how can canonicalId and equivalentId be used within the DID Syntax?

The value of canonicalId MUST be a string that conforms to the rules in Section § 3.1 DID Syntax.

In the ABNF, I don't see where I could fit it. I can see how it'd fit in the DID URL Syntax ...

However, in https://www.w3.org/TR/did-core/#query there's a reference to be made to a versionId. What is it versioning and how is it different from an equivalentId and a canonicalId?

did:example:123456?versionId=1

Is this how I'm supposed to do it?

did:strings:<md5 hash>?canonicalId=0.0.1

Anyways, for me it'd make the most sense to plainly version the did method by adding a semantic version next to it somehow.

did:strings-v0.0.1:<md5 hash>
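One thing worth checking with this form: as I read the DID Syntax ABNF in did-core, `method-name` is one or more `method-char`, and `method-char` is limited to lowercase `a-z` and digits. A quick sketch of that check (the function name is my own):

```python
import re

# method-name = 1*method-char ; method-char = %x61-7A / DIGIT
METHOD_NAME = re.compile(r"[a-z0-9]+")


def is_valid_method_name(name: str) -> bool:
    """Return True if the name uses only the characters the ABNF allows."""
    return METHOD_NAME.fullmatch(name) is not None


print(is_valid_method_name("strings"))
print(is_valid_method_name("strings-v0.0.1"))
```

Under that reading, the hyphen and dots in `strings-v0.0.1` would not be allowed, while something like `stringsv1` would be.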

Btw: This is a real question we asked ourselves over at Ocean Protocol: https://github.com/oceanprotocol/market/issues/431

csuwildcat commented 3 years ago

@TimDaub @kdenhartog I think there is a fundamental misunderstanding here. I believe this Issue is not really asking about versioning a DID Method, given the provided example asks about how one DID URI string can be made synonymous with another DID URI string representation. Look at what they actually said: "If I have a DID of did:example:HASH_ALGO_1, how can I shift that DID to did:example:HASH_ALGO_2" <-- this is EXACTLY the sort of thing canonical/equivalent properties allow for. Canonical and Equivalent ID properties enable there to exist a singular logical DID in a DID Method that can potentially take many representational forms.

@TimDaub you bring up how canonicalId and equivalentId work, so I'll start by listing a few things the spec details:

  1. These are not URL parameters, they are document metadata fields
  2. They are similar in some ways, but have different semantic meanings/capabilities
  3. equivalentId = forms of the resolved ID that are logically equivalent to the same singular, logical ID in the system
  4. canonicalId = a single equivalent form of the resolved ID that the Method determines to be THE canonical form you must use going forward after resolution.

To your question, Tim, about "How am I supposed to do this?"

did:strings:<md5 hash>?canonicalId=0.0.1 <-- No, canonicalId and equivalentId are not parameters

did:strings-v0.0.1:<md5 hash> <-- No, canonicalId and equivalentId are changes in the DID URI segment values

If your Method supports this area of the spec, someone would simply resolve did:example:HASH_ALGO_1, and in the DID Document metadata it would show this, assuming multiple representations existed:

{
  "equivalentId": ["did:example:HASH_ALGO_2", "did:example:SOME_OTHER_FORM_3"]
}

If one of the representations was THE canonical representation of the ID, it would also include the canonicalId property, like this:

{
  "canonicalId": "did:example:HASH_ALGO_2"
}

I don't think people fully understand how absolutely critical this is, but if you want DIDs that can survive for decades at a time, you're likely going to need to transition their actual URI strings without losing the provenance of the logical ID within the target system. This is what canonicalId and equivalentId allow you to do in an automatic, deterministic, programmatic fashion.
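On the consuming side, handling this metadata could look roughly like the sketch below. It assumes the metadata has already been parsed into a Python dict; both function names are hypothetical and not taken from any resolver library.

```python
def preferred_did(resolved_did: str, metadata: dict) -> str:
    """Return the canonical form if the Method declares one, else the DID as resolved."""
    return metadata.get("canonicalId", resolved_did)


def same_logical_did(resolved_did: str, other_did: str, metadata: dict) -> bool:
    """Check whether other_did is a declared form of the DID that was resolved."""
    known_forms = {resolved_did, *metadata.get("equivalentId", [])}
    canonical = metadata.get("canonicalId")
    if canonical is not None:
        known_forms.add(canonical)
    return other_did in known_forms


metadata = {
    "equivalentId": ["did:example:HASH_ALGO_2", "did:example:SOME_OTHER_FORM_3"],
    "canonicalId": "did:example:HASH_ALGO_2",
}
print(preferred_did("did:example:HASH_ALGO_1", metadata))
```

A party holding the old string can then switch to the canonical form at resolution time without any out-of-band coordination.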

csuwildcat commented 3 years ago

The result of this issue should be: the feature/capability in question exists, and is supported via the canonicalId and equivalentId DID Document metadata properties.

msporny commented 3 years ago

I don't think people fully understand how absolutely critical this is

Perhaps some don't, or perhaps some do understand... but in a different way: They're concerned that the equivalentId/canonicalId approach is concerning.

I always feel like I have to provide the warning: "canonicalId" and "equivalentId" increase the complexity of solutions in ways that might be unacceptable to some developers. For example, if a DID Method supports "canonicalId" and "equivalentId", as a developer, I now have to put that concept into the system I'm writing... it's a leaky abstraction that I now have to deal with... and some developers will get it wrong, and others won't support it either out of ignorance or out of protest... and that could lead to a non-interoperable DID Method ecosystem.

Food for thought.

csuwildcat commented 3 years ago

@msporny I want to make this absolutely clear: no one on this thread is saying what you just projected on them --> "They're concerned that the equivalentId/canonicalId approach is concerning.". No, they simply didn't understand them and were trying to suggest things that fundamentally do not do what their example/question alluded to. There is no other way to automatically, deterministically, programmatically transition from one of an ID's forms to another without something like equivalentId/canonicalId, that's not an opinion, it's an empirical statement of fact.

csuwildcat commented 3 years ago

@msporny you will certainly have Methods that end up telling their users: "Sorry, we have no way of sanely transitioning ID forms, so you'll have to either ditch those DIDs you were using, or engage in a hilarious game where you try to contact everyone you ever used those DIDs with in an attempt to have them swap everything over to a new DID through a series of Rube Goldbergian one-off communications/protocols designed to do what two properties would have automatically, deterministically, programmatically done for anyone who resolved the DIDs"

TimDaub commented 3 years ago

Hey @csuwildcat, that was a great explanation and now I understand what is meant by "set" (equivalentId) and "string" (canonicalId). Thanks!

Still, I don't understand how any of these could be used for semantic versioning of my DID. Let me state my case once again:

I'm starting to issue did:example:HASH_DIGEST_1 in a project, and when you look up the DID method example you'll find that to create HASH_DIGEST_1 you'll need to use HASH_ALGO_1 on e.g. the DDO that is represented by my DID.

Now, users can go on e.g. my website and they can ask for GET https://website.com/examples/did:example:abcd and they'll get the respective DDO where HASH_ALGO_1(DDO) == abcd. A few things are important to note here: If I want to create a canonical digest from DDO, (1) I'll have to specify the number of required properties that will always be used as an input to HASH_ALGO_1. Additionally, (2) I'll have to specify in e.g. what order the DDO is getting hashed. I could e.g. use an approach like CBOR.
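To make points (1) and (2) concrete, here is a minimal sketch. The field names in `REQUIRED_FIELDS` are invented, sorted-key JSON stands in for a deterministic encoding such as canonical CBOR, and SHA-256 is only a placeholder for HASH_ALGO_1.

```python
import hashlib
import json

# Hypothetical set of properties that always feed the hash (point 1).
REQUIRED_FIELDS = ("author", "created", "name")


def canonical_bytes(ddo: dict) -> bytes:
    """Serialize the required properties in a fixed order (point 2)."""
    missing = [field for field in REQUIRED_FIELDS if field not in ddo]
    if missing:
        raise ValueError(f"DDO is missing required fields: {missing}")
    subset = {field: ddo[field] for field in REQUIRED_FIELDS}
    # Sorted keys and fixed separators make the byte output deterministic.
    text = json.dumps(subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def ddo_digest(ddo: dict) -> str:
    """Placeholder for HASH_ALGO_1(DDO); SHA-256 is an assumption here."""
    return hashlib.sha256(canonical_bytes(ddo)).hexdigest()
```

With this, two DDOs that differ only in key order or in properties outside the required set produce the same digest.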

Through this approach, I now know that for the DID method example I can rely on the integrity of the DDO when requesting it from website.com. Additionally, the DID has been created transparently for anyone.

Now a few years pass and it turns out HASH_ALGO_1 is not secure anymore, e.g. because someone has found collisions. I want to upgrade HASH_ALGO_1 to HASH_ALGO_2 to continue delivering the guarantees to my users. Remember, they're used to the DID method example and to the REST endpoint I've established (https://website.com/examples/did:example:abcd).

So now, since I've reserved the name example for myself years ago and since in practice nothing has changed around my idea of delivering content-addressed DDOs, it wouldn't make sense to now name the method "dinosaurs" or "icecream" or "example_new". Instead, it'd make sense for me to signal to anyone that has to resolve a DID of example that there are multiple versions within the realm of example. Say I now use canonicalId or equivalentId in the DDO, how do I know when looking at did:example:abcd that it's the first version of DID method example? Or from a user perspective, say I send https://website.com/examples/did:example:abcd to someone else via messenger/email and they open it. How would I be able to encode a dictionary-style canonicalId or equivalentId in there?

Rather, what I'd find useful is that for both users and resolvers, the DID method itself gives information about which version should be used to resolve.

Hence my example:

did:example-v0.0.1:abcd => resolve and validate using HASH_ALGO_1
did:example-v0.1.0:abcd => resolve and validate using HASH_ALGO_2
...

Again, real-life example where Ocean Protocol has issued did:op:<hash> where now the input to "hash" could change in the future: https://github.com/oceanprotocol/market/issues/431

they simply didn't understand them

It's probably true that I'm not aware how canonicalId and equivalentId can be used for semantic versioning.

There is no other way to automatically, deterministically, programmatically transition from one of an ID's forms to another without something like equivalentId/canonicalId, that's not an opinion, it's an empirical statement of fact.

Really? Why?

csuwildcat commented 3 years ago

@TimDaub I don't think you would want "versioned" IDs, given your example. Instead I would suggest the following:

  1. At initial Time 1, the ID did:example:abcd, which uses HASH_ALGO_1 as part of its DID URI string, resolves and all is well with the world
  2. At future Time 2, the ID did:example:abcd's value segment abcd, which was initially generated via HASH_ALGO_1, is now broken, so you want to ensure that future resolutions are securely, deterministically, automatically associated with another form, and resolve to the same logical ID/DDO.
  3. To do so you would, within your Method, add code that securely, deterministically creates the equivalence between did:example:abcd and did:example:efgh which is the same logical ID that should be recognized by all resolving parties as one and the same, resolving to the same DDO.

In summary: you don't have to go around forking your DID Method or tagging version numbers on DID URI segments, which implicitly creates fundamentally different and unlinked DIDs according to the spec and would result in a mess of divergent DID refs spread across the Web. Instead, you can use canonicalId and equivalentId to seamlessly transition forms of IDs across changes in their construction, like in your example.
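A rough sketch of step 3, under strong simplifying assumptions: the DDO is a plain string the resolver already holds, the equivalence is recomputed from it on every resolution, and `HASH_HISTORY`, `did_for`, and `resolve_metadata` are names invented for this example. A real Method would need its own secure way to establish the link.

```python
import hashlib

# Hypothetical history of hash functions for the Method, oldest first.
HASH_HISTORY = ("md5", "sha256")


def did_for(ddo: str, algo: str) -> str:
    """Form of the DID produced by one hash algorithm."""
    return "did:example:" + hashlib.new(algo, ddo.encode("utf-8")).hexdigest()


def resolve_metadata(requested_did: str, ddo: str) -> dict:
    """Build DID Document metadata linking every form of the same logical ID."""
    forms = [did_for(ddo, algo) for algo in HASH_HISTORY]
    if requested_did not in forms:
        raise ValueError("DID does not match this DDO under any known algorithm")
    return {
        # All other forms of the same logical ID.
        "equivalentId": [form for form in forms if form != requested_did],
        # The newest form is treated as canonical in this sketch.
        "canonicalId": forms[-1],
    }
```

Resolving the old form then returns the new form as the canonicalId, so holders of the old string are moved forward at resolution time.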

I would be happy to have a call with you if you want to dig in further.

csuwildcat commented 3 years ago

Also consider that going the route of did:example-v1:123, did:example-v2:456, did:example-vN:789, implies the following:

  1. You would basically have to fork your entire DID Method just to transition phenotypical representations of DIDs and other Method-internal changes to how a Method's IDs are resolved.
  2. It results in a condition where you are still creating completely separate DIDs, and then have to find another way to deterministically link them to each other (assuming you don't want to strand people's ID lineages/linkages across the different 'versions')
  3. If you don't want to strand ID lineages/linkages held everywhere across the world when you shift Method/ID 'versions', you would need to figure out how to reconnect to all the people, entities, apps, and services that currently hold references to past Method/ID 'versions' and move them all from 'version' to 'version'. The complexity involved in creating communication layer protocol standards across all the N protocols and communication channels is prohibitive, and being able to connect to potentially 1000s of entities in ad hoc flows is simply infeasible.
  4. It will implicitly litter the Web (or any other system in which IDs have been strewn) with IDs that will, in most cases, effectively become deadened/useless over time. It also dramatically increases complexity for everyone who has to deal with the N variants that continue to multiply over time without being able to stay on the right 'track' of a DID automatically.

^ canonicalId and equivalentId allow us to avoid forcing Method authors to 'version' Methods/IDs needlessly, automatically ensure that everyone who holds a DID reference remains on the right 'track' of a DID across numerous types of Method and ID representation changes, and radically reduce complexity for everyone involved with a mechanism that automates many types of upgrades at the deterministic point of resolution.

msporny commented 3 years ago

@TimDaub has your question been answered at this point? Can we close this issue?

TimDaub commented 3 years ago

Yes, thanks all.