w3c / did-core

W3C Decentralized Identifier Specification v1.0
https://www.w3.org/TR/did-core/

Request for Guidance on Normalization Rules Enforcement #842

Open EzequielPostan opened 1 year ago

EzequielPostan commented 1 year ago

Issue Description

The current version of the DID Core specification (https://www.w3.org/TR/did-core/#services) states that the value of the serviceEndpoint property MUST be a string, map, or a set composed of one or more strings and/or maps. Additionally, it specifies that all string values MUST be valid URIs conforming to RFC 3986 and normalized according to the Normalization and Comparison rules in RFC 3986 and any normalization rules in its applicable URI scheme specification.
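To make the structural part of that requirement concrete, here is a minimal sketch of a shape check for `serviceEndpoint`. It assumes a few things the spec does not state: a JSON array stands in for the "set", map contents are left to the service type's own spec, and the URI test is only a scheme check, not a full RFC 3986 validator. The function names are hypothetical. Note that nothing in such a check can tell whether a string is *normalized*, which is exactly the gap this issue is about.

```python
import re
from typing import Any

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def looks_like_uri(value: str) -> bool:
    """Rough check only: a scheme followed by ':' and no whitespace.

    This does not implement the full RFC 3986 grammar and does not check
    whether the string is normalized.
    """
    return bool(_SCHEME_RE.match(value)) and not any(c.isspace() for c in value)


def is_valid_service_endpoint(value: Any) -> bool:
    """Shape check: a string, a map, or a non-empty set of strings and/or maps."""
    if isinstance(value, str):
        return looks_like_uri(value)
    if isinstance(value, dict):
        # Assumption: map contents are defined by the service type's own spec.
        return True
    if isinstance(value, list):
        # JSON has no set type; a set is modeled here as a non-empty list
        # whose string members are not repeated.
        strings = [v for v in value if isinstance(v, str)]
        return (
            len(value) > 0
            and all(isinstance(v, (str, dict)) for v in value)
            and all(looks_like_uri(s) for s in strings)
            and len(strings) == len(set(strings))
        )
    return False
```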

The issue at hand is that RFC 3986 does not provide an explicit list of normalization steps, and different libraries apply different additional normalization rules. As we are implementing a new DID method, where users will be submitting DID creation and DID update events, we find ourselves in a dilemma: we want to comply with the specification, but we lack clear guidance on which specific normalization rules to enforce.
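For reference, RFC 3986 section 6.2.2 does describe a few syntax-based steps. The sketch below applies only two of them: case normalization (lowercase scheme and host, uppercase hex in `%XX` triplets) and percent-encoding normalization (decode triplets for unreserved characters). It is a minimal illustration built on Python's `urllib.parse`, with hypothetical function names. It deliberately omits dot-segment removal and any scheme-based rules (such as dropping a default port), and the `urlunsplit` round trip drops an empty `?` or `#`, which RFC 3986 section 6.2.3 says should generally be kept. Each of those omissions is a place where two libraries can legitimately disagree.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# RFC 3986 section 2.3 unreserved characters.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_PCT_RE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(component: str) -> str:
    """Decode %XX for unreserved characters; uppercase the hex digits of the rest."""
    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return _PCT_RE.sub(fix, component)


def normalize_host(hostport: str) -> str:
    """Lowercase host (and port digits) while keeping %XX hex digits uppercase."""
    lowered = normalize_percent_encoding(hostport).lower()
    return _PCT_RE.sub(lambda m: "%" + m.group(1).upper(), lowered)


def syntax_normalize(uri: str) -> str:
    """Case and percent-encoding normalization only (RFC 3986 6.2.2.1 and 6.2.2.2)."""
    parts = urlsplit(uri)  # urlsplit already lowercases the scheme
    userinfo, at, hostport = parts.netloc.rpartition("@")  # userinfo stays case-sensitive
    netloc = normalize_percent_encoding(userinfo) + at + normalize_host(hostport)
    return urlunsplit((
        parts.scheme.lower(),
        netloc,
        normalize_percent_encoding(parts.path),
        normalize_percent_encoding(parts.query),
        normalize_percent_encoding(parts.fragment),
    ))
```

For example, `syntax_normalize("HTTP://Example.COM/%7euser/a%2fb?q=%3d#Frag")` returns `"http://example.com/~user/a%2Fb?q=%3D#Frag"`.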

Without a shared list of normalization rules followed by all implementations of DID methods, we are considering shifting the responsibility for normalization to users and letting them decide which rules they require. However, this approach conflicts with the specification, because we would be allowing users to produce non-compliant DID documents.

Request

We kindly request guidance on the following options:

  1. Could it be possible to update the W3C DID Core specification to include an explicit list of normalization rules, accompanied by a comprehensive test suite? or,
  2. Could it be possible to remove the normative enforcement for normalization, allowing implementers to determine the level of normalization they wish to enforce?

Thank you for your attention and support

kdenhartog commented 1 year ago

Could it be possible to update the W3C DID Core specification to include an explicit list of normalization rules, accompanied by a comprehensive test suite?

At this time no, but this could be possible in an updated version of the specification. There is currently discussion about rechartering the working group, though, so we'll have to wait until that gets decided before this can move forward if we take this route.

Could it be possible to remove the normative enforcement for normalization, allowing implementers to determine the level of normalization they wish to enforce?

This would also require a normative change, which is a class 4 change in W3C terminology: it changes normative statements, which means we need an appropriately chartered WG to make it.

A potential solution that might be available right now to us would be to update the requirements of the registry so that any registered service endpoint is required to define this. cc @msporny to see what his thoughts on this might be.

Also, @pchampin, it looks like I'm still a part of a W3C group that allows me to triage tickets in this repo; I think so because I can close this issue and assign myself (but can't update labels). Could you take a look at which group that might be and remove me? I'm unlikely to participate in the WG at that level anymore due to time commitments, so that authorization can be removed from my GH account.

msporny commented 1 year ago

@kdenhartog wrote:

At this time no, but this could be possible in an updated version of the specification.

Yes to everything @kdenhartog wrote above. He is correct that changing a global standard isn't simple when you don't have an active working group that is chartered to make breaking changes. This is by design, to ensure that these global standards stay stable for long periods of time.

A potential solution that might be available right now to us would be to update the requirements of the registry so that any registered service endpoint is required to define this. cc @msporny to see what his thoughts on this might be.

Even updating the requirements to the registry would have to be done through an active WG, which we don't have right now. That said, this issue will stay open and will be addressed by that WG when it becomes active in the next couple of months.

My suggestion in the meantime is to use this issue to track the normalization rules for the URIs that you see people using in the wild. At present, the vast majority of these URIs can use the normalization rules defined in the WHATWG URL specification:

https://url.spec.whatwg.org/#concept-url-serializer
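As a rough illustration of what that linked algorithm does, here is a sketch of the WHATWG URL serializer in Python. Two caveats: it operates on an already-parsed URL record (the `URLRecord` class is a hypothetical stand-in for the spec's record, with the host assumed to be serialized already), and most of the actual canonicalization (lowercasing, percent-encoding, resolving `.` and `..`) happens in the WHATWG *parser*, which is not reproduced here. As we understand the parser, a DID such as `did:example:123` has a non-special scheme and no `//`, so it ends up with a null host and an opaque path, which is why the opaque-path branch matters for DIDs.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class URLRecord:
    """Hypothetical stand-in for a WHATWG URL record (already parsed)."""
    scheme: str
    username: str = ""
    password: str = ""
    host: Optional[str] = None         # serialized host, or None for a null host
    port: Optional[int] = None
    path: Union[str, List[str]] = ""   # str = opaque path, list = path segments
    query: Optional[str] = None
    fragment: Optional[str] = None


def serialize_path(url: URLRecord) -> str:
    if isinstance(url.path, str):  # an opaque path is emitted verbatim
        return url.path
    return "".join("/" + segment for segment in url.path)


def serialize_url(url: URLRecord, exclude_fragment: bool = False) -> str:
    output = url.scheme + ":"
    if url.host is not None:
        output += "//"
        if url.username or url.password:  # the URL "includes credentials"
            output += url.username
            if url.password:
                output += ":" + url.password
            output += "@"
        output += url.host
        if url.port is not None:
            output += ":" + str(url.port)
    elif isinstance(url.path, list) and len(url.path) > 1 and url.path[0] == "":
        # Guard so that a path starting with "//" is not re-parsed as a host.
        output += "/."
    output += serialize_path(url)
    if url.query is not None:
        output += "?" + url.query
    if not exclude_fragment and url.fragment is not None:
        output += "#" + url.fragment
    return output
```

For example, `serialize_url(URLRecord(scheme="did", path="example:123", fragment="key-1"))` yields `"did:example:123#key-1"`.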

If you can find a URI scheme that can't work with the above, and doesn't have a spec with normalization rules for it, we could use this information to modify the language in the specification in the next charter.

@EzequielPostan, does this give you enough guidance to provide to your community?

EzequielPostan commented 1 year ago

Thank you for the replies.

My suggestion in the meantime is to use this issue to track the normalization rules for the URIs that you see people using in the wild.

The thing is that, in the wild, we don't see mentions of normalization rules. At a quick glance:

Sidetree's spec says:

The object MUST include a serviceEndpoint property, and its value MUST be either a valid URI string (including a scheme segment: i.e. http://, git://) or a JSON object with properties that describe the Service Endpoint further. If the values do not adhere to these constraints, the entire Patch Action MUST be discarded, without any of it being used to modify the DID’s state.

which enforces little to no normalization.
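To illustrate how little that rule constrains, here is a minimal sketch of an add-services check in the spirit of the quoted Sidetree text: every endpoint must be a string with a scheme or a JSON object, and one bad endpoint discards the whole Patch Action. The patch shape is only loosely modeled on Sidetree's `add-services` action, the function names are hypothetical, and the "valid URI" test is reduced to a scheme check. Note that an unnormalized value such as `HTTP://Example.COM` passes unchanged.

```python
import re
from typing import Any, Dict, List

# Scheme check only: "a valid URI string (including a scheme segment)".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def endpoint_ok(value: Any) -> bool:
    if isinstance(value, dict):
        return True  # "a JSON object with properties that describe the Service Endpoint"
    return isinstance(value, str) and bool(_SCHEME_RE.match(value))


def apply_add_services(
    current: List[Dict[str, Any]], patch: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Apply a hypothetical add-services patch, or discard it entirely if any endpoint is invalid."""
    services = patch.get("services", [])
    if not all(
        isinstance(s, dict) and endpoint_ok(s.get("serviceEndpoint")) for s in services
    ):
        return current  # whole Patch Action discarded; DID state unchanged
    return current + services  # values are stored as-is, with no normalization
```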

In another popular example, the did:peer spec does not seem to mention normalization at all.

We haven't explored this in depth, but in general, DID method specs don't mention the topic.

  1. Is there any method, to your knowledge, that enforces extensive normalization rules on URIs?
  2. Are DID methods enforcing normalization without mentioning it in their specs?
  3. Is the Universal Resolver project checking or enforcing anything on this?

At present, the vast majority of these URIs can use the normalization rules defined in the WHATWG URL specification

The problem we face is not the lack of possible specs/RFCs/libraries/groups describing normalization rules. The issue is that, if the spec is not enforcing any clear specific rules and test vectors, then in practice it is enforcing none, because each method can simply select different sets.

Without a change to the spec, we may simply not enforce any normalization, and make users responsible for normalizing URIs if their use cases need it.

Once again, thank you for taking the time to read and reply to this issue.

kdenhartog commented 1 year ago

Is there any method, to your knowledge, that enforces extensive normalization rules on URIs?

To my knowledge, none of the various DID methods I've read through have specified this beyond what the Sidetree spec does.

Are DID methods enforcing normalization without mentioning it in their specs?

When we implemented this at MATTR, we did some of our own implementation-level normalization for things like this, but it was highly specific to the use cases we wanted to use DIDs for. I suspect the situation is similar for others, so method authors are treating these as extension points of their method specs.

Is the Universal Resolver project checking or enforcing anything on this?

Last I checked (almost 2 years ago at this point), it was doing very basic normalization, but nothing beyond the scope of what's defined in DID Core. @peacekeeper could speak to its latest state, though, I presume.

The problem we face is not the lack of possible specs/RFCs/libraries/groups describing normalization rules. The issue is that, if the spec is not enforcing any clear specific rules and test vectors, then in practice it is enforcing none, because each method can simply select different sets.

Without a change to the spec, we may simply not enforce any normalization, and make users responsible for normalizing URIs if their use cases need it.

Once again, thank you for taking the time to read and reply to this issue.

I suspect this is probably the best way to go, given that service endpoints were always intended to be somewhat free-for-all to allow for flexibility. This is also part of the reason I thought this should be defined by the service endpoint registries rather than by the methods themselves. Service endpoints are often use-case specific, so over-constraining them in did-core or in DID method specs would limit the use cases that service endpoints can support. The service endpoint registry (and the underlying specs being registered), however, operates at the use-case layer and can get more specific about these kinds of concerns. Hence my thinking was to handle this at that level, with the registry requiring that registered specs define these rules.

pchampin commented 2 months ago

This was discussed during the #did meeting on 19 September 2024.


w3c/did-core#842 Request for Guidance on Normalization Rules Enforcement

manu: This is someone asking what the normalization rules are for URLs
… Letting implementors decide what level of normalization they support
… One response is that the normalization rules for URLs are clear and exist in the WHATWG URL spec
… We would need to check these apply cleanly to DID URLs
… We could say we are using the WHATWG normalization rules
… Others state on the issue that people in the field are normalizing in different ways. Very few specs say anything about this.

dmitriz: What is URL normalization?

<manu> These are the URL serialization rules in the WHATWG URL spec: https://url.spec.whatwg.org/#concept-url-serializer

manu: This is about percent-encoding, dots in the URL path, and so on. There is a concept called URL serialization
… See the link above
… apply a series of rules to get to a normalized URL
… problem is DIDs don't have hosts. So we need to analyze this more deeply
… This group needs to see if these rules negatively impact DIDs
… If they don't, we should normatively state that the WHATWG rules are the normalization rules we follow
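On the "dots in the URL path" point: RFC 3986 defines the `remove_dot_segments` algorithm in section 5.2.4 and refers to it for path segment normalization in section 6.2.2.3. Below is a direct Python transcription of its five rules (A to E), offered as a sketch. The WHATWG parser also resolves `.` and `..`, but not identically; for example, it also treats percent-encoded forms such as `%2e` as dot segments, which this RFC-based version does not.

```python
def remove_dot_segments(path: str) -> str:
    """RFC 3986 section 5.2.4, rules A-E, using a list of output segments."""
    inp = path
    out = []  # each entry is one segment, including its leading "/" if any
    while inp:
        if inp.startswith("../"):          # A
            inp = inp[3:]
        elif inp.startswith("./"):         # A
            inp = inp[2:]
        elif inp.startswith("/./"):        # B
            inp = "/" + inp[3:]
        elif inp == "/.":                  # B
            inp = "/"
        elif inp.startswith("/../"):       # C: also drop the last output segment
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp == "/..":                 # C
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):           # D
            inp = ""
        else:                              # E: move the first segment to the output
            start = 1 if inp.startswith("/") else 0
            idx = inp.find("/", start)
            if idx == -1:
                idx = len(inp)
            out.append(inp[:idx])
            inp = inp[idx:]
    return "".join(out)
```

With the RFC's own examples, `"/a/b/c/./../../g"` becomes `"/a/g"` and `"mid/content=5/../6"` becomes `"mid/6"`.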

markus_sabadello: I haven't looked at this in detail, but in the context of DID URL dereferencing, if DID URLs have a path and query string in the same way as HTTP URLs, then my intuition is that we should use the WHATWG rules
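To see why "DIDs don't have hosts" matters here, the sketch below shows how one generic splitter, Python's `urllib.parse.urlsplit`, sees an HTTP URL versus a DID URL. Other parsers may split differently; this is only one data point, and the helper name is hypothetical. For the DID URL there is no authority, so host-oriented rules (lowercasing, IDNA, default ports) have nothing to act on, while the method name and method-specific identifier end up inside the generic "path" alongside the DID URL path proper. Any path-level rule would therefore also touch `example:123`.

```python
from urllib.parse import urlsplit


def split_for_comparison(url: str) -> dict:
    """Show how urllib's RFC 3986-style splitter decomposes a URL."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # lowercased by urllib; None when there is no authority
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


print(split_for_comparison("https://Example.com/a/b?x=1#f"))
print(split_for_comparison("did:example:123/path?service=hub#key-1"))
```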

<manu> https://www.w3.org/TR/did-core/#dfn-serviceendpoint

manu: This is the location in the spec the issue is concerned with.

<dmitriz> we COULD sidestep the normalization of service endpoints issue. and require fully qualified URLs

<dmitriz> or not say anything about it.

<dmitriz> (which would mean removing the normalization requirement)

manu: Web browsers' URL normalization rules are different from the RFC 3986 rules
… options are 1) remove it and not say anything about normalization. 2) Leave it as is and people have to do it, knowing the libraries they will likely use will do something different. 3) Or state that we use the WHATWG rules used by web browsers
… I don't have a strong feeling about the direction

ivan: removing the text is a problem for interoperability. We should not consider this
… I would see what is implemented in various programming environments
… As far as I know all of these use the WHATWG rules

<manu> +1 to what ivan said.

manu: I agree with ivan, let's look at the libraries and see what they do
… We should also generalize the spec text to say that any URLs should be normalized
… We will have to see what happens in specific DID URLs
… Unfortunately there are no tests/examples that show the differences between normalization rules
… We would also need our own tests against some fairly advanced DID URLs

decentralgabe: Let's continue this at TPAC