w3c / rch-wg-charter

Charter proposal for an “RDF Dataset Canonicalization and Hash Working Group”
https://w3c.github.io/rch-wg-charter/

Change of terminology "Canonicalization" -> "Canonical Labelling"? #52

Closed iherman closed 3 years ago

iherman commented 3 years ago

I am wondering whether we could avoid unnecessary friction and misunderstandings by slightly changing our terminology.

The term "Canonicalization" automatically triggers, for many, a reference to the Canonical XML Specification. Without going into the details, that specification describes, fundamentally, a complex syntactic transformation of the original XML content (see the overview of the type of transformation in the Terminology Section of the aforementioned specification). Implementing those steps is complex, and many in the XML community claim that this is an unnecessary step for security purposes.

However, the case of RDF Graphs is fundamentally different and has no analogy in the XML (or JSON, for that matter) context. The problem to solve is to define a canonical blank node mapping (or canonical blank node relabelling), which happens on the abstract RDF graph and not on a specific serialization. This is deeply rooted in the RDF data model.
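To make the distinction concrete, here is a minimal sketch of what "canonical blank node relabelling on the abstract graph" could look like. It is a brute-force illustration only, not the algorithm the Working Group would specify: it tries every assignment of canonical labels to blank nodes and keeps the lexicographically smallest result, which is only feasible for tiny graphs. The function names (`canonical_form`, `graph_hash`), the `_:c0`, `_:c1`, ... label scheme, and the example.org predicates are assumptions made up for this example.

```python
import hashlib
from itertools import permutations


def is_blank(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def canonical_form(triples):
    """Brute-force canonical labelling (illustrative, factorial cost).

    Every bijection from the graph's blank nodes to _:c0, _:c1, ... is
    tried; the smallest sorted triple list is the canonical form. Two
    isomorphic graphs therefore yield the same result regardless of the
    blank node labels or triple order they started with.
    """
    triples = set(triples)
    blanks = sorted({t for triple in triples for t in triple if is_blank(t)})
    labels = [f"_:c{i}" for i in range(len(blanks))]
    best = None
    for perm in permutations(labels):
        mapping = dict(zip(blanks, perm))
        relabelled = sorted(
            tuple(mapping.get(term, term) for term in triple)
            for triple in triples
        )
        if best is None or relabelled < best:
            best = relabelled
    return best


def graph_hash(triples):
    # Hash a line-based rendering of the canonical form (hypothetical format).
    text = "\n".join(" ".join(triple) + " ." for triple in canonical_form(triples))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


KNOWS = "<http://example.org/knows>"
NAME = "<http://example.org/name>"

# The same abstract graph, written with different blank node labels and order.
g1 = [("_:a", KNOWS, "_:b"), ("_:b", NAME, '"Bob"')]
g2 = [("_:y", NAME, '"Bob"'), ("_:x", KNOWS, "_:y")]

# A structurally different graph (a single self-referencing blank node).
g3 = [("_:a", KNOWS, "_:a"), ("_:a", NAME, '"Bob"')]

print(graph_hash(g1) == graph_hash(g2))  # same graph, same hash
print(graph_hash(g1) == graph_hash(g3))  # different graph, different hash
```

Note that nothing here depends on how `g1` or `g2` was serialized: the relabelling works on the set of triples, which is the point of difference from a syntactic transformation such as Canonical XML.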

Comparing RDF blank node relabelling with XML Canonicalization is therefore comparing apples and oranges, and is only a source of unnecessary friction and discussion.

My proposal: let us do an overall change of terminology in the charter and all the other documents, replacing the term (Linked Data) Canonicalization by, say, Canonical Labelling (I am not bound to this term, if there is a better one I am fine, too).

Wdyt?

@msporny @dlongley @pchampin @samuelweiler @wseltzer @aidhog

msporny commented 3 years ago

While what you say is true, @iherman -- people stating that "XML Canonicalization was hard and we shouldn't repeat those mistakes!" fundamentally don't understand that they're talking about a different problem space.

The term "canonicalization" is a well known and defined term in both mathematics and computer science:

https://en.wikipedia.org/wiki/Canonicalization https://en.wikipedia.org/wiki/Canonical_form https://en.wikipedia.org/wiki/Graph_canonization

The term "Canonical Labelling" isn't different enough to get people to stop jumping to the wrong conclusion. I will also note that everyone that I know that brings up the "XML Canonicalization" argument is arguing against doing the work and has no interest in the problem space or in solving the problem. From their viewpoint, it's a solved problem, "just base64 encode the payload and you're done".

-1 to moving away from "Canonicalization", which is what we're actually doing here.

dlongley commented 3 years ago

While I'm sympathetic to the concerns, unfortunately, I don't think "Canonical Labeling" will help avoid conflict here... and it would likely add more confusion.

iherman commented 3 years ago

> While I'm sympathetic to the concerns, unfortunately, I don't think "Canonical Labeling" will help avoid conflict here... and it would likely add more confusion.

Is there a different term that would be a good approach for us and would avoid these endless discussions?

@msporny I of course know that "Canonicalization", etc, are well accepted terms for those who know this space, but that is not the point. The point is that the term is overloaded, which works against us...

aidhog commented 3 years ago

I tend to agree with @msporny. I think "canonicalisation" has solid mathematical etymology. Granted, though, I don't fully understand the problems that might occur if people conflate this with XML canonicalisation, so I might not understand the counter-arguments.

So I would suggest sticking with "canonicalisation".

Otherwise, "canonical labelling" is a reasonable alternative. Other alternatives might be "canonisation", "normalisation", or "deterministic labelling" (or maybe some variant of "hashing" or "signing").

But I think "canonicalisation" is best. If people conflate it with something else, we can just tell them that they are conflating it with something else. :)

iherman commented 3 years ago

Closing by virtue of #54