w3c / rch-wg-charter

Charter proposal for an “RDF Dataset Canonicalization and Hash Working Group”
https://w3c.github.io/rch-wg-charter/

Hash function rebranding? #6

Closed iherman closed 3 years ago

iherman commented 3 years ago

TL;DR: I propose to “rebrand” the “unique identification” terminology into something like “Linked Data Hash” (or “Linked Data Hashing Function”).

Details:

At the moment we say, for example, “The scope of this Working Group is to define a Standard to canonicalize, sign, or uniquely identify RDF Datasets”. The unique identifier we are talking about is the result of a hash function applied to the ordered set of quads representing the canonical version of the graph.
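To make "a hash function applied to the ordered set of quads" concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the function name `hash_canonical_quads` is hypothetical, SHA-256 is just one possible choice of hash, and the input is assumed to be ground N-Quads statements (no blank nodes), so that "canonicalization" reduces to de-duplicating and sorting the lines. Real canonicalization also has to relabel blank nodes deterministically, which this sketch does not attempt.

```python
import hashlib


def hash_canonical_quads(nquads_lines):
    """Return a SHA-256 hex digest over an ordered set of N-Quads statements.

    Assumption (not from the charter): every statement is ground, i.e. it
    contains no blank nodes, so sorting the de-duplicated lines is enough
    to obtain a canonical order.
    """
    # Drop empty lines and duplicates, then sort by Unicode code point.
    ordered = sorted({line.strip() for line in nquads_lines if line.strip()})
    # Serialize one statement per line, each terminated by a newline.
    serialized = "".join(quad + "\n" for quad in ordered)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Two serializations of the same dataset, differing only in statement order.
quads_a = [
    '<http://example.org/s> <http://example.org/p> "o1" .',
    '<http://example.org/s> <http://example.org/p> "o2" .',
]
quads_b = list(reversed(quads_a))

# The hash does not depend on the order in which the quads were written.
assert hash_canonical_quads(quads_a) == hash_canonical_quads(quads_b)
```

The point of the sketch is only that the resulting value depends on the content of the dataset, not on how it happened to be serialized.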

I think, however, that the term “hashing” would resonate more with a number of technical people out there than “unique identification” (even if the two are the same). This dawned on me when @pchampin and I discussed a perfect use case for the technology we are talking about: adding a hash of a dataset in a data or vocabulary repository (e.g., to the Linked Open Vocabularies), just like many repositories of code, applications, JavaScript files and modules, etc., add the hash value of the relevant resource. This has become familiar to many alongside the concept of signatures. We even have a hashlink proposal for its usage :-). I am a bit afraid that the concept of “identifier” may not resonate the same way.

Technically, we are of course talking about the same thing; this is purely a branding exercise. I am happy to propose a re-write of the text(s) if you guys agree.

@msporny @dlongley @pchampin

aidhog commented 3 years ago

For what it is worth, I agree that hashing is better than identification. Identification sounds like assigning an IRI perhaps. Hashing gets the idea across better. (Maybe "canonical hashing"? Perhaps too "wordy".)

dlongley commented 3 years ago

"Content-based identifier" would also work. Or "Content-based identifier (e.g., cryptographic hash)".

iherman commented 3 years ago

> "Content-based identifier" would also work. Or "Content-based identifier (e.g., cryptographic hash)".

I believe that term would also be too cryptic (sic!) for many...

iherman commented 3 years ago

#7 merged