w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

The canonical-ness of identification needs clarification #56

Closed BigBlueHat closed 7 years ago

BigBlueHat commented 7 years ago

The current spec text states:

If assigned, this canonical identifier MUST be unique to the Web Publication.

Given that definition, I asked in this comment about what the "canonical identifier" would be for https://www.w3.org/TR/html/

Here's the list of identifiers in the currently published TR for HTML:

https://www.w3.org/TR/2016/REC-html51-20161101/ 
https://www.w3.org/TR/html51/ 
https://www.w3.org/TR/html/ 
https://w3c.github.io/html/

To my original comment, @iherman responded that:

The W3C considers https://www.w3.org/TR/html/ as THE identifier for the HTML standard

Which is correct if "Web Publication" in our current definition above refers to the conceptual thing called "the HTML standard."

However, if "Web Publication" refers to the thing that is currently published (i.e. the publication of "the HTML standard" on the web at the time of my retrieving it), then the canonical identifier would be https://www.w3.org/TR/2016/REC-html51-20161101/ which uniquely references the Web Publication I have in my browser right now.

The note about rel="canonical" doesn't really clear things up (sadly). RFC 6596 defines that link relationship as...

Designat[ing] the preferred version of a resource (the IRI and its contents).

Given that this is an author/publisher defined relationship, the "preferred version" is likely the latest one. But if that's the case, then what is the unique, authoritative identifier for the resource that I just got in my browser? Do we have a name for that yet (assuming that's not "canonical" in the current parlance)?

tcole3 commented 7 years ago

For me, canonical is not synonymous with fixed, so I'll disagree and say that canonical is in the eye of the beholder, and for a Web Publication, the beholder who matters is the Publisher. I think it is within scope and intent of our current definition of canonical that the W3C can decide that the canonical identifier is https://www.w3.org/TR/html/ because at any given point in time this URL identifier designates and will resolve (albeit after redirection) to the preferred version of the resource.

For other Web Publications a Publisher may make a different choice. For example, for publication use cases where fixity is considered critical, a Publisher can decide to make a different canonical identifier for each new version.

To me this is somewhat (maybe imperfectly) analogous to the Resource vs. Representation issue that comes up all the time in discussions of RESTful. It also is related to the note distinguishing between locators and identifiers in our Vision document - https://www.w3.org/TR/pwp/#h-note1 which considers different formats of a Web Publication rather than version changes over time, but still has relevance, I think. (Note, there are use cases where the difference between PDF and HTML versions of a publication could be important.)

As for what you have in your browser, it has a URL locator and is a version or representation of a Web Publication (or part of a Web Publication) at a given point in time, but if it contains a link with rel="canonical" that points somewhere else then it is not The Web Publication in capital letters for all time and the UA should preserve this other URL as the canonical link, not the URL showing in your address bar. What's in your Web browser is ultimately only guaranteed to be a component (primary resource, could be the only primary resource) of a representation of a Web Publication as disseminated to your browser at a particular point in time. I'm not sure it needs a more descriptive name than representation (and offhand I don't have another idea for what to call it).

BigBlueHat commented 7 years ago

Thanks for the additions, @tcole3.

This bit is the key thing that concerns me:

For other Web Publications a Publisher may make a different choice.

Given that neither our definition nor the definition of rel="canonical" clarifies whether it's a reference to a "latest version" or to a "point-in-time version" I'm not sure what concept or Thing we're providing uniqueness to...and for what reason.

Guess I'm missing here (as many other places) what it is we're "affording" (see #52) by defining this the way we have currently.

If it MUST be unique, what is its vector of uniqueness? And to what end?

BigBlueHat commented 7 years ago

Actually, the definition we used for Web Annotation Data Model's canonical property seems much clearer as to what it affords anything consuming an annotation that contains it:

The relationship between an Annotation, Body or Target and the IRI that SHOULD be used to track its identity, regardless of where it is made accessible. If this property is set, then systems MUST NOT change or delete it. Systems SHOULD NOT assign a canonical IRI without prior agreement if one is not present, as the Annotation could already have a canonical IRI elsewhere.

Affords global uniqueness during distribution, (re)publication, etc. Or that was our intent. :smiley:

That seems like a very different affordance than what the identifier https://www.w3.org/TR/html/ provides. Despite their both being "canonical." See? Clear as mud... 😕

iherman commented 7 years ago

@BigBlueHat the comparison with the WA Model is a bit misleading. That is defining a canonical ID for a very restricted target defined in the WA standard. However, the reality is that there is no way we can define any stricter meaning to canonical ID-s in the realm of Web Publications, in view of the diversity of publishers (in the general sense), different communities, etc. Book publishers, journal publishers, document writers, W3C, IETF, etc, will always have different views. Scholarly publishers have a much stricter view (reflected by the rules around DOI-s) than book publishers (witness the mess with ISBN-s), etc.

So yes, it is a bit messy out there, but we are not in a position to clean this mess up. I would think, as far as this WG is concerned, what we have is the maximum (at least in spirit).

BigBlueHat commented 7 years ago

@iherman it certainly is a mess out there. 😸 However, we're not going to bring any clarity to the world (publishing or not) if we reuse the phrase "canonical" in an essentially unspecified way.

Our current definition of "canonical identifier" merely states that it be "unique" not upon which vector that uniqueness matters.

Is it unique on the Web? Is it a unique moment in the lifetime of the publication? Is it unique to that publication regardless of its life-cycle?

The rel="canonical" definition is "clearer" in that it merely says that it's the "preferred" (by the author/publisher) identifier. It can help declare a preference between identical ways to access the same resource such as http://book.example and http://www.book.example/index.html. If that's what we're wanting here, let's declare that.

Let's sort out what our "canonical identifier" affords the publication (throughout its life-cycle and during the reader's experience), and then spec that...and then move on to the other likely-to-be-needed life-cycle related identifiers.
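To make the rel="canonical" reading concrete, here is a minimal sketch of how a consumer might read the publisher's declared preference out of a page. The class name `CanonicalLinkFinder` and the sample markup are assumptions for illustration only; nothing here is defined by the WP draft or by RFC 6596.

```python
from html.parser import HTMLParser


class CanonicalLinkFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated list of link relation names.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


# Hypothetical page served from http://www.book.example/index.html
page = (
    "<html><head>"
    '<link rel="canonical" href="http://book.example">'
    "</head><body>...</body></html>"
)

finder = CanonicalLinkFinder()
finder.feed(page)
print(finder.canonical)
```

All this tells the consumer is which of two equivalent addresses the publisher prefers. It says nothing about versions or life-cycle, which is the gap being discussed.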

BigBlueHat commented 7 years ago

Also, the thoughts in this thread are worth a re-read: https://github.com/w3c/dpub-pwp-loc/issues/28

BigBlueHat commented 7 years ago

Just FYI, @mattgarrish committed some clearer text this morning:

A Web Publication's canonical identifier is an identifier that designates and resolves to the preferred version of the Web Publication. The canonical identifier SHOULD be an address, but, if not, it MUST be possible to make a one-to-one mapping to an address (e.g., a DOI can be resolved to a URL via a DOI resolver).
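As a rough illustration of the "one-to-one mapping to an address" mentioned in that text, a DOI can be turned into a URL by prefixing it with the public doi.org resolver. The function name `doi_to_address` and the sample DOI are made up for this sketch; the draft does not define either.

```python
from urllib.parse import quote

# Public DOI proxy; it answers with an HTTP redirect to the registered URL.
DOI_RESOLVER = "https://doi.org/"


def doi_to_address(doi: str) -> str:
    """Map a bare or 'doi:'-prefixed DOI to a resolvable URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Percent-encode characters that are unsafe in a URL path, keeping "/" literal.
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, used only to show the shape of the mapping.
print(doi_to_address("doi:10.1234/example.wp"))
```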

rdeltour commented 7 years ago

we're not going to bring any clarity to the world (publishing or not) if we reuse the phrase "canonical" in an essentially unspecified way.

Our current definition of "canonical identifier" merely states that it be "unique" not upon which vector that uniqueness matters.

I fully agree with @BigBlueHat's concerns. That's one of the reasons why I've been lukewarm about saying anything specific about "identifiers", apart from providing a way to declare "some" identifier (e.g. with a dc:identifier property) and allowing existing links relations (rel=canonical).

The real nature of identifiers will depend on social contracts and practices that can't be established by the spec, IMO.

iherman commented 7 years ago

@bigbluehat:

Our current definition of "canonical identifier" merely states that it be "unique" not upon which vector that uniqueness matters.

Is it unique on the Web? Is it a unique moment in the lifetime of the publication? Is it unique to that publication regardless of its life-cycle?

Well, I agree with what @rdeltour writes:

The real nature of identifiers will depend on social contracts and practices that can't be established by the spec, IMO.

Meaning, for me: it is unique under whatever regime a specific community and, in practice, the editor/publisher decides it should be unique. I.e., the exact notion of uniqueness will indeed depend on a social contract, and the maximum we can do is to secure a placeholder in our information set to essentially store that.

I agree with you that it is worth adding an informative note into the spec along those lines; I would trust @mattgarrish to find the right words. But I do not believe we should say anything more normative in the document than what we already do.

mattgarrish commented 7 years ago

Just FYI, @mattgarrish committed some clearer text this morning

That was taken from Tim's earlier response, as I found it very helpful in terms of explaining the purpose of a canonical identifier.

I don't know how we ever guarantee uniqueness, and certainly can't test for it, so I was waiting to see where this discussion goes before doing any other tweaking.

Since I tend to go back to epub when thinking of these things, though, we didn't try for normative uniqueness:

The Author is responsible for including a primary identifier in the Package Document metadata that is unique to one and only one EPUB Publication.

Would wording like that work here?

"When assigned, the canonical identifier needs to be unique to one and only one Web Publication."

iherman commented 7 years ago

@mattgarrish

Would wording like that work here? "When assigned, the canonical identifier needs to be unique to one and only one Web Publication."

I do not think this is very different insofar as the issues raised by @BigBlueHat ...

mattgarrish commented 7 years ago

Sorry, missed your response while typing, Ivan.

What about:

"When assigned, the canonical identifier needs to be unique to one and only one Web Publication. Ensuring uniqueness is outside the scope of this specification, however. It will be influenced by the conventions of the identifier scheme used and the degree of control over assignment."

(Someone can probably offer something better for that last sentence, though.)

iherman commented 7 years ago

What about:

"When assigned, the canonical identifier needs to be unique to one and only one Web Publication. Ensuring uniqueness is outside the scope of this specification, however. It will be influenced by the conventions of the identifier scheme used and the degree of control over assignment."

That works for me. Thanks!

BigBlueHat commented 7 years ago

It is getting clearer @mattgarrish, thanks. 😃

However, @iherman I don't think we can completely "punt" on life-cycle related identification.

Given the most recent definition above, this would be (afaict) the HTML specification's Web Publication canonical identifier: https://www.w3.org/TR/html/

However, if I've offlined that and HTML6 comes out, what does that identifier afford me and my User Agent? It would (I'd expect) allow my UA to notify me of the updates and/or auto-fetch those for me (perhaps based on settings). But what if I want/need to keep a copy of the version I have (at this writing HTML 5.1)?

I can file this point as a separate issue, but I do think we'll have some system for life-cycle related identifiers, or how will my UA provide me with my available options--which currently include getting "final" copies of HTML4, intermediate versions of HTML5, and even the not-yet-technically-published HTML5.3 Editor's Draft.

My concern is that if we spec this the wrong way 'round, we'll prevent scenarios like a reader "keeping" copies of each of those renditions of the (higher order conceptual publication known as the) "HTML spec."

iherman commented 7 years ago

@BigBlueHat,

I do not understand this:

My concern is that if we spec this the wrong way 'round, we'll prevent scenarios like a reader "keeping" copies of each of those renditions of the (higher order conceptual publication known as the) "HTML spec."

And I also do not see what normative mechanism or statement we could put into the spec that would be testable, enforceable, and, mostly, would not clash with the policy of at least one of our constituencies... Can you give an example of what you would want to see?

BigBlueHat commented 7 years ago

In this current Web, I can visit https://www.w3.org/TR/html/ and (because I read English) browse to previous versions of HTML and to the existing HTML5.3 Editor's Draft. Each of those renditions (including the representation of HTML5.1) has its own identifier.

Here's the breakdown in the HTML spec:

This version:
https://www.w3.org/TR/2016/REC-html51-20161101/
Latest published version:
https://www.w3.org/TR/html51/
Latest version of HTML:
https://www.w3.org/TR/html/
Editor's Draft:
https://w3c.github.io/html/
Previous Versions:
https://www.w3.org/TR/2016/PR-html51-20160915/

Here's an "armchair" spec for giving some amount of affordances to a User Agent to give me, the reader, the opportunity to "keep" the copy of the HTML spec (or specs!) that I want/need.

The UA has just loaded the current representation of https://www.w3.org/TR/html/ and sees (somehow):

self: https://www.w3.org/TR/2016/REC-html51-20161101/
latest-version: https://www.w3.org/TR/html/
working-copy: https://w3c.github.io/html/
predecessor-version: https://www.w3.org/TR/2016/PR-html51-20160915/

The naming is from RFC 5829 (which are also registered link relationships).

WP could afford my UA the opportunity to differentiate these renditions for keeping each offline (as separate things), re-anchoring annotations (both specifically to a rendition or across all the "HTML specs" where possible), tracking my reading state on each or across all of them, or searching over all of them collectively.
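A rough sketch of the kind of bookkeeping this would let a UA do: keep each rendition as a separate thing while still knowing which ones belong together. The `Rendition` class, the `group_by_latest_version` function, and the assumption that the PR rendition also declares a `latest-version` link are all invented for illustration. Only the relation names (from RFC 5829) and the URLs come from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rendition:
    """One retrievable rendition plus the versioning links it declares."""
    self_url: str
    links: dict[str, str] = field(default_factory=dict)  # rel name -> target URL


def group_by_latest_version(renditions: list[Rendition]) -> dict[str, list[str]]:
    """Group renditions that point at the same 'latest-version' URL."""
    groups: dict[str, list[str]] = {}
    for r in renditions:
        # A rendition with no latest-version link is treated as its own group.
        key = r.links.get("latest-version", r.self_url)
        groups.setdefault(key, []).append(r.self_url)
    return groups


rec = Rendition(
    "https://www.w3.org/TR/2016/REC-html51-20161101/",
    {
        "latest-version": "https://www.w3.org/TR/html/",
        "working-copy": "https://w3c.github.io/html/",
        "predecessor-version": "https://www.w3.org/TR/2016/PR-html51-20160915/",
    },
)
# Assumption: the earlier PR rendition declares the same latest-version link.
pr = Rendition(
    "https://www.w3.org/TR/2016/PR-html51-20160915/",
    {"latest-version": "https://www.w3.org/TR/html/"},
)

shelf = group_by_latest_version([rec, pr])
for latest, kept in shelf.items():
    print(latest, "->", kept)
```

Each rendition stays addressable by its own `self` URL, so nothing gets overwritten. The shared `latest-version` target is only used to relate them.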

Alternatively, one might treat each rendition as a completely separate publication (each with its own "canonical identifier") which based on the most recent definition would be (in that case) https://www.w3.org/TR/2016/REC-html51-20161101/.

Here's the definition again:

When assigned, the canonical identifier needs to be unique to one and only one Web Publication. Ensuring uniqueness is outside the scope of this specification, however. It will be influenced by the conventions of the identifier scheme used and the degree of control over assignment.

If the publisher were to treat each rendition as a distinct Web Publication, then the scenario is confused the other way around--what is https://www.w3.org/TR/html/ then and why have it? Is it also a distinct Web Publication?

If we choose that WP does not itself provide an authoritative method for life-cycle identification, then we're going to need to be extra super-duper clear what we mean specifically (heh) by "canonical identifier" and "Web Publication" (even) in the context of life-cycles. Clear as mud? 😕

iherman commented 7 years ago

Yes, it is about as clear as mud :-)

But I am afraid we're diving into a relatively unique situation of a living document that also has a bunch of versions and versions of versions. That one is indeed messy. But does this represent the majority? Is this part of the 80% in an 80%/20% cut? I do not think so. Documents, Web Publications more specifically, are and will be more stable than that.

Which does not mean that we can control exactly what is happening. I have heard about dramas surrounding ISBN-s, where the same ISBN-s have been reissued to completely different books (although that may be an urban legend). But I am sure that the organization keeping up ISBN-s has its own rules on when an identifier is considered to be unique. This should be clear for whoever deals with ISBN-s and books. DOI-s are much more stringent: once published with a DOI, a scholarly paper is supposed to be cast in concrete for eternity. Again, scholarly publishers know that. And there are other examples.

What I am getting to again: I just do not believe we can set up any rules that would cover these different cases. If we came up with some sort of rules that could cover the (very convoluted!) W3C situation, it would be very complicated, so complicated that no publisher would implement them. As I said:

And I also do not see what normative mechanism or statement we could put into the spec that would be testable, enforceable, and, mostly, would not clash with the policy of at least one of our constituencies... Can you give an example of what you would want to see?

BigBlueHat commented 7 years ago

@iherman like you I have zero interest in defining rules for publishing processes. However, I do have an interest in (and this group has a need to) define vectors of identification relative to change.

What are the effects on the user's experience when a change is made?

baldurbjarnason commented 7 years ago

@iherman like you I have zero interest in defining rules for publishing processes. However, I do have an interest in (and this group has a need to) define vectors of identification relative to change.

What are the effects on the user's experience when a change is made?

"Defining vectors of identification relative to change" is getting into the territory of speccing version control on the web from first principles. That is a huge, huge task that has been attempted several times before. The end-solution would probably just look like Memento anyway and be about as widely supported.

I don't think it's reasonable to expect this to be solved by either this working group or a web publication-specific standard. Since there are pre-existing solutions that nobody's using and we're very unlikely to come up with something better, we have to, IMO, punt on solving the problem of life-cycle related identification.

mattgarrish commented 7 years ago

What are the effects on the user's experience when a change is made?

But we can't solve this via the identifier alone, can we? We haven't said that changes to a web publication necessarily result in a newly-identifiable publication, and I don't think we can put such a requirement on the web. The publisher has to make that determination.

When I look at the above, for example, I see four publications each of which has its own likelihood of not being the same the next time I open it (assuming the reading system regularly checks for changed resources):

The user has to determine which of these they want, so has to understand W3C's identification structure of what they're accessing. I don't know that the naming is understood broadly, but each does have a possible unique identifier in its URL.

rdeltour commented 7 years ago

I'm still failing to see why we need a "canonical identifier" in the first place, even a loosely-defined one as proposed by @iherman and prosed by @mattgarrish.

It seems to me that at the minimum we will have:

That some systems use this or that as a "canonical" identifier, or require some uniqueness in some specific context, is totally up to implementers. In other words, I don't think the spec even needs to include the terms "canonical identifier". At all.

(Maybe I'm missing something obvious, I'm always happy to be proven wrong ;-).

BigBlueHat commented 7 years ago

@mattgarrish you nailed it. 😄

The user has to determine which of these they want, so has to understand W3C's identification structure of what they're accessing. I don't know that the naming is understood broadly, but each does have a possible unique identifier in its URL.

Specifically, you noted that the "user has to determine which of these they want." The publications certainly should not all share a singular, overlapping "canonical identifier" in that case. They would, however, greatly benefit from some method of identifying related renditions, which would in turn afford the user/reader the ability to browse/access/keep these other related renditions.

My primary concern with this issue is that as defined it's quite likely that all the renditions would be unwisely given a singular canonical identifier. The consequence would be that there is really only one canonically identified Web Publication, and the others would all be "overwritten" (given the progress of time).

mattgarrish commented 7 years ago

I'm still failing to see why we need a "canonical identifier" in the first place

It's not required, for what it's worth.

But I think there's value in the permanence of a canonical identifier as a complement to an address, which could change over time.

A non-resolvable identifier string isn't terribly useful, which is what elevates this in my mind.

rdeltour commented 7 years ago

I think there's value in the permanence of a canonical identifier as a complement to an address, which could change over time.

I don't disagree, but the rules and principles governing the use of such a canonical identifier can be at best loosely described in an informative note, I don't think the spec should strive to deal with that.

mattgarrish commented 7 years ago

the use of such a canonical identifier can be at best loosely described in an informative note, I don't think the spec should strive to deal with that

I agree to the extent that we shouldn't get too bogged down in the minutiae of this. Whether it does or doesn't belong in the information set I'm not passionate about, but I lean to being explicit about properties that have identifiable value for processing.

A canonical identifier can provide a check on an address that goes missing. But unlike the address, it may require resolving so is more complex to handle. That's why I'm okay with it being a should to an address' must. Plus, it's also information specifically useful to the user agent to act upon, or at least more so than to a human.

I'd also tend to err on the side of leaving this in for FPWD review. Positive or negative feedback will help shape its future, but we won't get that if we take it out prematurely. It was identified in the DPIG work as important, so for that alone I'd just let it be for now. We can always add an issue pointer to this thread.

rdeltour commented 7 years ago

I'd also tend to err on the side of leaving this in for FPWD review. Positive or negative feedback will help shape its future, but we won't get that if we take it out prematurely. It was identified in the DPIG work as important, so for that alone I'd just let it be for now. We can always add an issue pointer to this thread.

sure, works for me!

mattgarrish commented 7 years ago

Okay, I'll add a link to this issue to the section.

I also worry we may take for granted the ability of people not involved in these discussions to understand the distinctions we're making here. Any thoughts on adding an additional paragraph like this:

"The canonical identifier differs from the mandatory address in its permanence. A Web Publication's address could change, for example, but the canonical identifier is expected to still provide a way of locating the new location (e.g., a DOI registry could be updated with the new URL)."

BigBlueHat commented 7 years ago

@mattgarrish given that additional paragraph, I'm guessing that https://www.w3.org/TR/html/ would be the "canonical identifier" (as currently defined in this thread) for any and all HTML spec publications?

WSchindler commented 7 years ago

We distinguished the address from the canonical identifier. While the address always has to be a locator, i.e. a URL to retrieve the WP, the canonical identifier could be any permanent GUID such as an ISBN which IMO can't be used to locate a resource physically. We would need a stable identifier for libraries/bibliographies or scientific work (citations from a WP), but it could refer to a resource that is only available in print.

mattgarrish commented 7 years ago

I'm guessing that https://www.w3.org/TR/html/ would be the "canonical identifier" (as currently defined in this thread) for any and all HTML spec publications

Not any and all, only for HTML the Ever-Living (to steal from thundercats).

An individual version usually has a general TR address and a specific dated file name version, so those would be more likely to be the address and canonical identifier. For example, looking at 5.1, you might end up with something like:

address: https://www.w3.org/TR/2016/REC-html51-20161101/
canonical identifier: https://www.w3.org/TR/html51/

BigBlueHat commented 7 years ago

@WSchindler that use case I understand and it maps to "canonical" much more clearly.

@mattgarrish the "you might end up with" bits are what concerns me. Do we have any affordance expectations relative to the canonical identifier (at least when it's also an address)?

In the example you gave you mapped "address" to https://www.w3.org/TR/2016/REC-html51-20161101/ but given that I can also use https://www.w3.org/TR/html51/ and (for now) https://www.w3.org/TR/html/ to get to the same publication, I actually have 3 potential identifier/addresses.

The canonical-ness of any of these three feels terribly subjective and without any known affordance provided from picking one, I'm not sure what would be gained by declaring one at all.

Does our canonical identifier afford disambiguation across multiple Web Publications? (i.e. "these are all HTML5.1 specs" or "these are all HTML specs")

Does it afford "latest-ness"? (i.e. "the canonical identifier is expected to still provide a way of locating the new location")

mattgarrish commented 7 years ago

the "you might end up with" bits are what concerns me.

The problem is we're not discussing a system that was designed with web publications in mind. I don't know if W3C would set up manifest linking the same way; that's why everything we do here is speculation.

I agree this is a messy case, but with some different architecting it might not be as bad as it seems now (speculation on my part, not having tried to think out what that would be).

Not to dismiss the challenges, which I agree are real, but this particular problem of an expansive set of potentially canonical links for a document that is both alive (ephemerally, since there is no actual unversioned HTML spec.) and versioned is not something we can solve. The responsibility also falls on the publisher of this kind of content.

mattgarrish commented 7 years ago

(ephemerally, since there is no actual unversioned HTML spec.)

In W3C, in case that needs saying. TR/html isn't a unique document separate from the latest version.

dauwhe commented 7 years ago

So we've been experimenting with various more "webby" web publications. One example is The Evolution of Trust.

Say I have a copy of this on my github (which is actually true). Would this then make sense, with the "preferred" URL being the (canonical) identifier, and the current URL being the address?

    "identifier": "http://ncase.me/trust/",
    "address": "https://dauwhe.github.io/trust-gh-pages/",

iherman commented 7 years ago

@dauwhe : yes :-)

BigBlueHat commented 7 years ago

this particular problem of an expansive set of potentially canonical links for a document that is both alive (ephemerally, since there is no actual unversioned HTML spec.)

What is a canonical link/identifier/address, then?

and versioned is not something we can solve.

Never asked that we solve "versioning," just that we afford UAs the information to understand basic relationships between various renditions. Ideally it's nothing more than what you see at the top of every HTML spec: links pointing to whatever's next, whatever came prior, and (depending on one's definitions) a "self" and/or "canonical" link/identifier/address.

Without that sort of clarity around what we're enabling by this definition/feature/thing, I fear our rather unique and somewhat vague definitions are only confusing the situation rather than enabling any value.

Regardless, I'm happy to re-file issues against a FPWD if I'm the lone voice on this one...

Thanks for listening and considering!

mattgarrish commented 7 years ago

What is a canonical link/identifier/address, then?

I just don't think /TR/html is the most useful case to consider. The better one is the specific versions of HTML, which have addresses and canonical identifiers that do fit with the WP interpretation. TR/html is the "latest" link (pan-publication), and in my mind that's a completely different beast. You can make a web publication of it, but we're stretching out what is a publication. It's a publication that becomes other publications, in effect. A chameleon, of sorts.

At least as I understand our definition, the canonical identifier tells you that three publications from three different locations all represent the same thing, despite their having different addresses. Maybe you're working on it in github and publishing under your own banner periodically, but that doesn't matter because the identifier tells you the two are equivalent. It also tells you that, whatever different addresses the publication is hosted under, there is one that is preferred and that can be resolved and has some permanence to it (again, permanence being un-specable).

tcole3 commented 7 years ago

More precisely in this case it tells us that the 3 Web Resources at different URLs are all versions of /TR/html. But I don't think this invalidates thinking of /TR/html as a Web Publication. To know which version is considered by the publisher to currently be the 'preferred version' you need to de-reference /TR/html, i.e., de-reference the canonical identifier. The different versions are not equivalent, but do have a relationship; they essentially comprise a set or at least an aggregate. One is the currently preferred version of the set in the eyes of the Publisher and the other 2 are non-preferred versions (i.e., superseded, or updates still in draft, etc.). There are use cases where it is desirable to know which item of such a set is preferred at this moment, if you have to pick only one to treat as the publication. The question you're raising is whether an aggregate of this sort can be considered a Web Publication. I think it can.

Of course, there are use cases when it would be desirable to distinguish and identify each of the other non-preferred versions as well. This is part of the rationale for Herbert's and Michael's proposal to IETF for a link rel='identifier'. Something not the same as the canonical identifier, but rather more suitable for your use cases? I still think canonical identifier is useful for the use cases along the lines of what Dave, I, and others have suggested, i.e., the identifier for the preferred item of what in this instance amounts to a Web Publication having multiple versions.

I'm not sure the IETF proposal will go any place, but I wonder if it did would this help any with your concerns?

Trying to decide whether /TR/html can be a Web Publication runs the risk of getting us into a library domain FRBR discussion which I don't think we want. Suffice to say that most librarians familiar with FRBR could make a strong case for /TR/html being a Web Publication.

mattgarrish commented 7 years ago

Trying to encapsulate everything I've gathered from this thread, how does the following explanation strike people (independent of whether you agree with this property):

A Web Publication's canonical identifier is a unique identifier that resolves to the preferred version of the Web Publication. The canonical identifier SHOULD be an address, but, if not, it MUST be possible to make a one-to-one mapping to an address (e.g., a DOI can be resolved to a URL via a DOI resolver).

If a Web Publication is hosted at more than one address, this identifier allows a user agent to identify the shared relationship between the versions and to determine which of the available addresses is primary.

The canonical identifier is also intended to provide a measure of permanence above and beyond the Web Publication's address. Even if a Web Publication is permanently relocated to a new address, for example, the canonical identifier will provide a way of locating the new location (e.g., a DOI registry could be updated with the new URL, or a redirect could be added to the URL of the canonical identifier).

When assigned, the canonical identifier needs to be unique to one and only one Web Publication, independent of its address(es). Ensuring uniqueness is outside the scope of this specification, however. The actual uniqueness achievable depends on such factors as the conventions of the identifier scheme used and the degree of control over assignment of identifiers.
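For what it's worth, the second paragraph of that explanation (one publication hosted at several addresses) could look something like the sketch below from a user agent's side. The `identifier` and `address` keys follow the shape @dauwhe used earlier in the thread; the function name `group_copies`, the "primary" rule, and the second copy in the sample data are assumptions for illustration, not anything the draft specifies.

```python
from __future__ import annotations


def group_copies(copies: list[dict[str, str]]) -> dict[str, dict]:
    """Group copies by canonical identifier and note which address is primary."""
    shelf: dict[str, dict] = {}
    for copy in copies:
        # With no identifier assigned, the address is all we have to go on.
        ident = copy.get("identifier", copy["address"])
        entry = shelf.setdefault(ident, {"addresses": [], "primary": None})
        entry["addresses"].append(copy["address"])
        # Assumed rule: a copy served from the identifier itself is the primary one.
        if copy["address"] == ident:
            entry["primary"] = copy["address"]
    return shelf


copies = [
    # The fork described earlier in the thread.
    {
        "identifier": "http://ncase.me/trust/",
        "address": "https://dauwhe.github.io/trust-gh-pages/",
    },
    # Assumption: the original describes itself with the same identifier.
    {
        "identifier": "http://ncase.me/trust/",
        "address": "http://ncase.me/trust/",
    },
]

shelf = group_copies(copies)
print(shelf["http://ncase.me/trust/"])
```

Nothing here checks uniqueness, which matches the text: the UA simply trusts that two copies declaring the same identifier are the same publication.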

BigBlueHat commented 7 years ago

Excellent work @mattgarrish! That description clears up many questions and points to the things this bit of data might afford a consuming User Agent while still leaving things flexible for idiosyncratic determinations of "actual uniqueness."

Impressive. Most impressive.

mattgarrish commented 7 years ago

And thanks for continuing to push on this, @BigBlueHat !

Confusion within the group is never a good sign for adoption.

iherman commented 7 years ago

@BigBlueHat : the latest version of the editor's draft includes this change. Is it o.k. to close this issue? (I think it is, but you raised it:-)