w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

What does modification date mean? #73

Closed lrosenthol closed 5 years ago

lrosenthol commented 7 years ago

If we are going to have a modification date, especially one that is a "SHOULD", we need to be extremely clear what it means.

llemeurfr commented 7 years ago

In a previous standard I worked on (NewsML-G2) the IPTC defined two different metadata properties: versionCreated as a timestamp of the current version of the "news item" (including metadata) and contentModified as a timestamp of the last edition of the news content.

Considering WP and PWP as 2 variants of an interchange format, I would recommend that the modification date applies to the "item", i.e. the last time the (content + metadata) were updated on the publishing site.

BillKasdorf commented 7 years ago

+1, and thanks for bringing up IPTC NewsML-G2!

For those unfamiliar with IPTC, this is the technical standards org for the news industry. It’s very international; members include organizations like the New York Times, Agence France-Presse, Tass, the Associated Press, Bloomberg, Thomson Reuters, the BBC, Getty Images, and many others.

Just can’t resist pointing out that that’s publishing and definitely in the scope of Publishing@W3C.


lrosenthol commented 7 years ago

On the other hand, you have the "king" of content metadata standards, Dublin Core, which also has a modified property (<http://dublincore.org/documents/dcmi-terms/#terms-modified>) that we should strongly consider. And in that case, the metadata and the "resource" are separate things that have different dates.

FWIW - this debate on what is a "modification date" has happened many times in many places before - we need not create something new. Just pick one of the myriad of existing standards that we believe is the "best for us"...

atyposh commented 6 years ago

Let's check whether the mapping we (@iherman) recently made to schema.org includes an acceptable value for dcterms:modified.

GarthConboy commented 6 years ago

This is generally viewed as untrustworthy by Reading Systems, but there does not seem to be consensus re simply removing it from our basic infoset. It could well be used in various workflows.

RachelComerford commented 6 years ago

To add to @GarthConboy's comment: This (modification date) solves a problem for publishers and vendors to publishers in providing a quick and easy way to check the "version" of the ebook that they are viewing.

Sample use case/personal experience: A student is using the Psychology 12e epub from bookshare. They complain to the publisher that section 3.2 is being read by their AT before section 3.1. The publisher opens the version in their CMS, sees the version is dated 12/1/17, then checks the version in Bookshare and it's dated 10/1/17.
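A minimal sketch of that version check, assuming each copy exposes its modification date under a `dateModified` key (a hypothetical field name borrowed from schema.org) and that the dates above are US-style, i.e. December 1 and October 1, 2017:

```python
from datetime import datetime, timezone


def parse_modified(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a missing offset is treated as UTC (an assumption)."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_stale(distributed: dict, source: dict) -> bool:
    """Return True if the distributed copy is dated earlier than the source copy."""
    return parse_modified(distributed["dateModified"]) < parse_modified(source["dateModified"])


# Hypothetical metadata records for the two copies in the use case.
cms_copy = {"name": "Psychology 12e", "dateModified": "2017-12-01T00:00:00Z"}
library_copy = {"name": "Psychology 12e", "dateModified": "2017-10-01"}

print(is_stale(library_copy, cms_copy))
```

The comparison only says which copy carries the older date; whether the content actually differs is a separate question.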

atyposh commented 6 years ago

This metadata (dcterms:modified or equivalent) may be more useful for a packaged WP (i.e., when it was packaged). In the case of an unpackaged WP, it would be quite problematic for the manifest to assert that none of its constituent resources were modified after a particular date specified therein.

Alternative protocols (e.g., expiration headers, ETags, etc) are more appropriate for (unpackaged) WP, methinks.

baldurbjarnason commented 6 years ago

(TL;DR: modification date is editorial metadata, more akin to cover or authorship than to the more functional metadata such as the manifest or HTTP caching headers. Having it makes OPDS catalogues for web publications more reliable.)

Modification dates in the context of publications are a very different beast from the modification data transmitted by HTTP headers.

The headers are for caching and have to take into account a variety of assets, user session, and a bunch of ephemeral contextual situations that can vary a lot—even though the publication stays virtually the same.

Modification date in publication metadata is editorial and can't be derived in the same way that a caching header would.

The best analogy is the updated field in the Atom spec (emphasis mine):

The "atom:updated" element is a Date construct indicating the most recent instant in time when an entry or feed was modified in a way the publisher considers significant. Therefore, not all modifications necessarily result in a changed atom:updated value.

atomUpdated = element atom:updated { atomDateConstruct }

Publishers MAY change the value of this element over time.

This is very different from how you'd set most update-related HTTP headers.

This is a very important field for both packaged and unpackaged publications: without it, syndication and distribution of both publications and notifications about said publications become harder.

If you don't include this in the infoset, in most cases publishers, authoring systems, distributors, etc. will have to create an updated/modified date out of band. And in those cases you won't be able to match the notification/syndication/federation/whatever back to the original publication.

While this isn't important for viewing or authoring a publication it is very useful for distribution both of the publication and information about the publication. (The atom spec even goes as far as to make atom:updated a required element in feeds).

This doesn't have to be included in the publication infoset. People are likely to use Atom, Activity Streams, JSON Feed, OPDS and the like for distribution. And those all have updated or modified fields.

But not having it in the publication infoset as well means that there's a bit of a disconnect between the distribution protocols and the publication and that can make referring back to the publication a bit less reliable.

Omitting a modified date from the infoset is a bit like not being able to add an ebook's ISBN to its metadata. Sure, the ISBN has no real bearing on reading the ebook but it's a very important piece of metadata from a sales and distribution perspective and can be quite useful for matching ONIX data back to the ebook file (or matching both back to a common database).

If we keep a modification date (which I'd prefer), my suggestion would be to define it in exactly the same way as the atom spec defines atom:updated. That way we have guaranteed compatibility with both feeds and OPDS (which is based on atom).
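To illustrate the atom:updated semantics, here is a rough sketch in which the date only moves when the publisher flags an edit as significant. The `Publication` class, the `apply_edit` helper and the `significant` flag are all hypothetical names, not anything from the Atom or WP specs:

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Publication:
    title: str
    body: str
    modified: datetime


def apply_edit(pub: Publication, *, body: Optional[str] = None,
               significant: bool, now: Optional[datetime] = None) -> Publication:
    """Apply an edit; bump `modified` only if the publisher deems it significant."""
    changes = {}
    if body is not None:
        changes["body"] = body
    if significant:
        # The timestamp is an editorial statement, not a record of every byte change.
        changes["modified"] = now or datetime.now(timezone.utc)
    return replace(pub, **changes)


original = Publication("Example", "First text.", datetime(2018, 1, 1, tzinfo=timezone.utc))
typo_fix = apply_edit(original, body="First text!", significant=False)
new_edition = apply_edit(typo_fix, body="Revised text.", significant=True,
                         now=datetime(2018, 6, 1, tzinfo=timezone.utc))
```

Here `typo_fix` keeps the original date even though its body changed, which is exactly why the field cannot double as a cache validator.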

(Apologies for the long comment. I didn't have time to edit it down to a short one.)

atyposh commented 6 years ago

Thanks @baldurbjarnason!

It should be made clear that this metadata is completely editorial and therefore cannot be depended upon to determine whether content/resources have changed since the given date.

Certainly most content management systems (scholarly, trade, blog, etc) will have such a date readily available. But there are less formal use cases (e.g. simple ad hoc web pages promoted to WPs) for which such a date will be less obvious.

atyposh commented 6 years ago

The text in section 3.3.7 Last Modification Date of the draft seems to make it clear enough.

This date does not necessarily reflect all changes to the Web Publication (e.g., third-party content could change without the author being aware). User agents SHOULD check the last modification date of individual resources to determine if they have changed and need updating.
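One way a user agent might do that per-resource check, sketched under the assumption that it has stored the `Last-Modified` header value for each resource and has fresh values to compare against. Fetching the headers is left out, and the resource names are made up:

```python
from email.utils import parsedate_to_datetime
from typing import Dict, List


def changed_resources(cached: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return resources that are new or whose Last-Modified value is later than the cached one."""
    changed = []
    for url, header in current.items():
        previous = cached.get(url)
        if previous is None or parsedate_to_datetime(header) > parsedate_to_datetime(previous):
            changed.append(url)
    return sorted(changed)


# Hypothetical Last-Modified values, in HTTP date format.
cached = {
    "chapter1.html": "Sun, 01 Oct 2017 00:00:00 GMT",
    "cover.jpg": "Sun, 01 Oct 2017 00:00:00 GMT",
}
current = {
    "chapter1.html": "Fri, 01 Dec 2017 00:00:00 GMT",
    "cover.jpg": "Sun, 01 Oct 2017 00:00:00 GMT",
    "chapter2.html": "Fri, 01 Dec 2017 00:00:00 GMT",
}

print(changed_resources(cached, current))
```

Note that this never consults the publication-level date: a resource can change without that date moving, and the reverse.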

I'll reiterate the initial questions from @lrosenthol with answers derived from guidance above and await any last-chance feedback before closing.

Does it only apply to the content or also to metadata (such as the accessibility status)?

It applies to both content and metadata.

Does it apply to when a template (eg. a CMS) is changed but the content itself isn't changed?

This one seems tricky to me. When ScienceDirect (assuming it implements WP site-wide) relaunches with a fresh design (every few years) should all 12 million WPs get a new lastModified date? I just don't know.

Does it only apply to PWP and not to WP (and if so, should we wait to add it till we get there)?

It applies to WP and any implementation of PWP.

baldurbjarnason commented 6 years ago

Does it apply to when a template (eg. a CMS) is changed but the content itself isn't changed?

This one seems tricky to me. When ScienceDirect (assuming it implements WP site-wide) relaunches with a fresh design (every few years) should all 12 million WPs get a new lastModified date? I just don't know.

Well, if we're following the precedent set by atom and similar formats then this would be an authorial decision—a judgement call on the part of the author. I.e. to reuse the phrase from the atom spec: was it "modified in a way the publisher considers significant"?

My suggestion would be to—like the atom spec does—make it clear in the spec that this hinges on author/publisher judgment on what a meaningful change is in the context of their publication.

wareid commented 5 years ago

@mattgarrish Could you add clarification to https://w3c.github.io/wpub/#last-modification-date regarding this issue, then I can close.

iherman commented 5 years ago

Done in the reference above

iherman commented 5 years ago

This issue was discussed in a meeting.