w3c / tdm-reservation-protocol

Repository of the Text and Data Mining Reservation Protocol Community Group
https://www.w3.org/community/tdmrep/

TDM Reservation for EPUB files #33

Closed llemeurfr closed 4 months ago

llemeurfr commented 9 months ago

An adaptation of our TDMRep protocol to EPUB files has been requested by several participants in the CG, in particular Börsenverein des Deutschen Buchhandels (Germany), Elsevier (Germany), Gallimard (France), Eden Livres (France), and Penguin Random House (USA).

I proposed a solution during the last W3C Publishing Maintenance WG, which deals with evolutions of EPUB 3. This solution is now drafted as an evolution of our TDMRep specification.

Please read https://w3c.github.io/tdm-reservation-protocol/spec/tdmrep-epub.html#sec-epub and advise.

dazrand commented 9 months ago

Since the addition of any TDM to EPUB requires a change, should the inclusion be focused on a meta entry rather than new elements in the metadata block? Perhaps something like:

<meta property="tdm:reservation">1</meta>
<meta property="tdm:policy">https://provider.com/policies/policy.json</meta>

This would fit better with existing metadata items and provide a familiar way for delivery platforms to expose it through their systems.

llemeurfr commented 9 months ago

I agree with you @dazrand, using the generic meta element is more in line with the extension mechanism planned by the EPUB designers, and avoids having to define an XML namespace.

Defining prefixed attribute values in XML is not really useful ("tdm" is not an XML namespace prefix in this case), therefore I would tend to keep a hyphen as separator.

<meta property="tdm-reservation">1</meta>
<meta property="tdm-policy">https://provider.com/policies/policy.json</meta>

llemeurfr commented 9 months ago

I made the modification in the draft document (URL in the first comment). @dazrand please advise if it is 100% good for you.

dazrand commented 8 months ago

With the hyphenated values, the list of allowed property values will have to be modified so that the new values are recognised during validation. Adding a prefix declaration to support 'tdm:' is easy enough, and nothing in the current standard blocks it. Would this allow the integration without having to adjust specifications with the maintenance group?

llemeurfr commented 8 months ago

You're right, @dazrand. The EPUB 3.3 spec allows metadata extensions in the Package Document using prefixed properties only. For those looking for references, see "EPUB 3.3 Vocabulary Association Mechanisms".

In the .opf, we need to define a prefix first

<package version="3.0" unique-identifier="uid" prefix="tdm: http://www.w3.org/ns/tdmrep#" xml:lang="en-US" xmlns="http://www.idpf.org/2007/opf">

and then use it like

 <meta property="tdm:reservation">1</meta>
 <meta property="tdm:policy">https://provider.com/policies/policy.json</meta>
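
For readers who want to see how a consumer might resolve these prefixed properties, here is a minimal Python sketch. It is only an illustration, not part of the specification: the function names `parse_prefixes` and `read_tdm_metadata` are hypothetical, the vocabulary IRI is simply the one used in the package example above, and the parsing of the prefix attribute assumes the simple "name: IRI" pairs shown there.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def parse_prefixes(prefix_attr):
    """Turn 'tdm: http://... foo: http://...' into {'tdm': 'http://...', ...}."""
    tokens = prefix_attr.split()
    # Tokens alternate between 'name:' and the IRI that name maps to.
    return {name.rstrip(":"): iri for name, iri in zip(tokens[0::2], tokens[1::2])}


def read_tdm_metadata(opf_xml, vocabulary):
    """Return {property name: text} for meta elements whose prefix maps to `vocabulary`."""
    root = ET.fromstring(opf_xml)
    prefixes = parse_prefixes(root.get("prefix", ""))
    found = {}
    for meta in root.iter(f"{{{OPF_NS}}}meta"):
        prefix, sep, name = meta.get("property", "").partition(":")
        # Only keep properties whose declared prefix resolves to the vocabulary.
        if sep and prefixes.get(prefix) == vocabulary:
            found[name] = (meta.text or "").strip()
    return found


# Sample package document assembled from the snippets in this comment.
SAMPLE_OPF = """<package version="3.0" unique-identifier="uid" prefix="tdm: http://www.w3.org/ns/tdmrep#" xml:lang="en-US" xmlns="http://www.idpf.org/2007/opf">
  <metadata>
    <meta property="tdm:reservation">1</meta>
    <meta property="tdm:policy">https://provider.com/policies/policy.json</meta>
  </metadata>
</package>"""

print(read_tdm_metadata(SAMPLE_OPF, "http://www.w3.org/ns/tdmrep#"))
```

Matching on the IRI rather than on the literal string "tdm" means the sketch still works if a publisher declares the same vocabulary under a different prefix name.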

dazrand commented 8 months ago

Yes, looks good to me. It even passes validation on EPUBCHECK, so it should not impact current workflows too much.

iherman commented 8 months ago

@llemeurfr I am not sure I understand where we are now. The example in https://w3c.github.io/tdm-reservation-protocol/spec/tdmrep-epub.html#sec-epub now says:

<package xmlns:tdm="http://www.w3.org/ns/tdmrep/xml" ...>
  <metadata ...>
    <dc:title>Document title</dc:title>
    <meta property="tdm-reservation">1</meta>
    <meta property="tdm-policy">https://provider.com/policies/policy.json</meta>
  </metadata>
</package>

That does not seem to be correct; it should use tdm:tdm-reservation or tdm:reservation, depending on what the namespace document looks like. But not the way it is in the spec now...

Am I missing something?

dazrand commented 8 months ago

We would avoid using an XML namespace and use the prefix attribute for the TDM entries, since that should not require any extra changes to the EPUB spec. So it would look like this:

<package prefix="tdm: http://www.w3.org/ns/tdmrep/xml" ...>
  <metadata ...>
      <dc:title>Document title</dc:title>
      <meta property="tdm:reservation">1</meta>
      <meta property="tdm:policy">https://provider.com/policies/policy.json</meta>
  </metadata>
</package>

iherman commented 8 months ago

@dazrand I agree with you. The point is that the spec should be updated...

iherman commented 8 months ago

I have just submitted a PR (#35) to make the change.

tony-martin-aste commented 6 months ago

Have there been discussions about supporting TDMRep in the older Epub 2 format?

Publishers may have many existing Epub 2 files in stores. Adding one or two new TDMRep meta elements to those ebooks would be simple, but having to convert those ebooks to Epub 3 first would require more work.

Details regarding the meta element in Epub 2 versus Epub 3

Epub 2 meta

https://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2

One or more optional instances of a meta element, analogous to the XHTML 1.1 meta element but applicable to the publication as a whole, may be placed within the metadata element

For example:

<meta name="foo" content="bar" />

Epub 3 meta

https://idpf.org/epub/30/spec/epub30-changes.html#sec-deprecations-meta201
https://idpf.org/epub/30/spec/epub30-publications.html#sec-meta-elem

Each meta element defines a metadata expression, where the property attribute defines the statement being made in the expression and the text content of the element represents the assertion.

For example:

<meta property="foo">bar</meta>

Note: Adding <meta property="tdm:reservation">1</meta> to an Epub 2 file gives errors from the Epubcheck validator program, since Epub 2 doesn't support this newer kind of meta element.

Epub 2-style meta in Epub 3

Apparently the Epub 2-style element <meta name="foo" content="bar" /> is still allowed in Epub 3:

https://idpf.org/epub/30/spec/epub30-publications.html#sec-opf-meta-elem

The meta element defined in [OPF2] has been obsoleted and replaced by the new meta element, but may be included as an optional repeatable child of the metadata element for forwards compatibility purposes.

EPUB 3 Reading Systems must ignore this element.
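
To make the difference between the two meta styles concrete, here is a small Python sketch that tells them apart in a package document. It is an illustration only: the function name `meta_expressions` and the "epub2"/"epub3" labels are made up for this example, and the sample document reuses the foo/bar placeholders from above.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def meta_expressions(opf_xml):
    """Return (style, key, value) for every meta element in a package document."""
    root = ET.fromstring(opf_xml)
    found = []
    for meta in root.iter(f"{{{OPF_NS}}}meta"):
        if meta.get("property") is not None:
            # Epub 3 style: the value is the text content of the element.
            found.append(("epub3", meta.get("property"), (meta.text or "").strip()))
        elif meta.get("name") is not None:
            # Epub 2 style: the value is carried by the content attribute.
            found.append(("epub2", meta.get("name"), meta.get("content", "")))
    return found


# Placeholder document containing one element of each style.
SAMPLE_OPF = """<package version="3.0" xmlns="http://www.idpf.org/2007/opf">
  <metadata>
    <meta name="foo" content="bar" />
    <meta property="foo">bar</meta>
  </metadata>
</package>"""

for expression in meta_expressions(SAMPLE_OPF):
    print(expression)
```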

llemeurfr commented 6 months ago

Hi Tony,

No, there has been no discussion about EPUB 2 in our Community Group. I suppose that this is because EPUB 2 has been deprecated as a standard for a long time, and existing EPUB 2 files are no longer updated by their publishers. I know that EPUB 2 files are still produced in certain countries, usually because some ebook vendors are said to refuse EPUB 3 (which in some situations is simply wrong). For instance, this anomaly will very soon be resolved in Spain.

The extensibility point is indeed different in EPUB 3 vs EPUB 2. In our specification, we must make it clear that only EPUB 3 files are taken into consideration so far. I'll prepare this update.

I will let the publishers in this group discuss whether or not we should accommodate EPUB 2 in TDMRep.

Best regards, Laurent

dazrand commented 6 months ago

I agree with Laurent: EPUB2 is deprecated and should be allowed to die off. It offers limited functionality, and keeping it alive only prolongs the distribution chain's support for it, often at the cost of supporting EPUB3.

I understand the need to keep costs minimal while having the desire to protect your content. However, a solution to collect EPUB files from an archive, open them, insert the markers, then repackage them, is already generating some cost in its development. Best to focus on upgrading the content to the current standard in order to utilize the features it supports.
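
As a rough idea of what the "insert the markers" step alone involves, here is a Python sketch that works on an EPUB 3 package document held as a string. It is only a sketch under stated assumptions: the function name `add_tdm_markers` is hypothetical, the vocabulary IRI is the one from the package example earlier in this thread, and locating the .opf inside the archive and repackaging the EPUB are left out entirely.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"
TDM_VOCAB = "http://www.w3.org/ns/tdmrep#"  # assumption: IRI taken from the example above


def add_tdm_markers(opf_xml, policy_url=None):
    """Declare the tdm prefix and append the TDM meta elements to <metadata>."""
    # Keep the usual namespace prefixes when the document is serialised again.
    ET.register_namespace("", OPF_NS)
    ET.register_namespace("dc", DC_NS)

    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    if metadata is None:
        raise ValueError("package document has no metadata element")

    # Add the prefix declaration unless a tdm prefix is already declared.
    existing = root.get("prefix", "")
    if "tdm:" not in existing.split():
        root.set("prefix", f"{existing} tdm: {TDM_VOCAB}".strip())

    reservation = ET.SubElement(metadata, f"{{{OPF_NS}}}meta", {"property": "tdm:reservation"})
    reservation.text = "1"
    if policy_url is not None:
        policy = ET.SubElement(metadata, f"{{{OPF_NS}}}meta", {"property": "tdm:policy"})
        policy.text = policy_url

    return ET.tostring(root, encoding="unicode")


# Placeholder package document for the demonstration.
SAMPLE_OPF = """<package version="3.0" unique-identifier="uid" xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <dc:title>Document title</dc:title>
  </metadata>
</package>"""

print(add_tdm_markers(SAMPLE_OPF, "https://provider.com/policies/policy.json"))
```

The sketch does not check whether tdm:reservation is already present, and a real batch job would also have to find the package document via the container file and rebuild the archive, which is where most of the cost mentioned above would come from.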

llemeurfr commented 4 months ago

Added in Version 2 for EPUB 3, version 3 for EPUB 2.