metadata for series name and volume number

GoogleCodeExporter commented 9 years ago

Being able to indicate both the name of the series and the volume number in the 
metadata is critical for comics.

This would also be useful for genre fiction (SF, fantasy, crime novels) where 
series are predominant.

DublinCore doesn't seem to have such elements and I can tell from experience 
that ONIX won't work either.

This kind of metadata is every bit as important as the title for comics.

Original issue reported on code.google.com by hadrien....@feedbooks.com on 27 Mar 2013 at 3:03

GoogleCodeExporter commented 9 years ago

This is also needed for magazines and journals. The PRISM/PSV group bumped up 
against this problem too, and it came up on a NISO web conference this week.

Having said that, I should point out that the title-type property "collection" 
was specifically intended to accommodate a series name (a series is a special 
case of a collection, namely an ordered collection) and we intended the 
group-position property on <meta> to provide a volume number. So I'm not sure 
this requires a change to the spec; maybe just add these examples to make it 
clear how these were intended.

Original comment by bkasd...@apexcovantage.com on 18 Apr 2013 at 9:38

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

I've looked at the spec and there are no real examples. The only thing I could 
find seems very similar to ONIX (collection + group-position).

In that regard, ONIX is not useful at all. ONIX does a decent job at cutting a 
title into multiple elements and assigning them orders, but this is completely 
different than properly expressing the semantics for series information.

Original comment by hadrien....@feedbooks.com on 22 Apr 2013 at 9:46

GoogleCodeExporter commented 9 years ago

The section for dc:title 
(http://www.idpf.org/epub/30/spec/epub30-publications.html#sec-opf-dctitle) has 
two relevant examples: The Lord of the Rings and The Great Cookbooks of the World.

I understand that this makes the design very generic (any title can have a 
type, display-sequence and group-position) but it also makes things more 
complicated:
- when people think of series, they don't think of them as a subtitle or 
alternate title for the book; series are every bit as important as dc:title or 
dc:publisher
- "collections" are used in very different ways in different countries; in 
France, for example, we wouldn't think of series as ordered collections (even 
though, purely technically, they are indeed an ordered collection of books)
- defining the series information takes three elements (a title and two 
refinements) vs. a single element for other key metadata

Because of these three reasons, it feels like series are second-class citizens 
in EPUB metadata, even though for manga for example, this type of information 
is more important than the main title.

I believe that the same thing could be said about volumes, I doubt that people 
think about dc:identifier first when they think about volumes.

Original comment by hadrien....@feedbooks.com on 26 Apr 2013 at 1:03

GoogleCodeExporter commented 9 years ago

On the issue of series being more important than title for manga, I would 
concur and point out that the same is probably true for magazines: the name of 
the magazine is most important (which in effect is the series name) and the 
actual publication often doesn't have a title at all, just a number and a date.

If it helps further this discussion, here's what the magazine industry is 
proposing for how to create a dc:title for an issue of a magazine being 
delivered as an EPUB:

"Specifying the dc:title for a book is straightforward.  But specifying the 
title for other content, such as a magazine issue, is more complex.  When 
packaging magazine or other serial content as an EPUB 3, you will need to 
combine fields from PSV to provide a descriptive title for eReaders to display.
Best Practice:  The dc:title should consist of PSV’s prism:publicationName | 
prism:coverDisplayDate | prism:edition | prism:issueName.  The metadata fields 
should appear in this order.  Not all fields are always present.
Examples of a derived EPUB 3 dc:title for serial publications:  
• All You | June 22, 2012
• Fortune | May 21, 2012 | U.S. Edition | FORTUNE 500
• Sports Illustrated | February 17, 2012 | 2012 SWIMSUIT ISSUE | DOUBLE ISSUE
• Time International | June 4, 2012 | Time Asia
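
The best practice above can be sketched in a few lines of Python. This is only an illustration: the function name `derive_dc_title` and its parameter names are assumptions that mirror the PSV field names, and the function simply joins whichever fields are present, in the recommended order, with " | ".

```python
def derive_dc_title(publication_name, cover_display_date=None,
                    edition=None, issue_name=None):
    """Build a dc:title string from PSV-style fields (hypothetical helper).

    Fields are joined in the recommended order; missing or empty
    fields are skipped, since not all fields are always present.
    """
    fields = [publication_name, cover_display_date, edition, issue_name]
    return " | ".join(f.strip() for f in fields if f and f.strip())


print(derive_dc_title("All You", "June 22, 2012"))
print(derive_dc_title("Fortune", "May 21, 2012", "U.S. Edition", "FORTUNE 500"))
```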

Original comment by bkasd...@apexcovantage.com on 10 Jul 2013 at 10:54

GoogleCodeExporter commented 9 years ago

I found something interesting.

It seems that this is one of the most commonly used extensions to our core 
metadata vocabulary, since Calibre (a popular app for managing EPUB metadata) 
has its own elements for series:

<meta name="calibre:series" content="CAC"/>
<meta name="calibre:series_index" content="29"/>
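
For illustration, here is a minimal Python sketch of how a reading system might pick up these two elements. The function name `read_calibre_series` and the sample package wrapper are assumptions, and treating the index as a float is also an assumption (it allows values such as 1.5).

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Sample package document wrapping the two Calibre elements (assumed wrapper).
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata>
    <meta name="calibre:series" content="CAC"/>
    <meta name="calibre:series_index" content="29"/>
  </metadata>
</package>"""


def read_calibre_series(opf_xml):
    """Return (series name, series index) or (None, None) if absent."""
    root = ET.fromstring(opf_xml)
    values = {}
    for meta in root.iter("{%s}meta" % OPF_NS):
        name = meta.get("name")
        if name in ("calibre:series", "calibre:series_index"):
            values[name] = meta.get("content")
    series = values.get("calibre:series")
    index = values.get("calibre:series_index")
    # Assumption: the index is parsed as a float so that "1.5" is accepted.
    return series, (float(index) if index is not None else None)


print(read_calibre_series(SAMPLE_OPF))
```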

Original comment by hadrien....@feedbooks.com on 11 Jul 2013 at 11:07

GoogleCodeExporter commented 9 years ago

The way I read the specs, this is how you detail a series and issue number:

<dc:title id="collection">Scott Pilgrim</dc:title>
    <meta refines="#collection" property="title-type">collection</meta>
    <meta refines="#collection" property="group-position">1</meta>

The problem is readers that ignore the refinements and treat it as just a 
standard title. iBooks, for instance, takes only the last title for sideloaded 
books, whatever its title-type, and discards the rest. iTunes, and thus books 
synced into iBooks with it, does the same with the first title. For books 
downloaded from the iBookstore, iBooks takes whatever the iBookstore tells it 
to take, I think.

If you have a bunch of title-types to choose from, the safest thing to do, I 
guess, is to put titles that would suit reading systems that ignore the 
possibility of having more than one title of different kinds at the first and 
last positions, and cross your fingers, i.e.:

<dc:title id="title">Scott Pilgrim #1</dc:title>
    <meta refines="#title" property="title-type">main</meta>
    <meta refines="#title" property="display-seq">1</meta>
<dc:title id="subtitle">Precious little life</dc:title>
    <meta refines="#subtitle" property="title-type">subtitle</meta>
    <meta refines="#subtitle" property="display-seq">2</meta>
<dc:title id="collection">Scott Pilgrim</dc:title>
    <meta refines="#collection" property="title-type">collection</meta>
    <meta refines="#collection" property="group-position">1</meta>
<dc:title id="fulltitle">Scott Pilgrim #1. Precious little life</dc:title>
    <meta refines="#fulltitle" property="title-type">expanded</meta> 

Note that I do not bother abusing myself in such a way, as no reader that I 
know of bothers supporting what I see as the official way per the standard.

Adding Calibre's meta tag does not hurt.

Original comment by chocolat...@gmail.com on 12 Jul 2013 at 7:41

GoogleCodeExporter commented 9 years ago

One thing to take into account is that that is a rather vague way of 
identifying a series, and different series with the same title may end up being 
grouped together.

Again, the specs are so broad one could argue they do provide a way to add a 
unique identifier to that series by means of refining a dc:identifier with a 
code from ONIX for Books, List 13, "Series identifier type code" (see 
<http://www.editeur.org/files/ONIX%20for%20books%20-%20code%20lists/ONIX_BookProduct_CodeLists_Issue_21.html#codelist13>), 
but of course, again, I doubt any reading system will ever care for such a thing:

<dc:identifier id="collectionId">value</dc:identifier>
    <meta refines="#collectionId" property="identifier-type" scheme="onix:codelist13">value</meta>

Original comment by chocolat...@gmail.com on 12 Jul 2013 at 7:54

GoogleCodeExporter commented 9 years ago

We finally reached consensus on how we can express this metadata.

A few notes first:
- the goal here is to express that the current publication belongs to a 
collection and express information about this collection
- this is different from what we have right now in the spec (which is only 
about titles, and enables content creators to divide a title into multiple 
elements and provide information about each sub-element)
- our scope is a bit larger than initially planned: instead of focusing on just 
series, we now support any kind of collection

Here's our list of MAY/SHOULD/MUST:
- a publication MAY belong to one or more collections
- a collection MUST have a title
- a collection SHOULD provide an identifier
- a collection MAY have a collection type (series, set, volume)
- a publication MAY provide its position within that collection

Here's an example of how this works:

<meta property="belongs-to-collection" id="pub-collection">The Lord of the Rings</meta>
<meta refines="#pub-collection" property="collection-type">set</meta>
<meta refines="#pub-collection" property="group-position">2</meta>
<meta refines="#pub-collection" property="dc:identifier">Unique identifier for the set</meta>

We introduce a new primary expression named "belongs-to-collection", which 
indicates that the current publication belongs to a collection and also provides 
the title for that collection.
Using "refines", we can then provide additional information such as the type of 
the collection, the position within that collection and the identifier for the 
collection.

"group-position" and "dc:identifier" are already part of the current spec, 
while "collection-type" has a controlled list of values. The proposed list for 
now is: series, set and volume.
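
As a rough illustration of how a reading system could consume this, the Python sketch below gathers each "belongs-to-collection" expression together with the meta elements that refine it. The function name `read_collections`, the dictionary layout and the sample package wrapper are assumptions made for the example, not part of the proposal.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

# Sample package document mirroring the proposal's example (assumed wrapper).
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta property="belongs-to-collection" id="pub-collection">The Lord of the Rings</meta>
    <meta refines="#pub-collection" property="collection-type">set</meta>
    <meta refines="#pub-collection" property="group-position">2</meta>
    <meta refines="#pub-collection" property="dc:identifier">Unique identifier for the set</meta>
  </metadata>
</package>"""


def read_collections(opf_xml):
    """Return one dict per collection: its title plus any refinements."""
    root = ET.fromstring(opf_xml)
    metas = list(root.iter(OPF + "meta"))
    collections = []
    for meta in metas:
        if meta.get("property") != "belongs-to-collection":
            continue
        entry = {"title": (meta.text or "").strip()}
        meta_id = meta.get("id")
        if meta_id:
            # Attach every meta whose refines attribute points at this id.
            for other in metas:
                if other.get("refines") == "#" + meta_id:
                    entry[other.get("property")] = (other.text or "").strip()
        collections.append(entry)
    return collections


for collection in read_collections(SAMPLE):
    print(collection)
```

Because each refinement is tied to a specific id, a publication that belongs to several collections keeps an unambiguous position in each of them.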

Original comment by hadrien....@feedbooks.com on 19 Sep 2013 at 6:54

GoogleCodeExporter commented 9 years ago

Spec additions to implement this proposal are available in the following document:

https://docs.google.com/document/d/1pISSPSdHaUjdUZ3yL9LPipkHjbswZBWoTE3yjLQL7JI

The final proposal removes the value "volume" from collection-type.

Original comment by mgarrish on 21 Sep 2013 at 8:22

Changed state: ProposedSolution

GoogleCodeExporter commented 9 years ago

Specification has been updated per the proposal:

https://code.google.com/p/epub-revision/source/detail?r=4757

Original comment by mgarrish on 27 Sep 2013 at 3:10

Changed state: FinalReview

GoogleCodeExporter commented 9 years ago

group-position removed from title examples:

https://code.google.com/p/epub-revision/source/detail?r=4759

Original comment by mgarrish on 27 Sep 2013 at 3:18

GoogleCodeExporter commented 9 years ago

Original comment by bkasd...@apexcovantage.com on 17 Oct 2013 at 8:42

Changed state: Done

elmimmo commented 8 years ago

Where is this reflected in the draft of EPUB 3.1 of Jan 30th 2016?

Edit: Rereading the metadata chapter of the draft, I guess series metadata now moves out of the OPF, and one is supposed to detail it in a separate ONIX XML file or some other metadata standard bundled within the EPUB archive.

"The Package Document is not designed to provide a comprehensive bibliographic record, and is not the correct location for such discovery information about the EPUB Publication. Metadata records, both that conform to international standards or that are designed for custom use, can instead be associated using the link element."

ghost commented 6 years ago

Why was this removed in EPUB 3.1? ONIX XML seems like overkill for something so common in books. (Series are, after all, one of the most purchased book types, no?)

Also, similar to NCX, ONIX is an entirely different beast of a format which isn't explained in EPUB itself, making it a lot more complicated to include what should be pretty straightforward information... (and making it less likely that reader systems actually implement checking for this info)

laudrain commented 6 years ago

- Series are almost never used by reading systems for the user library (as shown by a survey of developers)
- Instead, there is a general need for a simpler set of metadata
- Completeness and precision are already achieved by other schemas such as ONIX

ghost commented 6 years ago

I don't see how optional additional meta tags that add simple fields make things complicated. And how can there ever be large developer adoption if it's just in one minor version of the standard and then taken out again immediately with the next iteration?

I'm aware ONIX does a lot more, but that's also why I think it's not a good match. (and if you push people to ONIX, I don't see how metadata gets "simpler" at all.)

Did you ever do a survey of the users if they'd be interested in series information being shown? I think that would be a much more helpful idea than to survey the current state of developer adoption.

mattgarrish commented 6 years ago

The idea was originally to have linked schema.org records which could be indexed in a web-friendly version of EPUB. (EPUB 3.1 had a number of goals that weren't fully achieved, or won't be until 4.0.)

But we made a mistake with 3.0 of getting into metadata vocabularies, and were warned not to at the time. EPUB should have only provided the framework for expressing metadata, but we caved in and added some "starter" properties to address a few needs. Dropping the unused properties was an attempt to move away from that approach while retaining what was actually used.

If there are holes in the metadata, the W3C publishing group should work with schema.org, for example, to ensure that the CreativeWork classes have necessary metadata instead of always trying to build things in isolation, so then you could use a meta tag to express the series title.

jcsalomon commented 6 years ago

"Series almost never used by reading systems for the user library (as shown per survey to developers)"

Then I can only assume Calibre users were not included in this survey.

Or was this a survey of developers, to see who had implemented the feature? A point of interest might be how many reading systems have adopted Calibre’s non-standard series metadata, and how much of a draw that feature is.

jcsalomon commented 6 years ago

"The idea was originally to have linked schema.org records which could be indexed in a web-friendly version of EPUB. (EPUB 3.1 had a number of goals that weren't fully achieved, or won't be until 4.0.)"

Following https://idpf.github.io/epub-guides/schema-org-integration/, what would this look like? As best as I can read this, these lines would be somewhere in content.opf:

<meta property="rdf:type">http://schema.org/Book</meta>
…
<meta property="schema:isPartOf">Discworld</meta>
<meta property="schema:position">37</meta>

(which maps nicely to Calibre’s series and series_index, at least if decimal numbers are used in the position field).
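
To make that mapping concrete, here is a small Python sketch. It assumes the schema:isPartOf / schema:position reading of the integration guide given above is right, and the function name `schema_series_to_calibre` and the sample package wrapper are made up for the example.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

# Sample package document using the tagging guessed at above (assumed wrapper).
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta property="rdf:type">http://schema.org/Book</meta>
    <meta property="schema:isPartOf">Discworld</meta>
    <meta property="schema:position">37</meta>
  </metadata>
</package>"""


def schema_series_to_calibre(opf_xml):
    """Map schema:isPartOf / schema:position to Calibre-style fields."""
    root = ET.fromstring(opf_xml)
    props = {}
    for meta in root.iter(OPF + "meta"):
        prop = meta.get("property")
        if prop in ("schema:isPartOf", "schema:position"):
            # Keep only the first occurrence: a flat list of properties
            # cannot tie a position to one particular series.
            props.setdefault(prop, (meta.text or "").strip())
    if "schema:isPartOf" not in props:
        return None
    result = {"calibre:series": props["schema:isPartOf"]}
    if "schema:position" in props:
        # Assumption: decimal positions are allowed, hence float.
        result["calibre:series_index"] = float(props["schema:position"])
    return result


print(schema_series_to_calibre(SAMPLE))
```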

llemeurfr commented 6 years ago

ONIX is a B2B (publisher to bookseller) metadata vocab. What is missing in EPUB is IMHO a complete B2C metadata vocabulary (user facing, ready for client-side filtering, search etc.).

ghost commented 6 years ago

Yea the problem with using external standards is that you really need examples or it'll be quite hard to guess how to integrate it. If this is actually still possible but with an external addition, it would be really helpful if this could be shown somewhere in more detail (without just going "just look at ONIX").

mattgarrish commented 6 years ago

"As best as I can read this, these lines would be somewhere in content.opf:"

That looks like the appropriate tagging for a series, but I work more on the accessibility side so they're not properties I've applied.

We anticipated questions of common practice by starting this guide: https://idpf.github.io/epub-guides/package-metadata/

I think it would be helpful to document this case. @laudrain ?

mattgarrish commented 6 years ago

@HadrienGardeur any thoughts you have here on series tagging would also be helpful?

HadrienGardeur commented 6 years ago

@mattgarrish sure, this is how we handle them in the Readium Web Publication Manifest: https://github.com/readium/webpub-manifest/tree/master/contexts/default#collections--series

The serialization would be different but the infoset remains the same.

HadrienGardeur commented 6 years ago

In Readium-2 some implementations are already capable of extracting Calibre metadata for series by the way, that's the case in Go at least.

laudrain commented 6 years ago

Today for EPUB, we should not reinvent any metadata language. I agree with @mattgarrish that we should use schema.org and contribute if there are holes.

Schema.org has properties for BookSeries http://schema.org/BookSeries in CreativeWork > CreativeWorkSeries > BookSeries: "A series of books. Included books can be indicated with the hasPart property."

There is provision in http://schema.org/Book for isPartOf CreativeWork. I'm not an expert, but I hope it can link to a CreativeWork > CreativeWorkSeries > BookSeries. And http://schema.org/Book also has a position: "The position of an item in a series".

HadrienGardeur commented 6 years ago

We also use schema.org for the Readium Web Publication Manifest, but that's through a JSON-LD context.

As far as I'm aware, a number of reading systems definitely support series. For instance that's the case in Aldiko and in quite a few comics apps.

jcsalomon commented 6 years ago

The Readium example which @HadrienGardeur provides allows more than the Calibre syntax (or my understanding above of how to use schema.org): Calibre allows only one series and one series_index. But of course that version requires bibliographic metadata in yet another file, and in yet another format—lots of reader programs support Calibre’s series metadata within content.opf (Mantano’s Bookari, e.g.), but does anything at all on the market support metadata in other files and in other formats?

HadrienGardeur commented 6 years ago

@jcsalomon I'm not suggesting another file/format.

What I've linked is what we use internally in Readium-2 (an SDK for reading apps), and this is also a proposal for EPUB4/WP.

For an EPUB 3.x revision, it's probably easier to express the same info using schema.org (which we're using behind the scenes anyway in Readium).

mattgarrish commented 6 years ago

"Calibre allows only one series and one series_index."

"For an EPUB 3.x revision, it's probably easier to express the same info using schema.org"

And it will be true for the EPUB 3 package document that you can only express one series, too, at least if you want to indicate the position. Without the ability to group, the positions would become ambiguous to a machine ("refining" being a terrible option here, as usual).

ghost commented 6 years ago

While this is unfortunate, I would assume a book being part of one series covers most use cases, right? Or how common is it for a book/written work in one specific release to be part of multiple series?

I can only imagine that happening, e.g., for a book from a story series that is re-released as part of some "best of" collection. But in that case it should work as a standalone story anyway if it's singled out like that, and just annotating it as part of the collection should be an acceptable approximation, I think.

HadrienGardeur commented 6 years ago

Another alternative would be to turn positions into a first-class attribute in the next EPUB revision.

We'd get something almost on par with Readium:

<meta property="schema:Series" opf:position="2">Discworld</meta>

IMO, there is a good case in favour of this:

- series are super popular with power users (Calibre)
- they're also extremely useful for comics/manga (Japan)

The opf:position attribute could also work on other elements, such as dc:title to replace the sequence order that we had through refine in 3.0.x.

This would also be a clear path forward towards EPUB 4 if the WG ends up adopting the RWPM for the WP manifest.

BigBlueHat commented 6 years ago

@HadrienGardeur but this custom namespaced element's data would fall completely out of the reach of any RDFa parser--whereas the variation proposed by @jcsalomon in https://github.com/w3c/publ-epub-revision/issues/326#issuecomment-361309750 stays within that parsing/data-model space.

If you put it in a separate XML namespace, you'll have to come up with separate methods to extract/parse/manage/understand it.

HadrienGardeur commented 6 years ago

@BigBlueHat I think that's mostly irrelevant in the case of EPUB 3.x:

- the model for metadata can barely qualify as RDF-ish, nothing more
- who's parsing EPUB files and OPF as RDFa? I'm not aware of anyone doing that

IMO something straightforward and powerful enough is better than trying to achieve RDF purity.

Ideally we'd just have:

<series position="2">Discworld</series>

ghost commented 6 years ago

Hm. Maybe it should still have some sort of obvious reference to schema.org/Book (which a <series> tag might not obviously have) even just to make it clear where it comes from?

Otherwise, this is circling back to putting all metatags directly into the EPUB definition instead of using reasonable existing things.

I think something like <meta property="schema:Series" opf:schema:position="2">Discworld</meta> which obviously refers to the book schema might be a better idea. While that would still require demonstration in the meta guide here https://idpf.github.io/epub-guides/package-metadata/ it wouldn't be such a huge departure from the book schema.org thing that every property of that schema would require a demonstration like that.

mattgarrish commented 6 years ago

I just can't get into the idea of minting more metadata that's unique to EPUB, especially not new elements. Let's work with what schema.org provides, and live with the limitations of implementing in the package as it's defined for EPUB 3.

I wrote the integration guide while trying to implement the accessibility and educational metadata, which is why it defaults to CreativeWork and recommends rdf:type for any other instances. Realistically, it's not the optimal choice and we should treat the package as an instance of Book by default. That's where the most useful additional properties are.

I think this is something we may want to note in the specification and not farm out to an informative guide. To an extent Hadrien is right that it doesn't matter what you do unless something is parsing out a graph or translating to compliant schema.org metadata, but leading people blindly to bad practices isn't a great idea, either. Assuming an EPUB 4 that has a real metadata framework and uses schema.org metadata, it's going to come back to bite people upgrading their content.

HadrienGardeur commented 6 years ago

@mattgarrish frankly, it barely matters in the context of EPUB 3.x. If it's useful for a significant portion of the community, we might as well have it in our own namespace.

As for EPUB4, I'm advocating for a solution based on JSON-LD and schema.org myself (RWPM) but the other option out there (WAM) is not tied to any existing vocabulary.

I don't remember what the spec has to say about XML attributes, but can't we at least use schema:position as an attribute and avoid repeating meta like in the dark ages of EPUB 3.0.1 (with its infinite refine nonsense)?

<meta property="schema:Series" schema:position="2">Discworld</meta>

mattgarrish commented 6 years ago

"I don't remember what the spec has to say about XML attributes"

They aren't allowed unless explicitly defined. There's no real extensibility of the package metadata beyond being able to use the package/@epub:vocab and meta/@property to reference vocabularies/properties.

Additions along the attribute axis are less invasive than new elements, but how would such a thing work? Do we allow any schema:* attribute and leave it to implementers to figure out how to make it work, or do we only add schema:position, in which case wouldn't it be a little odd that the attribute is available no matter what property you're expressing?

My worry with the first option is that it could make it impossible to translate the metadata. Refines is messy, but there is some (untested) logic to how it can be translated.

Looking at schema.org, doesn't it also enforce the one-series flaw of the package document? Since isPartOf and position are both child properties of CreativeWork/Book and not a set, a publication can't have an unambiguous position in more than one series.

The BookSeries class doesn't solve this problem, either, although it lets you say a lot more about the series. (Using the title of the series as the value of isPartOf is a technical violation, but schema.org isn't strict in enforcing the expected type of any property.)

Unless I'm missing something, my inclination would still be to live within what is possible to do now.

HadrienGardeur commented 6 years ago

@mattgarrish schema:Series is a CreativeWork as well, which means that it can have a position.

Looking back at our EPUB 3.1 WG, we did end up introducing a number of new XML attributes, in order to express a number of things that required refine before:

- opf:alt-rep
- opf:alt-rep-lang
- opf:role
- opf:scheme
- opf:authority
- opf:term

What we never ported to attributes is the ability to express a position in a list or sequence. EPUB 3.0.1 had this for titles for instance, here's an example from the spec:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title id="t1" xml:lang="fr">Mon premier guide de cuisson, un Mémoire</dc:title>
    <meta refines="#t1" property="title-type">main</meta>
    <meta refines="#t1" property="display-seq">2</meta>

    <dc:title id="t2">The Great Cookbooks of the World</dc:title>
    <meta refines="#t2" property="title-type">collection</meta>
    <meta refines="#t2" property="display-seq">1</meta>

    <dc:title id="t3">The New French Cuisine Masters</dc:title>
    <meta refines="#t3" property="title-type">collection</meta>
    <meta refines="#t3" property="display-seq">3</meta>

    <dc:title id="t4">Special Anniversary Edition</dc:title>
    <meta refines="#t4" property="title-type">edition</meta>
    <meta refines="#t4" property="display-seq">4</meta>

    <dc:title id="t5">The Great Cookbooks of the World: 
        Mon premier guide de cuisson, un Mémoire. 
        The New French Cuisine Masters, Volume Two. 
        Special Anniversary Edition</dc:title>
    <meta refines="#t5" property="title-type">expanded</meta>
    …
</metadata>
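
To show what a reading system has to do with that markup, here is a Python sketch that resolves the refinements and returns the titles in display-seq order. The function name `titles_in_display_order` and the trimmed-down sample are assumptions for illustration only; titles without a display-seq (such as the expanded one) are simply left out.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Shortened version of the spec example (assumed wrapper and trimmed titles).
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title id="t1" xml:lang="fr">Mon premier guide de cuisson, un Mémoire</dc:title>
    <meta refines="#t1" property="title-type">main</meta>
    <meta refines="#t1" property="display-seq">2</meta>
    <dc:title id="t2">The Great Cookbooks of the World</dc:title>
    <meta refines="#t2" property="title-type">collection</meta>
    <meta refines="#t2" property="display-seq">1</meta>
    <dc:title id="t5">The Great Cookbooks of the World: Mon premier guide de cuisson</dc:title>
    <meta refines="#t5" property="title-type">expanded</meta>
  </metadata>
</package>"""


def titles_in_display_order(opf_xml):
    """Return (title-type, text) pairs sorted by their display-seq."""
    root = ET.fromstring(opf_xml)
    # Group refinements by the id they point at, e.g. "#t1".
    refinements = {}
    for meta in root.iter(OPF + "meta"):
        target = meta.get("refines")
        if target:
            refinements.setdefault(target, {})[meta.get("property")] = (meta.text or "").strip()
    ordered = []
    for title in root.iter(DC + "title"):
        props = refinements.get("#" + title.get("id", ""), {})
        if "display-seq" in props:
            text = " ".join((title.text or "").split())
            ordered.append((int(props["display-seq"]), props.get("title-type"), text))
    ordered.sort(key=lambda item: item[0])
    return [(kind, text) for _, kind, text in ordered]


for kind, text in titles_in_display_order(SAMPLE):
    print(kind, "-", text)
```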

I believe that we could create a new attribute that would work for title as well as http://schema.org/Series and http://bib.schema.org/Collection. This new opf:position would express the position of an element in a sequence/list.

Here are two examples:

<title opf:position="1">Flatland</title>
<title opf:position="2">A Romance of Many Dimensions</title>
<meta property="bib:Collection" opf:position="26">SF Classics</meta>

<title>Guards! Guards!</title>
<meta property="schema:Series" opf:position="8">Discworld</meta>
<meta property="schema:Series" opf:position="1">City Watch</meta>

(BTW, this is a valid example of a book where two series are useful, since you could either read all of the Discworld series or just the one focusing on the City Watch. There are quite a few similar examples in fantasy or SF series.)

llemeurfr commented 6 years ago

@HadrienGardeur's proposal is appealing, much easier to understand than the use of the refine attribute. But if we don't deprecate refine in 3.2 because of its use by the Japanese publishing industry, can we still make it so that we don't end up with two alternative ways to do the same thing? In other words, does the Japanese industry use refine with property = display-seq? If not, at least this value could be deprecated...

mattgarrish commented 6 years ago

"schema:Series is a CreativeWork as well, which means that it can have a position"

That position would be the position of the series in something else to which it belongs, not the position of something which belongs to the series. Series is its own class that just describes the series. The publication is part of a series, but its position is unique to itself and expressed within its own class, which can only be done once.

I'm also leery of mixing ordering of elements together with position within a set. They're incongruous concepts. How does a reading system know when you're referring to display sequence or when the number is just a bit of information that is supposed to be displayed? That's why we separated display-seq from group-position.

But this may all be moot, since 3.2 is returning refines and with it has to come belongs-to-collection, collection-type and group-position. We may as well keep using those.

HadrienGardeur commented 6 years ago

Frankly, the semantics for schema.org can be a little fuzzy as well. A position on a CreativeWork itself means nothing, it has to be a position in the context of an ordered list.

For display-seq vs group-position, I think that we're overthinking. Once again, we're barely RDF-ish; I'd much rather have fewer attributes and straightforward metadata expression than semantic purity (which we won't achieve anyway).

I would also get rid of either opf:scheme or opf:authority, a single attribute can IMO work for both use cases.

The return of refines and all associated properties is just sad, they're terrible to work with and it feels like going backward.

mattgarrish commented 6 years ago

"A position on a CreativeWork itself means nothing, it has to be a position in the context of an ordered list."

That's the flaw I mentioned above by having them both as direct properties of each class in schema.org. isPartOf says which series it belongs to and the completely detached position gives its position in the series. They should be grouped to avoid the ambiguity, but in such a way that they are their own unique datatype for some other property like belongs-to-collection. That's the problem of a natural growth vocabulary. Too much is sometimes thrown at the wall too quickly.

I hate properties that change meaning depending on context, though, which is why I don't agree it's overthinking to have two. With display-seq, the set is fully defined in the metadata (it's an ordering property for like things); group-position indicates belonging to a set that is defined somewhere else. If you squash them together, I can't see how you can be sure of anything unless you code the logic for every instance.

I do agree that the return of refines is sad; I was glad to ring its death knell in 3.1. But it does allow multiple unambiguous series titles and positions, again.

No opinion here on opf:scheme and opf:authority. Didn't we introduce opf:authority because we were trying to keep opf:scheme compatible with its use in EPUB 2? At any rate, I believe these are all toast in 3.2 since we're reverting to refines.

mattgarrish commented 6 years ago

But fwiw, display-seq should be deprecated in 3.2. As far as I remember, we defined document order as the indicator of display for various elements since reading systems use that anyway.

jcsalomon commented 6 years ago

@llemeurfr wrote:

"But if we don't deprecate refines in 3.2 because of its use by the Japanese publishing industry,"

How are they using it (besides for series metadata in e-comics)? Whatever alternative is proposed must accommodate their use-case.

And while refines is un-XML-ish and complex usages can get unwieldy, I’m a small-time book producer (made four books for an author friend) and I’m reasonably tech-savvy: I figured out the refines format for series metadata in a few minutes of reading the documentation.

I also spent a few days reading the 3.1 standard and chasing down one outside reference after another and could not figure out what the 3.1 Way was. Well, I partly could: include in content.opf a link to a metadata file to be stored elsewhere in the directory tree, this metadata being in any of a dozen formats used by various cataloging schemes but which are neither easily readable by mere humans nor writeable without flipping back-and-forth between multiple cross-linked standards documents, and knowing that even if any e-reader implemented the 3.1 standard chances were against it also understanding whichever additional metadata format I might choose.

(Feature request for an EPUB 2 e-reader: “Here in two lines is Calibre’s extension; can you please interpret these and let the user categorize books by series?”

(Ditto for an EPUB 3 e-reader [adapted from an actual feature request I submitted]: “Here in three lines is EPUB 3’s series standard; can you please interpret these and let the user categorize books by series?”

(Ditto for an EPUB 3.1 e-reader: Don’t make me laugh.)
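
For illustration, here is a rough Python sketch of what "interpret these two lines" could look like on the reading-system side. It assumes the Calibre extension takes its commonly seen form, a pair of `<meta name="calibre:series">` and `<meta name="calibre:series_index">` elements in the OPF metadata. The sample package and the function name `read_calibre_series` are made up for this example.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Hypothetical sample package; only the two Calibre lines matter here.
CALIBRE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata>
    <meta name="calibre:series" content="Series Name Goes Here"/>
    <meta name="calibre:series_index" content="1"/>
  </metadata>
</package>"""


def read_calibre_series(opf_xml):
    """Return (series_name, index) from Calibre-style meta tags, or None."""
    root = ET.fromstring(opf_xml)
    found = {}
    for meta in root.iter(f"{{{OPF_NS}}}meta"):
        name = meta.get("name")
        if name in ("calibre:series", "calibre:series_index"):
            found[name] = meta.get("content")
    series = found.get("calibre:series")
    if series is None:
        return None
    index = found.get("calibre:series_index")
    # Assumption: the index is written as a decimal number such as "1" or "1.5".
    return (series, float(index) if index else None)


print(read_calibre_series(CALIBRE_OPF))
```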

You want the cataloging metadata out of content.opf? fine, give me another place to put it. You don’t like refines? fine, give me another format to work with. But—

define one location and one format which all e-readers of EPUB version ‹whatever› are expected to understand and present to their users; and
make that format clean, human-readable & -writeable, and extendible, and from the start including the things which readers and publishers have repeatedly said they want.

ghost commented 6 years ago

After giving this some more thought, I think a big issue with EPUB 3+ is that <meta property="schema:numberOfPages">227</meta> is completely unlike the examples schema.org gives, which are <span property="numberOfPages">224</span> (RDFa) and <span itemprop="numberOfPages">224</span> (microdata). EPUB 3 appears to use schema.org, and most of it says "just check schema.org for what you can use", but it then doesn't reuse any of the formats that schema.org actually explains. That is a small disaster, in my humble opinion.

By the way, the schema.org JSON-LD variant looks infinitely more readable than all the weird EPUB-meta-tags-and-refines-mixed-with-schema.org-but-different:

{
  "@context":  "http://schema.org/",
  "@id": "#record",
  "@type": "Book",
  "additionalType": "Product",
  "name": "Le concerto",
  "author": "Ferchault, Guy",
  "offers":{
      "@type": "Offer",
      "availability": "http://schema.org/InStock",
      "serialNumber": "CONC91000937",
      "sku": "780 R2",
      "numberOfPages": 134,
      "offeredBy": {
          "@type": "Library",
          "@id": "http://library.anytown.gov.uk",
          "name": "Anytown City Library"
      },
      "businessFunction": "http://purl.org/goodrelations/v1#LeaseOut",
      "itemOffered": "#record"
    }
}

Part of the problem with the meta/refines/nonsense is that you can't nest properly, and that because "meta" and "refines" are always written out as full words, it becomes super lengthy and complicated really quickly.

Maybe it would be best to consider adopting the exact schema.org JSON-LD standard as an additional file or inline JSON inside a tag? Or at least pick one of the exact other formats as specified on schema.org. I don't think referring to schema.org for the contents but completely redefining the syntax inside EPUB with complicated meta/refines nesting stuff does anyone any good.

I mean, look how lengthy this is just to explain how to nest things: https://idpf.github.io/epub-guides/schema-org-integration/#h.8w8btnbwlf6r Wouldn't it be a lot easier for everyone involved to just write: "use <schema-org-meta> .... JSON-LD contents here ... </schema-org-meta>, everything about the syntax is on schema.org, and here is a single non-trivial example in a complete OPF: ..."?

ghost commented 6 years ago

Any thoughts on this? I really think if you want to use the schema.org standard, you should also use e.g. the JSON-LD syntax variant exactly as specified there (or one of the others) - or alternatively, specify metadata directly in EPUB without referring to schema.org. The EPUB format already has quite the complexity, you're not really doing it any favors by adding another complicated standard, and then completely deviating from its own example pages of how to use it...

jeffmcneill commented 6 years ago

As a publisher, it is unclear to me what the consensus is on how to implement a series and number, at this point in the discussion. I understand the calibre metadata tags and will use those, but could someone summarize/point out what I should do to be standards-compliant as of Nov 2018? That would likely be helpful to others. If there is zero consensus, then the most popular way to indicate series and number would be helpful.

jcsalomon commented 6 years ago

@jeffmcneill, since the “new” EPUB3 is mostly a reversion to EPUB 3.0.1, that’s the model to follow. See https://w3c.github.io/publ-epub-revision/epub32/spec/epub-packages.html#sec-belongs-to-collection for the spec, though the examples at https://w3c.github.io/publ-epub-revision/epub32/spec/epub-packages.html#group-position are perhaps a bit more complete. Basically:

<meta id="num" property="belongs-to-collection">Series Name Goes Here</meta>
<meta property="collection-type" refines="#num">series</meta>
<meta property="group-position" refines="#num">1</meta>

Caveats:

I’m not clear when you’d use set instead of series. The example shown puts a Harry Potter book in a set, but would that have been the right decision when the books were still being released? How about a book series whose final length is not known at the moment?
Note that group-position is “A single xsd:unsignedInt or series of decimal-separated numbers (e.g., 1 or 2.2.1).”
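
One practical consequence of that value format: sorting group-position values as plain strings puts "10" before "2". A minimal sketch of a sort key, assuming the values really are dot-separated unsigned integers as quoted above (the function name `parse_group_position` is mine, not from the spec):

```python
from typing import Tuple


def parse_group_position(value: str) -> Tuple[int, ...]:
    """Turn '2.2.1' into (2, 2, 1) so that positions compare numerically."""
    parts = value.strip().split(".")
    # Reject empty segments, signs, and non-ASCII digits.
    if not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a valid group-position: {value!r}")
    return tuple(int(p) for p in parts)


positions = ["10", "2", "2.2.1", "2.1"]
print(sorted(positions, key=parse_group_position))
```

Tuples compare element by element, so a volume at position 2 sorts before its sub-parts 2.1 and 2.2.1, and all of those sort before 10.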

The specs recommend also giving the collection an identifier, e.g.,

<meta property="belongs-to-collection" id="c02">Harry Potter</meta>
<meta refines="#c02" property="collection-type">set</meta>
<meta refines="#c02" property="group-position">2</meta>
<meta refines="#c02" property="dcterms:identifier">urn:uuid:99999999-8888-7777-6666-555555555555</meta>

but I have no idea whether any reading system actually cares about that. Probably can’t hurt, though.

Supposedly, a book can belong to multiple collections. Good luck getting any reading systems to respect that, though.
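
To show how little code it takes to resolve the refines links, including the multiple-collections case, here is a minimal Python sketch. The sample package combines the two snippets above into one hypothetical book, and `read_collections` is an illustrative name. This is one plausible way a reading system might do it, not how any particular one does.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Hypothetical package with two collections, built from the snippets above.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Volume</dc:title>
    <meta property="belongs-to-collection" id="c01">Series Name Goes Here</meta>
    <meta refines="#c01" property="collection-type">series</meta>
    <meta refines="#c01" property="group-position">1</meta>
    <meta property="belongs-to-collection" id="c02">Harry Potter</meta>
    <meta refines="#c02" property="collection-type">set</meta>
    <meta refines="#c02" property="group-position">2</meta>
  </metadata>
</package>"""


def read_collections(opf_xml):
    """Return one dict per belongs-to-collection entry, in document order."""
    root = ET.fromstring(opf_xml)
    metas = root.findall(f".//{{{OPF_NS}}}meta")

    # First pass: every collection that carries an id others can refine.
    collections = {}
    for meta in metas:
        if meta.get("property") == "belongs-to-collection" and meta.get("id"):
            collections[meta.get("id")] = {"name": (meta.text or "").strip()}

    # Second pass: attach each refining property to the collection it targets.
    for meta in metas:
        target = (meta.get("refines") or "").lstrip("#")
        prop = meta.get("property")
        if target in collections and prop:
            collections[target][prop] = (meta.text or "").strip()

    return list(collections.values())


for collection in read_collections(SAMPLE_OPF):
    print(collection)
```

Because each refining meta names its target by id, the two collections never get mixed up, which is the "multiple unambiguous series titles and positions" point made earlier in the thread.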

w3c / epub-specs

metadata for series name and volume number #326