w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

non-linear resources - primary, secondary or something else? #16

Closed mattgarrish closed 7 years ago

mattgarrish commented 7 years ago

An issue I'm sure no one is looking forward to, but where do "non-linear" resources fall into the publication hierarchy?

For reference, non-linear is what EPUB calls documents that contain supplementary information not part of the primary narrative. In EPUB, they were marked as non-linear and required to be in the spine, but whether they were suppressed during reading depended on the reading system.

Is WP going to define something similar, or can the reading order be defined without including supplementary documents in the reading order?

If they can be excluded, these will be neither primary nor secondary resources, but something else.

If they can be excluded, does the author have to provide previous/next page navigation?

lrosenthol commented 7 years ago

On Fri, Aug 4, 2017 at 9:05 AM, Matt Garrish notifications@github.com wrote:

An issue I'm sure no one is looking forward to, but where do "non-linear" resources fall into the publication hierarchy?

For reference, non-linear is what EPUB calls documents that contain supplementary information not part of the primary narrative.

Can you give an example? I've been trying to think of something primary(!) that wouldn't go in the DRO, but can't. Most of the ones that I can think of would be secondary resources.

If they can be excluded, does the author have to provide previous/next page navigation?

I think we decided that we were tabling the author-provided vs. UA provided UX questions given their controversial nature...

mattgarrish commented 7 years ago

Can you give an example?

The general EPUB example is an answer key. You might provide a link to one where a quiz is, but you don't necessarily want it in the reading order. It's technically a primary resource, but would violate a statement that all primary resources are in the reading order.

Thinking of the web, any kind of document you would pop out rather than have as part of a sequential reading order has an equally ambiguous status. They're not secondary, but not primary, either. Do these have to be listed as primary resources in the reading order?

RachelComerford commented 7 years ago

Agreed - we deal with this in education with glossary definitions that are linked in the basal text (as well as footnotes, references, etc.)



lrosenthol commented 7 years ago

On Fri, Aug 4, 2017 at 9:46 AM, Matt Garrish notifications@github.com wrote:

Can you give an example?

The general EPUB example is an answer key. You might provide a link to one where a quiz is, but you don't necessarily want it in the reading order. It's technically a primary resource, but would violate a statement that all primary resources are in the reading order.

If you have a link to it from somewhere else, and it's not in the DRO, then it's a secondary resource (at least as I view it).

And why wouldn't it be in the DRO? If I was physically paging through the book, I would get there - so why not put it in the DRO for an electronic pub?

Thinking of the web, any kind of document you would pop out rather than have as part of a sequential reading order has an equally ambiguous status. They're not secondary, but not primary, either. Do these have to be listed as primary resources in the reading order?

For me, that's exactly the definition of secondary! Something that is only accessed from a primary.

lrosenthol commented 7 years ago

On Fri, Aug 4, 2017 at 9:50 AM, RachelComerford notifications@github.com wrote:

Agreed - we deal with this in education with glossary definitions that are linked in the basal text (as well as footnotes, references, etc.)

Footnotes and references, by themselves, are (IMO) secondary. However a "Notes" section would be primary and in the DRO.

pkra commented 7 years ago

@lrosenthol

And why wouldn't it be in the DRO? If I was physically paging through the book, I would get there - so why not put it in the DRO for an electronic pub?

Because authors finally could, cf. https://en.wikipedia.org/wiki/Easter_egg_(media).

WSchindler commented 7 years ago

If any resource referenced from a primary resource would have the status of a secondary resource, we would need a mechanism to distinguish between secondary resources which are essential/indispensable for the proper rendering and/or understanding of the main narrative and those that the user may consume or not. If my text talks about "the following diagram" or picture and it isn't rendered, the reading experience would be quite disturbing. But depending on the reader's level of expertise, a reference to a glossary entry may be safely ignored or even remain hidden (based on a user preference). IMO, we don't necessarily need a further term, but we need a stable mechanism to define the relevance of a resource for the primary narrative.

RachelComerford commented 7 years ago

A request from the less experienced or technical among us - DRO (default reading experience) is not a googlable epub term unless you're willing to dig. Can we avoid acronyms where possible?



clapierre commented 7 years ago

Can you give an example? I've been trying to think of something primary(!) that wouldn't go in the DRO, but can't. Most of the ones that I can think of would be secondary resources.

Wouldn’t a choose your own adventure story classify as a number of primary resources that are not in a DRO? Thanks Charles LaPierre

TzviyaSiegman commented 7 years ago

I am going to be radical here. Since @duga explained the way that linear in EPUB came to be, I am less enamored of it. As @lrosenthol said, an answer key would be at the end of a book if I were paging through it. In discussing the concept with EPUB Reading System developers, I learned that excluding the content creates many problems. For example, should content that is non-linear be part of search? be paginated? I think that we might need to make these author decisions. That does not mean that it is not possible to hide some content from view (as with an answer key), but I don't think we should do this with a magical attribute. It is possible to create quizzes with answer keys on the Web without a magical attribute. We should create WP in the same way. (PWP and EPUB 4 might be different). (I think DRO is default reading order, but I agree that we should not invent acronyms.)

mattgarrish commented 7 years ago

Wouldn’t a choose your own adventure story classify as a number of primary resources that are not in a DRO?

That's another common example.

Also, if you have a link that says to click an image to see the larger view, and then the image opens in a new window, that image is no longer a secondary resource. Is it listed as a primary resource, in the reading order?

If you have a link to it from somewhere else, and it's not in the DRO, then it's a secondary resource

That's not the way they're currently defined, though. A linked or popped-out resource isn't required for the processing or rendering of a primary resource.

If we change to allow resources to be directly rendered without having to be listed as primary resources in the reading order, then we're moving in the direction of non-linear content not having to be in the spine (to equate to EPUB).

I'm not saying we want to invent "non-linear" again, but we need to be clear what is required in the reading order and what it means (and what are any processing expectations) when resources are not.

lrosenthol commented 7 years ago

On Fri, Aug 4, 2017 at 10:04 AM, Wolfgang Schindler notifications@github.com wrote:

If any resource referenced from a primary resource would have the status of a secondary resource, we would need a mechanism to distinguish between secondary resources which are essential/indispensable for the proper rendering and/or understanding of the main narrative and those that the user may consume or not.

Agreed. This is something that is called out in the use cases from the original IG documents.

My belief is that we don't need these for WP but only for PWP, and thus can wait to address this topic.

dauwhe commented 7 years ago

I think it's entirely reasonable to have an HTML resource in a web publication that is not part of the default reading order. It's just like having a link to somewhere else on the web. A document author can control to some extent what happens with such a link using the target attribute.

lrosenthol commented 7 years ago

On Fri, Aug 4, 2017 at 10:36 AM, Matt Garrish notifications@github.com wrote:

Wouldn’t a choose your own adventure story classify as a number of primary resources that are not in a DRO?

That's another common example.

And therefore it's not the default reading order, but an alternative reading order. So they wouldn't be primary resources.

But I agree, it's an interesting example...

Also, if you have a link that says to click an image to see the larger view, and then the image opens in a new window, that image is no longer a secondary resource.

Where the content is displayed has no bearing on whether it is primary or secondary. That new window is still secondary.

If you have a link to it from somewhere else, and it's not in the DRO, then it's a secondary resource

That's not the way they're currently defined, though. A linked or popped-out resource isn't required for the processing or rendering of a primary resource.

True. But it's also not a primary resource. And I would hope to avoid a "tertiary resource" :).

I'm not saying we want to invent "non-linear" again, but we need to be clear what is required in the reading order and what it means (and what are any processing expectations) when resources are not.

For WP, I don't believe there are any special processing expectations on secondary resources. I am also not sure if there are any on primary - at least in and of themselves. We might put some on the default reading order concept, however.

clapierre commented 7 years ago

On Aug 4, 2017, at 7:48 AM, Leonard Rosenthol notifications@github.com wrote:

I'm not saying we want to invent "non-linear" again, but we need to be clear what is required in the reading order and what it means (and what are any processing expectations) when resources are not.

For WP, I don't believe there are any special processing expectations on secondary resources. I am also not sure if there are any on primary - at least in and of themselves. We might put some on the default reading order concept, however.

In light of what Tzviya, Matt and Leonard have said, do we really need to distinguish between a primary and secondary resource? Why can’t we list all resources that are inside along with a Default Reading Order of some or all of these resources? What benefit does marking some as primary or as secondary give us?

Thanks Charles LaPierre

lrosenthol commented 7 years ago

On Fri, Aug 4, 2017 at 11:05 AM, Charles LaPierre notifications@github.com wrote:

In light of what Tzviya, Matt and Leonard have said, do we really need to distinguish between a primary and secondary resource? Why can’t we list all resources that are inside along with a Default Reading Order of some or all of these resources? What benefit does marking some as primary or as secondary give us?

Because listing secondary ones is extremely complicated for publications that can/will change...

clapierre commented 7 years ago

On Aug 4, 2017, at 8:09 AM, Leonard Rosenthol notifications@github.com wrote:

Because listing secondary ones is extremely complicated for publications that can/will change...

Ok, agreed, but I still don’t see the point, at point x in time there were 23 resources in this WP, and at point y in time there are now 26 resources and maybe the default reading order was updated to include some of these new resources or maybe not. Why should we care? Just like the web, pages, resources etc can change at any time by the web developer.

Charles

lrosenthol commented 7 years ago

On Fri, Aug 4, 2017 at 11:21 AM, Charles LaPierre notifications@github.com wrote:

On Aug 4, 2017, at 8:09 AM, Leonard Rosenthol notifications@github.com wrote:

Because listing secondary ones is extremely complicated for publications that can/will change...

Ok, agreed, but I still don’t see the point, at point x in time there were 23 resources in this WP, and at point y in time there are now 26 resources and maybe the default reading order was updated to include some of these new resources or maybe not. Why should we care? Just like the web, pages, resources etc can change at any time by the web developer.

I agree that for WP, this is a non-issue. However, for PWP it's a big problem!

First and foremost, you are thinking about the publisher/author as the only person that can modify the publication. That's true for web-hosted content (and thus WPs), but it is not the case for "off-the-web" content (à la PWP). Consider that one of the top three things that people do with PDFs is to combine them together.

Second is the possibility for name/path/URI conflicts when merging content together and the need to reconcile that in a single list.

clapierre commented 7 years ago

On Aug 4, 2017, at 8:27 AM, Leonard Rosenthol notifications@github.com wrote:

I agree that for WP, this is a non-issue. However, for PWP it's a big problem!

First and foremost, you are thinking about the publisher/author as the only person that can modify the publication. That's true for web-hosted content (and thus WPs), but it is not the case for "off-the-web" content (à la PWP). Consider that one of the top three things that people do with PDFs is to combine them together.

Do we really need to consider this use case of allowing the option of combining PWPs? I agree that there could be issues if people modify content but knowing what was a primary or secondary resource doesn’t really matter in my mind. If I want to add in image descriptions to the secondary content or the primary content I don’t see a problem, but maybe I am missing something.

Charles.

lrosenthol commented 7 years ago

On Fri, Aug 4, 2017 at 11:38 AM, Charles LaPierre notifications@github.com wrote:

Do we really need to consider this use case of allowing the option of combining PWPs?

If we believe that PWPs will serve use cases beyond traditional books and publications - absolutely!!!

That said, we may decide that for EPUB4, we want to continue to focus on the current EPUB market segment and therefore for that spec this wouldn't be an issue.

But that's why we have profiles :)

I agree that there could be issues if people modify content but knowing what was a primary or secondary resource doesn’t really matter in my mind. If I want to add in image descriptions to the secondary content or the primary content I don’t see a problem, but maybe I am missing something.

Consider building a publication containing multiple articles about the Louvre (using our favorite example), where two different publications each have their own versions of the MonaLisa.jpg image. In the primary resources that refer to it, the URL is "img/MonaLisa.jpg". How would you have two instances of that same URL in the manifest listing of secondary? And more importantly, when merging them, how do you keep them separate? (NOTE: there are a variety of possible options, but they are much simpler if you don't have to keep a manifest of them)
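The collision described above can be illustrated with a minimal Python sketch. Everything here is hypothetical: the two publications, their contents, and the `find_collisions` and `merge_with_prefixes` helpers are made up for illustration, and prefixing each publication with its own folder is only one of the "variety of possible options", not something the group has agreed on.

```python
from collections import defaultdict

# Hypothetical publications: each maps a relative URL to a content identity.
louvre_guide = {"index.html": "guide-index", "img/MonaLisa.jpg": "photo-A"}
louvre_history = {"history.html": "history-index", "img/MonaLisa.jpg": "photo-B"}


def find_collisions(*publications):
    """Return relative URLs that point at different content across publications."""
    seen = defaultdict(set)
    for pub in publications:
        for url, content_id in pub.items():
            seen[url].add(content_id)
    return sorted(url for url, ids in seen.items() if len(ids) > 1)


def merge_with_prefixes(**publications):
    """One possible reconciliation: place each publication under its own folder."""
    merged = {}
    for name, pub in publications.items():
        for url, content_id in pub.items():
            merged[f"{name}/{url}"] = content_id
    return merged


collisions = find_collisions(louvre_guide, louvre_history)
merged = merge_with_prefixes(guide=louvre_guide, history=louvre_history)
print(collisions)
print(sorted(merged))
```

With this approach a relative link such as "img/MonaLisa.jpg" still resolves correctly from inside each folder, but any single flat list of secondary resources has to be rewritten on every merge, which is the cost being pointed at.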

BigBlueHat commented 7 years ago

Feels like we're heading into unrelated territory at the moment (re: combining PWPs). When we address that use case, we should address it. 😄 There is still much to be discussed around identification of resources (any of them...) in their various states: on the Web, offline, and portable'd. We'll get there. But that's not here nor now (please).

To Matt's original question, the issue seems to center around what is content and what is dependency (as well as when/where are those things outlined, described, referenced).

Does that describe the issue more clearly?

mattgarrish commented 7 years ago

I think it's entirely reasonable to have an HTML resource in a web publication that is not part of the default reading order.

I tend to agree. I don't believe it's necessary for there to be a path through every piece of content.

The current definition of secondary resource doesn't allow for this, though. There are primary resources in the default reading order, and secondary resources needed for their processing or rendering.

Perhaps the publishing and linking definitions could help here, namely Web Content and Web Page.[1]

A Web Publication consists of one or more Web Pages, at least one of which has to be listed in the default reading order. All Web Content has to be listed in the manifest, with Web Pages in the default reading order potentially not having to be repeated.

[1] https://www.w3.org/TR/publishing-linking/#terminology

HadrienGardeur commented 7 years ago

In Readium-2 we decided to do the following thing when ingesting EPUB files:

I really don't think that we need the concept of linearity for WP/PWP; this is one of the most confusing things about the EPUB spine today, and its implementation across UAs is extremely inconsistent.

HadrienGardeur commented 7 years ago

Also for reference, this is how we handle primary/secondary resources in Readium-2:

This means that for an adventure story or a quiz, the performance won't be as good (the resource won't be preloaded in advance).

Aside purely from processing requirements, the impact is mostly on UX. Unlike primary resources, when secondary resources are rendered, there's no option to browse to the next/previous one (since they're not in the reading order). You need to either:

lrosenthol commented 7 years ago

On Fri, Aug 4, 2017 at 12:18 PM, Matt Garrish notifications@github.com wrote:

Perhaps the publishing and linking definitions could help here, namely Web Content and Web Page.[1]

No, that just makes it worse. Web Pages are things on the web, which resources in a WP (or PWP) may not be.

BigBlueHat commented 7 years ago

I'm beginning to think that we need a clarification for "secondary resources." Most often that phrase has been used to refer to a stylistic/functionality/display dependency (CSS, JS, fonts), but now in this conversation it's been used to refer to "secondary content" (quiz answers, etc).

As long as those are mixed in a single term, we'll remain confused, I fear--hence my comment about content and dependencies.

BigBlueHat commented 7 years ago

@lrosenthol I think for now (and expressly this issue), we need to focus on this line from the Use Case document: "All these constraints are formalized in the context of the usage on the Web and by extension Web Publications."

Yes, there are scenarios where a WP may be "off the Web" (prior to publication served off localhost, etc) and certainly where it is packaged (ala PWP), but I think we should burn those bridges when we get to them. 😸

lrosenthol commented 7 years ago

Not disagreeing with you, @bigbluehat - just saying that I don't want to confuse our definitions with that.

As to secondary resources, I think the definition we have already is fine: they (to quote you) "refer to a stylistic/functionality/display dependency (CSS, JS, fonts)".

BigBlueHat commented 7 years ago

That's my reading also @lrosenthol.

Copying/pasting from the terminology section:

Primary Resource

A primary resource is one that is listed in the default reading order of a Web Publication.

Secondary Resource

A secondary resource is one that is required for the processing or rendering of a primary resource.

HadrienGardeur commented 7 years ago

I'm not entirely convinced that we need three separate concepts.

I think the confusion (if any) is tied to the terminology (primary/secondary) and our current definitions (mostly for secondary) rather than something else.

As long as we clearly identify the media type in the manifest, I really don't think that listing such secondary content documents and stylistic/functional/display dependency together is an issue.
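As a rough illustration of that point, here is a minimal Python sketch in which one flat list of secondary resources is separated into content documents and dependencies purely by media type. The entries, the `href`/`type` field names, and the set of types treated as content documents are assumptions for the example, not a proposed manifest format.

```python
# Hypothetical manifest entries; the field names are assumptions, not a WP format.
resources = [
    {"href": "answers.html", "type": "text/html"},
    {"href": "style.css", "type": "text/css"},
    {"href": "fonts/body.woff2", "type": "font/woff2"},
    {"href": "glossary.xhtml", "type": "application/xhtml+xml"},
]

# Assumption: only HTML and XHTML count as content documents in this sketch.
CONTENT_DOCUMENT_TYPES = {"text/html", "application/xhtml+xml"}


def is_content_document(resource):
    """True when the declared media type marks the resource as readable content."""
    return resource["type"] in CONTENT_DOCUMENT_TYPES


content_documents = [r["href"] for r in resources if is_content_document(r)]
dependencies = [r["href"] for r in resources if not is_content_document(r)]
print(content_documents)
print(dependencies)
```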

TzviyaSiegman commented 7 years ago

https://w3c.github.io/wpub/#terminology includes the WORKING definition of secondary resources:

Secondary Resource A secondary resource is one that is required for the processing or rendering of a primary resource.

Going back to the theme of this thread, where (if anywhere) does what EPUB called non-linear content fit into WP? Several people are saying it's not relevant to WP. I am not sure we need to decide before FPWD. This may be something we can get to as we put more flesh on the bones.

dauwhe commented 7 years ago

I think that we have only two categories: resources in the primary default reading order, and everything else. The "everything else" might include HTML. I'm just trying to think of the smallest amount of extra info we need to make publications work.

HadrienGardeur commented 7 years ago

For example we could simply say that:

Secondary Resource A secondary resource is one that is required for the processing or rendering of the publication.

IMO, the resources identified as secondary shouldn't have to be tied to a primary resource. For example, a publication might contain a high resolution cover that is not used by a primary resource.

mattgarrish commented 7 years ago

A secondary resource is one that is required for the processing or rendering of the publication.

IMO, the resources identified as secondary shouldn't have to be tied to a primary resource.

I'm fine with this approach, too, and it solves my more immediate concern about our definitions. As long as the definition says "primary resource", a secondary resource has to be tied to a primary.

We can argue details like what this means for progression through a publication, and whether non-linearity needs introduction when we get deeper into defining the reading order.

danielweck commented 7 years ago

I believe we now have a pretty good definition for primary and secondary resources. However, I believe that the notion of "non-linear" as inherited from EPUB (that is to say: spine item content documents explicitly marked with the @linear="no" attribute) is redundant / unnecessary in the context of Web Publications, in the sense that any primary resource not taking part in the default reading order is implicitly an "ancillary" resource (such as accessible extended descriptions).

The reason why such documents in EPUB have to be marked explicitly with the @linear="no" attribute is because there is a requirement to include in the spine all the content documents that are linked to / navigable, even if they are not part of the default linear reading order. There is no such constraint in Web Publication, so the need for explicit non-linear is negated. Furthermore, the underspecified non-linear feature is objectively largely misunderstood by both content creators and reading system implementors, leading to glaring inconsistencies across the board (content model, and user experience).

Note that in Readium2 when converting from EPUB3 to the internal "webpub manifest" data model, @linear="no" spine items are simply added to the list of resource "links", whereas explicit or implicit @linear="yes" spine items are preserved in the list of spine "links". These two lists of "links" are mutually exclusive (no intersection / overlap), so there is no room for ambiguity.
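The Readium2 split described above could be sketched roughly as follows in Python. This is not Readium2's actual code or its manifest format: the sample package document, the `split_spine` helper, and the plain href lists it returns are assumptions made for illustration. The only EPUB facts relied on are the OPF namespace and that `linear` defaults to "yes" when absent.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical EPUB package document fragment, for illustration only.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf">
  <manifest>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="quiz" href="quiz.xhtml" media-type="application/xhtml+xml"/>
    <item id="key" href="answers.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="quiz" linear="yes"/>
    <itemref idref="key" linear="no"/>
  </spine>
</package>"""


def split_spine(opf_xml):
    """Return (reading_order, resources) as two non-overlapping href lists."""
    root = ET.fromstring(opf_xml)
    items = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    reading_order = []
    for ref in root.findall("opf:spine/opf:itemref", OPF_NS):
        # linear defaults to "yes" when the attribute is absent.
        if ref.get("linear", "yes") == "yes":
            reading_order.append(items[ref.get("idref")])
    in_order = set(reading_order)
    # Everything else, including linear="no" spine items, is a plain resource.
    resources = [href for href in items.values() if href not in in_order]
    return reading_order, resources


reading_order, resources = split_spine(SAMPLE_OPF)
print(reading_order)
print(resources)
```

In this sketch the answer key ends up next to the stylesheet in the second list, which matches the "no intersection / overlap" property: a resource is either in the reading order or it is not.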

llemeurfr commented 7 years ago

For the record, here is the spec of "linear" attribute in EPUB 3.1 and the older 3.0 spec.

An interesting term is "auxiliary", where we use "secondary" in our discussion: we may want to come back to this standard EPUB term, as "auxiliary" (for the publication) may sound better for non-linear content like a cover page (I found the use case in EPUB 3 Best Practices from Matt & Markus) or a quiz than "secondary" (which is more closely tied to a hierarchy where "primary" is using "secondary").

llemeurfr commented 7 years ago

In https://github.com/w3c/wpub/pull/46#issuecomment-324952292, @mattgarrish said:

I can still live with where we ended up in #16.

I understand that it means - if the WG accepts this consensus - that the definition of a secondary (or auxiliary?) resource will be changed to:

Secondary Resource A secondary resource is one that is required for the processing or rendering of the publication.

and that the definition of a primary resource will be back to:

Primary Resource A primary resource is one that is listed in the default reading order of a Web Publication.

Therefore the equivalent of EPUB 3 non-linear content will be categorized as secondary/auxiliary resource and this issue will be closed, right?

mattgarrish commented 7 years ago

Therefore the equivalent of EPUB 3 non-linear content will be categorized as secondary/auxiliary resource

Yes, and also secondary doesn't mean supplementary content as it did in EPUB. Secondary has no relationship to whether the content is part of any primary narrative not expressed by the default reading order (e.g., choose your own adventures or similar where the user follows links in the content instead of relying on document sequence).

mattgarrish commented 7 years ago

Closing this issue as my original question about linear/non-linear has been answered: primary resources are in the default reading order, but not every resource the user might navigate to has to be in the default reading order.

There is no concept of linear/non-linear, in other words, as being secondary carries no specific connotation about the type of resource, only that it is not reachable if you only follow the default reading order via UA-provided navigation.

I'm not sure why we need to call them anything more than resources in this scenario (or at least why ones in the default reading order have any "primacy"), but that's a specific question I'll open elsewhere.
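The resolution above can be illustrated with a minimal Python sketch of UA-provided next navigation. The file names and both helper functions are hypothetical; the point is only that a resource outside the default reading order has no sequence position, so a user reaches it by following a link in the content rather than by paging.

```python
from typing import List, Optional

# Hypothetical publication: names are made up for illustration.
default_reading_order = ["cover.html", "chapter1.html", "quiz.html"]
other_resources = ["answers.html", "style.css"]


def next_in_reading_order(current: str) -> Optional[str]:
    """Return what a UA 'next' control would load, or None if there is nothing."""
    if current not in default_reading_order:
        # Not in the reading order: there is no sequence position to move from.
        return None
    index = default_reading_order.index(current)
    if index + 1 < len(default_reading_order):
        return default_reading_order[index + 1]
    return None


def reachable_by_ua_navigation(start: str) -> List[str]:
    """Collect every resource visited by repeatedly pressing 'next' from start."""
    visited = []
    current: Optional[str] = start
    while current is not None:
        visited.append(current)
        current = next_in_reading_order(current)
    return visited


visited = reachable_by_ua_navigation("cover.html")
print(visited)
print(next_in_reading_order("answers.html"))
```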