jgm / pandoc

Universal markup converter
https://pandoc.org

Add global --include-before option for epub format #7478

Open Samreay opened 3 years ago

Samreay commented 3 years ago

For many epub works, it is common to have sections before the Table of Contents. A foreword, acknowledgements, copyright and dedication are all fairly normal to include.

For other formats, the -B (--include-before-body) option is normally sufficient to include content before the generated Table of Contents.

However, for the epub format, each chapter is treated as its own file (even with the -s standalone option).

This means that if you include a file with a foreword (for example) via the -B argument, your epub output will have the foreword at the beginning of every chapter.

As far as I can tell (at least, having spent a day or two trying to find a solution), there is no way to add content before the table of contents by itself in the epub export. This would be a great feature to add to pandoc for book publishing.

jgm commented 3 years ago

I assume that you're talking about the table of contents encoded in nav.xhtml in the epub container? I'm not sure, actually, how to adjust its position relative to other elements. As an experiment, I tried reordering it in the manifest in content.opf, but on ibooks this made no difference, so it may be that some readers will always put the nav first?

Samreay commented 3 years ago

I'm unsure if ibooks is pulling things to the front, but I don't see this behaviour in Calibre or the Kindle reader. I can provide some screenshots showing typical epub layout if that helps, or upload some example .opf manifests and spine snippets, and contrast that with the behaviour of the -B flag, which inserts the front matter at the beginning of every chapter. Let me know what I can provide :)

jgm commented 3 years ago

I understand why -B won't do what you're looking for here. But I would be interested in knowing what affects the relative position of the TOC in epubs, and whether this is reader-specific. Perhaps you could upload a sample epub that has the front matter before the TOC, or at least the snippets you describe.
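To compare where different books place the TOC, it can help to list the spine of each epub directly. The following is a minimal sketch using only the Python standard library; `spine_order` is a hypothetical helper name, and the sketch assumes a well-formed, unencrypted epub whose `META-INF/container.xml` points at a single package document.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_order(epub_path):
    """Return the manifest hrefs of an epub in spine (reading) order."""
    with zipfile.ZipFile(epub_path) as zf:
        # META-INF/container.xml names the package (.opf) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    # Map each manifest id to the file it refers to.
    hrefs = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists those ids in reading order; fall back to the raw
    # idref if the manifest has no matching item.
    return [
        hrefs.get(ref.get("idref"), ref.get("idref"))
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]
```

Calling `spine_order("book.epub")` on two books and comparing where the nav or toc file appears shows whether the difference lies in the file itself or in the reader.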

Samreay commented 3 years ago

Sure thing. For more motivation, I am trying to set up a compilation pipeline for publishing a novel and have to conform to KDP standards which state here in Step 4 that front matter order is:

Kings of the Wyld layout in the Kindle Viewer:

KingsOfWyld

KingsOfWyld_package.opf.txt

Misborn (3 books in one epub):

misborn

Misborn_content.opf.txt

Memory of Light (calibre book viewer, showing the dedication and similar front matter):

MemoryOfLight

MemoryOfLight_content.opf.txt

Sword of Kaigen online epub viewer:

SwordOfKaigen

SwordOfKaigen_content.opf.txt

Let me know if any other parts of those epubs might be useful. I'm unsure about the legality of sharing them on an open source platform.

Samreay commented 3 years ago

Some of the opfs didn't seem to upload properly. Here are all of them zipped together.

contents.zip

jgm commented 3 years ago

For easy reference,

<spine page-progression-direction="ltr" toc="ncx">
<itemref idref="cover" linear="no"/>
<itemref idref="titlepage" linear="yes"/>
<itemref idref="copyright" linear="yes"/>
<itemref idref="toc" linear="yes"/>
<itemref idref="dedication" linear="yes"/>
<itemref idref="preface001" linear="yes"/>
<itemref idref="chapter001" linear="yes"/>
<itemref idref="chapter002" linear="yes"/>
<itemref idref="chapter003" linear="yes"/>

The page-progression-direction and linear attributes aren't used by pandoc; that might be the difference.

If we used them, we could introduce a way to mark chapters as pre-toc (and presumably also as not appearing in the toc)? In some writers we are sensitive to frontmatter and unlisted classes.

jgm commented 3 years ago

Related: #1609 #1593

Samreay commented 3 years ago

Being able to label chapters as pre-toc would be useful. The KDP guide doesn't seem to indicate that the pre-toc entries are required in the toc, so leaving them out seems appropriate. They explicitly state that TOCs are required for ebooks so that users can navigate between book chapters, so the non-chapter pre-toc content shouldn't need to appear.

As to the best way of labelling chapters, I simply have a file frontmatter.md that has all the pre-toc content. However, if there's a way of labelling files or chapters in pandoc that I've missed, any other format would work. I am very much a beginner with the software.

jgm commented 3 years ago

Sorry, I was being dumb. In my test, I changed the order in the manifest element, not the spine. The spine is what's needed. linear and page-progression-direction are not relevant here. Changing the order in the spine works.

In the academic publishing I'm most experienced with, the preface and acknowledgements come after the TOC and are listed in it (though numbered differently, with roman-numbered pages). All that comes before the TOC are the title page, half title page (copyright), dedication (if any), and series page. Maybe it's different with (some?) fiction.
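As a rough illustration of the spine reordering described above, here is a minimal sketch that moves the TOC entry to a new position in an existing package document. `move_toc_after` is a hypothetical helper, and the idref values in the usage note are made-up examples; check the ids in your own content.opf. This post-processes the generated file and is not something pandoc does for you.

```python
import xml.etree.ElementTree as ET

OPF_URI = "http://www.idpf.org/2007/opf"
# Serialize the OPF namespace as the default (unprefixed) namespace.
# Other namespaces (e.g. Dublin Core) get generated prefixes unless
# they are registered too; the output is still valid XML.
ET.register_namespace("", OPF_URI)


def move_toc_after(opf_xml, toc_idref, after_idref):
    """Return opf_xml with the toc itemref moved to just after after_idref."""
    package = ET.fromstring(opf_xml)
    spine = package.find(f"{{{OPF_URI}}}spine")
    refs = {ref.get("idref"): ref for ref in spine}

    toc = refs[toc_idref]
    spine.remove(toc)
    # Compute the position after the removal so the index stays valid.
    position = list(spine).index(refs[after_idref]) + 1
    spine.insert(position, toc)
    return ET.tostring(package, encoding="unicode")
```

For example, `move_toc_after(opf_text, "nav", "dedication")` would place the nav entry right after a dedication page, assuming both ids exist in the spine. Only the spine is touched; the manifest order is left alone, since it does not affect reading order.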

jgm commented 3 years ago

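In the PDF you linked to, it says:

  • Half title page – right-facing page (requires blank page after)
  • Title page, right-facing page
  • Copyright page
  • Dedication – right-facing page (requires blank page after)
  • Table of contents with page numbers, right-facing page (might require blank page after, depending on number of TOC pages)
  • Foreword – right-facing page (requires blank page after)

So, this suggests that the foreword comes after the TOC. All that comes before is the title page, copyright info, and dedication.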

You can do this in pandoc using a custom epub template. If you look at the default epub3 template, you'll see a big conditional starting $if(titlepage)$. Anything you put in this section will go before the toc. So, you could easily add your copyright info here.

In fact, the template already includes a rights variable. So, if you put this in your YAML metadata,

rights: |
  Copyright 2010 So and So.

  All rights reserved

or whatever, then it should appear on the title page automatically even with the default template.
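In a build pipeline, the metadata block can be generated rather than written by hand. Below is a minimal sketch, assuming the copyright text is plain Markdown whose first line is not indented; `metadata_block` is a hypothetical helper, not part of pandoc.

```python
import json


def metadata_block(title, rights_markdown):
    """Build a pandoc YAML metadata block with a multi-line rights field."""
    # A JSON string is also a valid double-quoted YAML scalar, so
    # json.dumps takes care of quoting and escaping the title.
    lines = ["---", "title: " + json.dumps(title), "rights: |"]
    # Indent each line so it stays inside the literal block scalar;
    # blank lines are left empty.
    for line in rights_markdown.splitlines():
        lines.append("  " + line if line.strip() else "")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

The returned text can be placed at the top of the Markdown source, or saved to a separate file and passed to pandoc with `--metadata-file`.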

Samreay commented 3 years ago

Oh, does that support more than just text? Links, images and text formatting? Or is that rich content something that could be done with a custom template instead of using the rights metadata?

jgm commented 3 years ago

Anything you can do in pandoc's markdown will work there.

Samreay commented 3 years ago

Oh, I had no idea it was rich, my apologies then, I believe that would resolve all the issues.

Just tested it, and the only thing I'm not sure how to do is respect their left-hand vs right-hand page rule by explicitly setting a page break or adding a new page to align the TOC. I'm guessing \newpage or \pagebreak aren't the right way of going about it?

BzzBzzBzzBzz commented 8 months ago

In the PDF you linked to, it says:

  • Half title page – right-facing page (requires blank page after)
  • Title page, right-facing page
  • Copyright page
  • Dedication – right-facing page (requires blank page after)
  • Table of contents with page numbers, right-facing page (might require blank page after, depending on number of TOC pages)
  • Foreword – right-facing page (requires blank page after)

So, this suggests that the foreword comes after the TOC. All that comes before is the title page, copyright info, and dedication.

You can do this in pandoc using a custom epub template. If you look at the default epub3 template, you'll see a big conditional starting $if(titlepage)$. Anything you put in this section will go before the toc. So, you could easily add your copyright info here.

In fact, the template already includes a rights variable. So, if you put this in your YAML metadata,

rights: |
  Copyright 2010 So and So.

  All rights reserved

or whatever, then it should appear on the title page automatically even with the default template.

I think I have a similar issue, and since this one is still open I will try to explain it here. If needed, I can open a new one.

I tried the proposed solution, but unfortunately some reading devices (e.g. Kobo) do not support page breaks from CSS. Currently the only way to force a page break is to split pages into different files (see specs for details). Therefore I think that a parameter to decide where to put the TOC page could be useful.

Thanks.