standardebooks / tools

The Standard Ebooks toolset for producing our ebook files.
Other
1.42k stars 125 forks source link

print-toc not including subtitles #251

Closed acabal closed 4 years ago

acabal commented 4 years ago

@drgrigg , se lint is complaining about mismatched chapter headers in the ToC of https://github.com/standardebooks/william-carlos-williams_poetry

It looks like se print-toc is generating a ToC for two of the poems without subtitles, but those subtitles should be there, unless I'm missing something. The two items are #january-morning and #the-wanderer. Maybe it's because they're in <article> elements?

This seems to be the only ebook where this is occurring. Can you take a look?

Also, make sure to pull the latest commit for the tools, as I've been updating a lot of things the past few days.

drgrigg commented 4 years ago

OK, looking into it. It would be really nice if there was a way to run in debugging mode with the se code, but I can't figure out how to do that, so I have to work with a kind of bootstrapped module of just the print-toc code. That way I can single step through the code to see what it's doing.

drgrigg commented 4 years ago

The code is correctly picking up the subtitle, just not outputting it the way we want.

It’s fallen foul of that rule we’ve had trouble with in the past, the rule which says: If the title isn’t a roman numeral, don’t include the subtitle unless you’re the heading of a PART, a DIVISION or a VOLUME This rule avoids the inclusion of subtitles in prefaces or epilogues.

In this case, it’s in a BookDivision.ARTICLE. We can extend the rule to output subtitles for headings within Articles, if that won't mess up anything else?

The simplest fix is to change line 82 in se_epub_generate_toc.py to read:

if self.subtitle != "" and (self.division in [BookDivision.PART, BookDivision.DIVISION, BookDivision.VOLUME, BookDivision.ARTICLE]):

or it might be better to refactor the whole else block there to:

if self.subtitle != "" and (self.division in [BookDivision.CHAPTER, BookDivision.SUBCHAPTER]):
    out_string += f"<a href=\"text/{self.file_link}\">{self.title}</a>\n"  # omit the subtitle
else: 
    out_string += f"<a href=\"text/{self.file_link}\">{self.title}: {self.subtitle}</a>\n"   # incude the subtitle

I can do this as a pull request. But it should be tested against the corpus to see that it won't mess up any other books which include article sections. I don't really have an automated way of doing that.

drgrigg commented 4 years ago

Actually no, that would introduce a bug. Should be :

if self.subtitle == "" or (self.division in [BookDivision.CHAPTER, BookDivision.SUBCHAPTER]):
    out_string += f"<a href=\"text/{self.file_link}\">{self.title}</a>\n"  # omit subtitle
else:
    out_string += f"<a href=\"text/{self.file_link}\">{self.title}: {self.subtitle}</a>\n"  # include subtitle
acabal commented 4 years ago

That isn't quite right either, because now it's adding subtitles to prologues/epilogues/half titles, etc., which we don't want. For example Moonstone and His Last Bow both get a wrong ToC from this change, and lint complains.

Can we just adjust it so that it treats <article> the same way it treats <section>? In either case, none of those <article>s in William Carlos Williams are marked as a "part" or "division".

drgrigg commented 4 years ago

Ok, I'll be able to look at this properly this evening (10 hrs from now), and will run tests against the projects I know to have been troublesome in the past.

drgrigg commented 4 years ago

I did find a little time this morning to look at this. We have a fundamental conflict, I think.

Here's a heading from a short story collection. We DON'T want the subtitle to appear in the ToC.

<article id="his-last-bow" epub:type="volume se:short-story">
    <h2 epub:type="title">
        <span>His Last Bow</span>
        <span epub:type="subtitle">An Epilogue of Sherlock Holmes</span>
    </h2>

Here's a heading from a poetry collection. We DO want the subtitle to appear in the ToC.

<article id="january-morning" epub:type="z3998:poem">
    <h2 epub:type="title">
        <span>January Morning</span>
        <span epub:type="subtitle">Suite</span>
    </h2>

But these two headings are identical in structure except for the epub:type! I guess I can check for whether the epub:type is poem or short-story, but it seems awfully messy!

acabal commented 4 years ago

I think we do want the subtitle in Holmes; its not an actual epilogue, that's just the subtitle to the story. So in that sense the output should be the same as it was before. Really, print-toc works in all other cases besides these exceptions. We have to figure out what makes these two exceptions different. It is the fact that they're articles and not sections?

On February 22, 2020 6:25:11 PM CST, David Grigg notifications@github.com wrote:

I did find a little time this morning to look at this. We have a fundamental conflict, I think.

Here's a heading from a short story collection. We DON'T want the subtitle to appear in the ToC.

<article id="his-last-bow" epub:type="volume se:short-story">
  <h2 epub:type="title">
      <span>His Last Bow</span>
      <span epub:type="subtitle">An Epilogue of Sherlock Holmes</span>
  </h2>

Here's a heading from a poetry collection. We DO want the subtitle to appear in the ToC.

<article id="january-morning" epub:type="z3998:poem">
  <h2 epub:type="title">
      <span>January Morning</span>
      <span epub:type="subtitle">Suite</span>
  </h2>

But these two headings are identical in structure except for the epub:type! I guess I can check for whether the epub:type is poem or short-story, but it seems awfully messy!

drgrigg commented 4 years ago

But lint complains if the subtitle IS included in His Last Bow. I thought the rule was "No subtitle unless the chapter title is a pure Roman numeral"

drgrigg commented 4 years ago

"Except if it's a Part or Book or Division" etc...

drgrigg commented 4 years ago

I guess the most sensible thing is for me to look at lint and figure out what it's expecting. I'll do that this evening.

acabal commented 4 years ago

OK, I think I figured it out. Turns out print-toc was correct. The problem was, the epub had an extra <section> wrapping all of the poems, that was marked as a volume. This was not necessary. Removing it makes the ToC output correctly, with subtitles.

Since this is the only error like this in the corpus, I think that was the problem. Sorry to have made you dig around :(