openstax / oer.exports

Converter to various book formats (PDF, epub, mobi)

Microbiology - Suppressed Key Terms generating '0' page number #2181

Closed jdoehnert closed 8 years ago

jdoehnert commented 8 years ago

legacy-staging4.cnx.org col10095

Suppressed Key Term entries are generating a "0" page number entry in the Index. Ideally no entry would be generated.

[image: microbioindex]

kerwinso commented 8 years ago

Jeremy's follow-up email 9/7: "All modules in that collection have key terms, except the introduction module.

Here’s the first module from the chapter: https://legacy-staging4.cnx.org/content/m10445/latest/?collection=col10095/1.3"

kerwinso commented 8 years ago

Content can also be found here: http://legacy-textbook-dev.cnx.org/content/col10108/latest/

kerwinso commented 8 years ago

See #771 and #808. Key Terms were removed with #1869.

kerwinso commented 8 years ago

Alina to assign someone to try to rebuild this on staging4. If it's still a problem, we will need DevOps to look into the docbook namespace on that server.

May need to defer this to WW release.

openstaxalina commented 8 years ago

Still an issue in the PDF I just rebuilt: http://legacy-staging4.cnx.org/content/col10095/1.3/pdf

[screenshot, 2016-09-08 10:07:09 am]

openstaxalina commented 8 years ago

also an issue in the PDF generated on production:

content is here: http://cnx.org/contents/ihJlaMJC

the PDF in the screenshot below is the sample on openstax.org.

[screenshot, 2016-09-08 11:49:09 am]

kerwinso commented 8 years ago

I was able to reproduce this bug in the Microbiology PDF on textbook-dev, which is running Docbook 1.78.1. Thus this is likely a template issue, not due to an outdated Docbook version.

Here's the result of my pdfinfo query on textbook-dev:

www-data@textbook-dev:~/files$ pdfinfo col10108-1.2.pdf
Title:          Microbiology from Production, 9-7-16
Creator:        DocBook XSL Stylesheets V1.78.1
Producer:       Prince 10 rev 6 (www.princexml.com)
Tagged:         no
Form:           none
Pages:          321
Encrypted:      no
Page size:      612 x 792 pts (letter)
Page rot:       0
File size:      64220796 bytes
Optimized:      no
PDF version:    1.4
www-data@textbook-dev:
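
For anyone repeating the version check on another server, here is a minimal Python sketch of the same idea: read the `pdfinfo` output and pull the stylesheet version out of the Creator field. The function names are made up for illustration, and it assumes the Creator string always starts with "DocBook XSL Stylesheets V", as it does in the output above.

```python
import subprocess


def run_pdfinfo(path):
    """Run the pdfinfo CLI on a PDF and return its text output."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_pdfinfo(text):
    """Turn pdfinfo output ("Key:   value" per line) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a value, such as a trailing shell prompt.
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def docbook_version(info):
    """Return the version from the Creator field, or None if it is not DocBook XSL."""
    prefix = "DocBook XSL Stylesheets V"  # assumed fixed prefix
    creator = info.get("Creator", "")
    return creator[len(prefix):] if creator.startswith(prefix) else None


# Sample taken from the pdfinfo output in this comment.
sample = """Title:          Microbiology from Production, 9-7-16
Creator:        DocBook XSL Stylesheets V1.78.1
Producer:       Prince 10 rev 6 (www.princexml.com)
Pages:          321
"""
info = parse_pdfinfo(sample)
print(docbook_version(info))
```

On a real server, `parse_pdfinfo(run_pdfinfo("col10108-1.2.pdf"))` would replace the hard-coded sample.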

openstaxalina commented 8 years ago

We would like to un-suppress key terms and see if the 0 page references go away.

Need to also measure the impact on page count.

kerwinso commented 8 years ago

Template: key terms will be turned back on. Markup: the key terms (everything in the glossary or term tags) need to be deleted. Note: this will impact Webview.

kerwinso commented 8 years ago

@oscryan Can you please clarify which content needs to be removed and where? Would it just be everything within the glossary or term tags?

oscryan commented 8 years ago

This is what I wanted to ask @InconceivableVizzini: is it the glossary we're removing, or something like a key terms section? If glossary, then remove both the <glossary> tag and everything inside it. If key terms section, remove the <section class="key-terms"> tag and everything inside it.

helenemccarron commented 8 years ago

addressed in PR #2224

kerwinso commented 8 years ago

Need Derek's input on Ryan's comment before this issue can be properly tested.

InconceivableVizzini commented 8 years ago

I believe the intention is that the <glossary> and everything inside is removed.

kerwinso commented 8 years ago

Verified fixed on http://legacy-textbook-dev.cnx.org/content/col10108/latest/!

Here's how I tested this:

In the old PDF, I looked at two terms that were marked up as <term> both in the main content and inside a <glossary>: "dynein" in Ch. 3 (m17632) and "bacterial lawn" in Ch. 6 (m17649). [Hey kids, get off my bacterial lawn!]

Then, for both chapters 3 and 6, I removed the <glossary> element and everything inside it from every module in that chapter.

The result: each term I tested only showed up once in the index, properly, linking back to the reference in the main content. No page 0 reference. NOTE: index terms that only had a single page 0 reference should disappear completely from the index after removing the <glossary> markup.
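
The same check could be roughly automated across a whole index. The sketch below assumes the index has already been extracted to plain text with one entry per line in the form "term page, page, ..."; that line format, the function names, and the page numbers in the sample are all invented for illustration and are not taken from the real PDF.

```python
import re

# Assumed entry shape: a term followed by a comma-separated list of page numbers.
ENTRY = re.compile(r"^(?P<term>.+?)\s+(?P<pages>\d+(?:\s*,\s*\d+)*)\s*$")


def parse_index(lines):
    """Map each index term to its list of page numbers."""
    entries = {}
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            pages = [int(p) for p in match.group("pages").split(",")]
            entries[match.group("term")] = pages
    return entries


def classify_zero_refs(entries):
    """Split terms with a page-0 reference into two sorted lists.

    only_zero: terms whose only reference is page 0 (expected to vanish
               from the index once the <glossary> markup is removed).
    mixed:     terms with page 0 plus at least one real page (expected
               to keep only the real pages).
    """
    only_zero = sorted(t for t, p in entries.items() if set(p) == {0})
    mixed = sorted(t for t, p in entries.items() if 0 in p and set(p) != {0})
    return only_zero, mixed


# Made-up sample entries; the page numbers are placeholders.
sample = [
    "dynein 0, 112",
    "bacterial lawn 0, 241",
    "glossary-only term 0",
    "agar 45",
]
only_zero, mixed = classify_zero_refs(parse_index(sample))
print(only_zero)
print(mixed)
```

After a rebuild with the <glossary> markup removed, both lists should come back empty if the fix holds.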

And the EOC content looks good, the same as before when we were suppressing the Key Terms in the template.

Now someone just needs to go back and remove the <glossary> content from all the modules.
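
For whoever picks up that cleanup, a minimal Python sketch of the removal step might look like the following. It is only an illustration, not the tooling the team used: the function names are hypothetical, the sample module is made up, and the CNXML namespace URI in the sample is an assumption. Matching on the local tag name keeps the sketch independent of the exact namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a "{namespace}" prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def strip_glossary(xml_text):
    """Remove every <glossary> element (and its contents) from a module.

    Returns the modified root element and the number of elements removed.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    # Snapshot both loops so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) == "glossary":
                parent.remove(child)
                removed += 1
    return root, removed


# Made-up module; the namespace URI is an assumption.
sample = """<document xmlns="http://cnx.rice.edu/cnxml">
  <content><para id="p1">A <term>dynein</term> motor.</para></content>
  <glossary>
    <definition id="d1"><term>dynein</term><meaning>placeholder</meaning></definition>
  </glossary>
</document>"""

root, removed = strip_glossary(sample)
terms = [e for e in root.iter() if local_name(e.tag) == "term"]
print(removed, len(terms))
```

Note that ElementTree rewrites namespace prefixes when serializing unless they are registered first, so a real cleanup would need to review the diff on each module before publishing. The <term> in the main content is left alone, which matches the test above where each term still showed up once in the index.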