michal-h21 / tex4ebook

Converter from LaTeX to ebook formats (epub, mobi). Using tex4ht and texlua scripts.

No index when using -f epub3 #33

Open wochristian opened 7 years ago

wochristian commented 7 years ago

Tex4ebook does not produce an index when using the -f epub3 option. The index is produced correctly when generating epub2.

michal-h21 commented 7 years ago

I need a sample document for this. Do you use some custom configurations?

wochristian commented 7 years ago

Hi Michal—

Thanks for responding. I have uploaded a zip file that has my book example.

https://www.dropbox.com/sh/akidyrddv08gnk1/AACes1tHmRxP4cGdIl0zfTDWa?dl=0

The main file is SimpleEPub.tex and the config file is config.cfg. I compile the EPubs without getting any errors using the shell script epub.sh:

tex4ebook SimpleEPub.tex -t -c config.cfg
tex4ebook SimpleEPub.tex -f epub3 -t -c config.cfg

The first command produces EPub 2 with an index. The second command produces EPub 3 without an index.

Although both EPub files display properly in the iBook Reader from Apple, they do not pass the IDPF EPub validator test. Also, if I add the mathml option to the epub3 command, the math typesetting fails for some equations.

In case you are interested, we are trying to produce a new CC edition of our computational physics text. A PDF of the current edition is here:

http://www.compadre.org/osp/items/detail.cfm?ID=7375

Wolfgang

PS: The LaTeX files compile on an iMac using TeXStudio set up for lualatex without errors.

From: Michal Hoftich notifications@github.com Reply-To: michal-h21/tex4ebook reply@reply.github.com Date: Thursday, March 30, 2017 at 7:22 AM To: michal-h21/tex4ebook tex4ebook@noreply.github.com Cc: wc wochristian@davidson.edu, Author author@noreply.github.com Subject: Re: [michal-h21/tex4ebook] No index when using -f epub3 (#33)


michal-h21 commented 7 years ago

Hi Wolfgang, sorry for a late reply. I took a quick look at your file and got stuck with math issues.

First of all, one error in tex4ht that I discovered today is also present in your document: use of \mathbf causes wrong conversion of all subsequent math environments.

Another issue is with the reflist environment, which you use for the bibliography. It produces invalid XHTML code.

A simple fix for these two issues is here:

\Preamble{xhtml}
\begin{document}
\Css{h1 { color : red; }}  
\Css{h2 { color : \#900000; }}  
\Css{h3 { color : \#900000; }}  
\Css{h4 { color : \#900000; }}  
\Configure{mathbf}{\HCode{<mi  mathvariant="bold">}\PauseMathClass}{\EndPauseMathClass\HCode{</mi>}}
\ConfigureList{reflist}{\HCode{<div class="bibitem">}\par}{\HCode{</div>}}{\ifvmode\IgnorePar\fi\EndP\HCode{</div><div class="bibitem">}\par}{}
\ConfigureEnv{reflist}{\ifvmode\IgnorePar\fi\EndP\HCode{<div class="bibliography">}}{\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}{}{}
\EndPreamble

Some of the math issues are caused by missing \Big or \big before brackets.

I will give you more information later, I have to go now.

wochristian commented 7 years ago

Michal –

Thank you again for finding these bugs and suggesting changes. I am traveling for the next week and will implement them when I get home on April 20.

Yours,

Wolfgang


michal-h21 commented 7 years ago

Wolfgang, it is always nice to have some real-life documents for testing. I found some bugs in tex4ht thanks to your document and I am slowly working on fixing them. These are mainly attributes deprecated in HTML5, which I don't think are really critical, but there are also some duplicate id attributes, which will be trickier to identify.

The mathml issues seem to come mainly from wrong conversion of delimiters (like $ (a + b ] - c)$ ) to MathML. I don't really understand your math, so I don't know whether the markup used is correct or not. tex4ht is sometimes pickier about the input than LaTeX itself.

Anyway, you can try to use the mathml- option instead of mathml. It seems to fix all the validation issues, but I am not sure whether the resulting rendering is correct!

And back to your original issue, the missing index. Epub3 requires special formatting of the table of contents, so tex4ebook uses a special configuration, which doesn't include starred sectioning commands. The intoc option doesn't seem to work in this case, so you may need to use an explicit

  \addcontentsline{toc}{chapter}{Index}

below \printindex. I think that the index will need some more customization, because it contains page numbers, which don't make sense in an e-book. You can take a look here: http://tex.stackexchange.com/a/348149/2891 for more information.
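A minimal sketch of where that line goes, assuming a book-class document that builds its index with makeidx. The chapter title and the \index entry are made-up placeholders, not taken from SimpleEPub.tex:

```latex
\documentclass{book}
\usepackage{makeidx}
\makeindex

\begin{document}
\chapter{Motion}
The field of a magnetic dipole\index{dipole field} is discussed here.

\printindex
% Explicit TOC entry, placed below \printindex as suggested above
\addcontentsline{toc}{chapter}{Index}
\end{document}
```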

michal-h21 commented 7 years ago

I've found an issue in the previous configuration for reflist; here is the fixed version:

\Preamble{xhtml}
\begin{document}
\Css{h1 { color : red; }}  
\Css{h2 { color : \#900000; }}  
\Css{h3 { color : \#900000; }}  
\Css{h4 { color : \#900000; }}  
\Configure{mathbf}{\HCode{<mi  mathvariant="bold">}\PauseMathClass}{\EndPauseMathClass\HCode{</mi>}}
\ConfigureList{reflist}{\HCode{<div class="bibitem">}\par}{\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}{\ifvmode\IgnorePar\fi\EndP\HCode{</div><div class="bibitem">}\par}{}
\ConfigureEnv{reflist}{\ifvmode\IgnorePar\fi\EndP\HCode{<div class="bibliography">}}{\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}{}{}
\EndPreamble

michal-h21 commented 7 years ago

I am still investigating the wrong math issue. One example that produces wrong math is the following code:

\begin{equation}
\label{eq:motion/dipole}
\Bv = \frac{ \mu_0 m}{4 \pi \epsilon_0r^3}[3 \hat p \cdot \hat r) \hat r -
\hat {p}].
\end{equation}
\end{document}

Isn't a left parenthesis missing here? Shouldn't it be

\begin{equation}
\label{eq:motion/dipole}
\Bv= \frac{ \mu_0 m}{4 \pi \epsilon_0r^3}[(3 \hat p \cdot \hat r) \hat r -
\hat {p}].
\end{equation}

???

Another minor issue: the \bf command is deprecated in LaTeX, so I think the \Bv command should be defined as

\newcommand{\Bv}{\mathbf{B}}

instead of

\newcommand{\Bv}{{\bf B}}

It is rendered in italics instead of bold in the current form.

wochristian commented 7 years ago

Dear Michal –

I have changed the config file as you suggested and I have fixed the LaTeX math errors that you reported. (Thanks.) I have placed my new source files and two generated EPub3 documents into a shared Dropbox folder.

https://www.dropbox.com/sh/akidyrddv08gnk1/AACes1tHmRxP4cGdIl0zfTDWa?dl=0

One EPub3 was generated with the mathml option and the other did not have that option.

tex4ebook SimpleEPub.tex mathml -f epub3 -t -c config.cfg
tex4ebook SimpleEPub.tex -f epub3 -t -c config.cfg

The text and math rendering in the EPubs look good although neither EPub3 passes the IDPF validation.

There is one very odd bug. The EPUB without the mathml option shows the Preface at the beginning where it should be. However, adding the mathml option puts the Preface at the end of the book! I would prefer to use mathml.

Yours,

Wolfgang

PS: Changing the mathml option to mathml- seems to be equivalent to not having the option. In other words, the mathml- option uses png images to show the math.


michal-h21 commented 7 years ago

Dear Wolfgang,

the mathml- option needs to be used together with the mathml option, like

 tex4ebook -t -c config.cfg -f epub3 SimpleEPub "mathml,mathml-"

Sorry for the confusion. But I've found that the downside of this option is that the size of the braces doesn't match the size of the contained elements, so it seems that fixing the issues in the LaTeX code is the preferable solution. tex4ht is much more sensitive to the input than normal LaTeX, because it needs to produce semantically correct output. In particular, if one uses an unbalanced number of brackets in the math, the only solution seems to be to use \left and \right. So the following nonsensical sample:

$a=b)$ 

becomes

$a= \left.b\right)$

Only in this way can we generate correct MathML for the above code.
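For illustration, a small sketch of the same idea applied to two common cases where only one delimiter is visible. Both formulas are made-up examples rather than lines from the book, and how tex4ht converts them is not verified here. The point is only that every \left has a matching \right, with \left. or \right. standing in for the missing side.

```latex
% A closing delimiter with no visible opening partner:
% \left. is an invisible delimiter that balances \right|
$\left. \frac{df}{dx} \right|_{x=0}$

% An opening brace with no visible closing partner:
% \right. is an invisible delimiter that balances \left\{
$f(x) = \left\{ \begin{array}{ll} x & x \ge 0 \\ -x & x < 0 \end{array} \right.$
```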

Regarding your updated files, there are far fewer errors. I've found one more math issue:

 velocity is $\mathbf{v_0}$. (b) The gravitational and drag forces on a 

this needs to be written as

 velocity is ${\mathbf{v}}_{\mathbf{0}}$. (b) The gravitational and drag forces on a 

in order to generate the correct mathml.
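If this pattern occurs often, one option is to wrap it in a macro so that the bold base and the bold subscript are always grouped separately. The macro name \bsub is hypothetical, and whether tex4ht converts the macro exactly like the hand-written form above is an assumption that should be checked on the real document.

```latex
% Hypothetical helper: bold symbol #1 with bold subscript #2,
% each in its own \mathbf group
\newcommand{\bsub}[2]{{\mathbf{#1}}_{\mathbf{#2}}}

% Usage: expands to the same markup as ${\mathbf{v}}_{\mathbf{0}}$
velocity is $\bsub{v}{0}$.
```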

Another error that can be easily fixed is in the description environment in the text1.tex file, starting on line 514: the \item headers should be in [] brackets. So instead of

   \item \emph{Introduction}.

use

   \item[\emph{Introduction}.]
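A short sketch of a complete description environment written this way. Only the Introduction label comes from the example above; the second label and the item texts are placeholders:

```latex
\begin{description}
  % The label goes in the optional argument of \item
  \item[\emph{Introduction}.] Text of the first item.
  \item[\emph{Summary}.] Text of the second item.
\end{description}
```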

The other issues seem to come from the tex4ht side and I will investigate them later.

Best regards, Michal

michal-h21 commented 7 years ago

Another issue is with duplicated id attributes. This error causes epubcheck to report a lot of issues about missing link destinations.

I've found that this issue was caused by the listings package and fixed it in the tex4ht sources. But this change will be present only in TeX Live 2017, so you can put the following lines in the .cfg file in the meantime:

\makeatletter
\catcode`\:=11                              
\def\lst@makecaption#1#2{\cptA: #1\cptB: \cptC: #2\cptD:}                                                
\catcode`\:=12 
\makeatother 

wochristian commented 7 years ago

Dear Michal –

The EPub 3 is looking better and better. There are still numerous errors of the type: “MathML should either have an alt text attribute or annotation-xml child element.” Adding alt text that contains “math equation” would fix the warnings but a better solution might be to use the original LaTeX math markup as the alt text since a lot of our readers will understand LaTeX.

One issue that I do not understand is why the Preface appears at the end of the EPub 3 when I use the mathml or the "mathml,mathml-" option but not if I remove that option to render equations as images. Why does using mathml force the preface to the end of the EPub?

Best, Wolfgang


michal-h21 commented 7 years ago

Dear Wolfgang,

the missing alt attributes for MathML should be only warnings; at least Epubcheck doesn't report them as errors. It wouldn't be easy to really fix this issue: we would need to capture the contents of each math instance and typeset it twice, the first time verbatim to get the TeX code for the alt attribute and the second time to get the actual MathML. It is not so hard to do this for inline math, but it gets more difficult for display math and various aligned environments.

Regarding the issue with the preface, I can't reproduce it. Could you send me copies of the content.opf file produced with and without the mathml option?

wochristian commented 7 years ago

Dear Michal –

Here are the two opf files. I have also attached a screen shot showing that the preface is the first entry in the Table of Contents but that the file appears at page 169.

The missing alt attributes are only warnings, but it would be nice to get rid of them. Can you just create a dummy alt attribute that says “Math ML not supported” or something similar?

Wolfgang


michal-h21 commented 7 years ago

I think that the rest of the validation issues can be fixed with the following make4ht build file. Save it as SimpleEPub.mk4:


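-- make4ht build file: post-process the generated HTML files
local filter = require "make4ht-filter"

-- Draft mode compiles once; otherwise compile three times
if mode == "draft" then
  Make:htlatex {}
else
  Make:htlatex {}
  Make:htlatex {}
  Make:htlatex {}
end

-- Add a generic alttext attribute after the MathML namespace declaration
local addaltmath = function(s)
  return s:gsub('"http://www.w3.org/1998/Math/MathML"', '"http://www.w3.org/1998/Math/MathML" alttext="Math content"')
end
-- Remove rules="groups", which is invalid in HTML5
local removerules = function(s)
  return s:gsub('rules="groups"', '')
end

-- Chain both functions into one filter
local process = filter {
  addaltmath, removerules
}

-- Run the filter on every output file whose name ends in "html"
Make:match("html$", process)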

It simply removes rules="groups" from tables, because that attribute is invalid in HTML5. It also adds alttext="Math content" to each math instance. I can get a valid Epub3 with these changes.

BTW, it seems that the OPF files you sent in your previous post were stripped by GitHub; I can't see them.

michal-h21 commented 7 years ago

I've found an easier way to solve the rules="groups" issue: modify the configuration of halignTB<> in the .cfg file:

\Configure{halignTB<>}{tabular}{\HCode{id="TBL-\TableNo"  class="tabular"\Hnewline 
}<>\HAlign}
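
For context, a sketch of a complete .cfg file containing only this setting, assuming the same \Preamble ... \EndPreamble layout as the config file earlier in this thread. In practice the line would sit next to the existing \Css and \Configure entries:

```latex
\Preamble{xhtml}
\begin{document}
% Table configuration without the rules="groups" attribute
\Configure{halignTB<>}{tabular}{\HCode{id="TBL-\TableNo"  class="tabular"\Hnewline
}<>\HAlign}
\EndPreamble
```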

wochristian commented 7 years ago

Michal—

Here are the OPF files that I sent yesterday, in a zip archive. I have not yet tried the fix that you suggested.

Wolfgang


michal-h21 commented 7 years ago

Wolfgang,

I am sorry, but I still can't see the zip file. Maybe it would be best if you put it on your Dropbox.

Michal

wochristian commented 7 years ago

Here is a link to the Dropbox file:

https://www.dropbox.com/s/obxnulu8knv85zo/opfFiles.zip?dl=0


michal-h21 commented 7 years ago

I can see that now. It is an issue that is already fixed in the development version of tex4ebook, which is not on CTAN yet. I plan to make a new release this week, which will be used in TL 2017.

wochristian commented 7 years ago

Thanks. Although I have done some computational physics programming, I am not a skilled software developer and I have greatly benefited from your help. I’ll look forward to upgrading to the new packages and to TL 2017 when it becomes available.


michal-h21 commented 7 years ago

I've hopefully fixed the rules="groups" issue in tex4ht. I also fixed many other issues and bugs in tex4ht thanks to your real-world samples, so I should thank you for providing all this input.

wochristian commented 7 years ago

Michal –

The patches that you sent me today fixed all the validation Errors and Warnings and I now get a clean EPub 3 with Math ML when using the IDPF validator. Hurray!

There is a layout issue with equation numbers that seems relatively minor but might be fixable. When using \begin{equation} the equation numbers are aligned on the right hand margin as they should be. But when using \begin{subequations} the equation numbers appear after the equation and this looks ragged. The TeXStudio pdf output doesn’t do this so I assume that the problem is in the html conversion.

Wolfgang
