michal-h21 / make4ht

Build system for tex4ht

ODT mismatch among sections, document outline, and table of contents #66

Open jmclawson opened 2 years ago

jmclawson commented 2 years ago

Expected behavior:

The document outlines and tables of contents in ODT files will match those of PDF files.

Actual behavior:

The document outline and table of contents in ODT files are determined solely by \section commands.

Longer description:

When converting to ODT, the resulting document map seems to be determined by the \section (and \subsection, etc.) commands alone: starred versions of these commands (\section*, etc.) are treated like the unstarred ones, and commands like \addcontentsline are ignored.

Here's a MWE:

\documentclass[12pt]{article}
\usepackage{hyperref}

\begin{document}
\tableofcontents

\section{First Section}
\subsection{Subsection}
\section*{Second Section}
\section*{Third Section}\addcontentsline{toc}{section}{Third Section}
Bibliography\addcontentsline{toc}{section}{Bibliography}

\end{document}

The PDF output, with its hyperref-provided document outline on the left, looks like this: [screenshot]

Notice that the Second Section is correctly missing from both the Table of Contents and the document outline because it was added with \section*. Additionally, the Third Section and the Bibliography are in the contents and the document outline because these headings were added with \addcontentsline.

When converting to ODT using the bash command make4ht -f odt mwe.tex, the resulting file looks like this in Microsoft Word: [screenshot]

Notice that the Second Section is incorrectly included in the document outline and Bibliography is missing from it, while the TOC matches the PDF output. When I right-click the TOC and choose Update Field, I get this: [screenshot]

Here, the TOC now matches the incorrect document outline. They both incorrectly include the Second Section, which was defined using \section*, and they both incorrectly omit Bibliography, which was added with \addcontentsline.

P.S. It's likely I should be filing this with TeX4ht instead. I'm still learning where the division is between the two projects.

michal-h21 commented 2 years ago

Hi James,

Sorry for the late reply. I somehow missed this report and only found it now.

I am afraid that this is quite difficult to solve. The problem is that the information in the TOC generated by TeX4ht and the information in the document outline come from two separate sources.

The original table of contents in the ODT file comes from the TOC file, so it correctly omits \section* headings and includes entries added using \addcontentsline.

The document outline is created automatically by Word from the section headings used in the document. So it also includes \section* headings, but Bibliography is missing because it is just plain text in the document. When you choose Update Field, Word regenerates the TOC from this outline.

I am afraid that we cannot fix this.
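
For anyone who wants to see the two sources side by side, here is a minimal Python sketch that lists the heading elements and the stored TOC entries in an ODT file. It is a diagnostic only, not a fix. It assumes the file is a standard ODF package with a content.xml member, that headings are stored as text:h elements, and that the stored TOC sits inside a text:table-of-content element; whether TeX4ht's ODT output follows exactly this layout is an assumption I have not verified. The function name headings_and_toc is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace for text elements (text:h, text:p, text:table-of-content).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def _plain_text(elem):
    """Concatenate all text inside an element, ignoring nested markup."""
    return "".join(elem.itertext()).strip()


def headings_and_toc(odt):
    """Return (headings, toc_entries) for an ODT path or file-like object.

    headings:    list of (outline_level, text) for every text:h element
                 outside the stored TOC (what a word processor would
                 likely use to build its outline).
    toc_entries: text of every non-empty paragraph inside
                 text:table-of-content (assumed location of the stored TOC;
                 this may include the TOC title itself).
    """
    with zipfile.ZipFile(odt) as zf:
        root = ET.fromstring(zf.read("content.xml"))

    toc_entries = []
    inside_toc = set()  # ids of elements that belong to the stored TOC
    for toc in root.iter(f"{{{TEXT_NS}}}table-of-content"):
        for el in toc.iter():
            inside_toc.add(id(el))
        for p in toc.iter(f"{{{TEXT_NS}}}p"):
            text = _plain_text(p)
            if text:
                toc_entries.append(text)

    headings = []
    for h in root.iter(f"{{{TEXT_NS}}}h"):
        if id(h) in inside_toc:
            continue
        level = int(h.get(f"{{{TEXT_NS}}}outline-level", "1"))
        headings.append((level, _plain_text(h)))

    return headings, toc_entries


# Example use (mwe.odt being the file produced by make4ht -f odt mwe.tex):
# headings, toc = headings_and_toc("mwe.odt")
```

If the assumptions hold, a starred section should show up in the headings list but not in the TOC entries, and Bibliography should show up only in the TOC entries, which is the mismatch described above.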