michal-h21 / make4ht

Build system for tex4ht
Missing data when converting to docbook #27

Closed hcf-n closed 4 years ago

hcf-n commented 4 years ago

I'm converting LaTeX to DocBook. The basic metadata in \title{}, \author{} and \date{} seems to be encoded as notes in the resulting XML file.

MWE:

\documentclass[11pt, a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc,url}
\usepackage{textcomp}
\begin{document}
\title{Placeholder for title}
\author{Firstname Lastname}
\date{\today}
\maketitle
\end{document}

I convert with: make4ht --format docbook --xetex --utf8 test.tex

Best regards Hans Christian

michal-h21 commented 4 years ago

I am afraid that the DocBook output in TeX4ht is not really up to date. I personally don't know much about this format, so the output you get follows a version of the format that is more than 10 years old, and some of it may be in an obsolete form.

Anyway, I've fixed this issue in TeX4ht sources. The fix should be available in TeX Live 2020 soon.

The generated XML with the update will look like this:

<article
 role="report" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0">
<info
 role="maketitle">

<title
>Placeholder for title</title>
<author
><personname
><othername
>Firstname Lastname</othername></personname></author>
<date
>June 18, 2020</date>

   </info>
   </article>

Best regards, Michal

hcf-n commented 4 years ago

Thank you for your fast reply.

It is not docbook as such I'm interested in, but conversion from latex to docx.

I'm writing academic articles and books in the humanities and I rely on biblatex for references. To communicate with publishers .docx is required. I've struggled -- especially with footnotes -- using make4ht to produce .odt and then libreoffice to convert to .docx. Make4ht to docbook and then Pandoc from docbook to docx actually gives me a very decent output, i.e. accurate biblatex and \sections, \subsections, \quote \footnote and \emph working fine.
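
The DocBook route described above can be sketched as two commands. This is only a rough sketch: the file names are placeholders, and it assumes that make4ht writes the DocBook output next to the source as test.xml.

```shell
#!/bin/sh
# Sketch: LaTeX -> DocBook (make4ht) -> docx (pandoc).
# Assumption: make4ht writes the DocBook output next to the source as <name>.xml.
src=test.tex
xml="${src%.tex}.xml"
docx="${src%.tex}.docx"

if command -v make4ht >/dev/null 2>&1 && command -v pandoc >/dev/null 2>&1; then
  # Same make4ht call as in the first comment, then DocBook -> docx with pandoc.
  make4ht --format docbook --xetex --utf8 "$src" &&
    pandoc --from docbook --to docx --output "$docx" "$xml"
else
  echo "make4ht and pandoc are both required" >&2
fi
```

The second step only runs if the make4ht step succeeds, so a failed conversion does not leave a stale .docx looking like fresh output.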

I've tried make4ht --> html. This gives a correct document in the browser, but I have not succeeded in converting it to docx (the footnotes are problematic).

I've also tried tex4ebook. This also looks promising. I've tried converting the resulting epub to .docx with Pandoc, but that also runs into problems with the footnotes.

I'm very eager to hear what you would recommend for converting to docx. Are there other parts of make4ht/tex4ebook, settings, configurations etc. I should try? I would be more than willing to contribute to testing.

Best regards, Hans Christian

michal-h21 commented 4 years ago

I think the best way in this case is to fix the .odt issues, as it is the format most similar to .docx. HTML has no built-in footnote support, so it is probably not easy to convert a document with footnotes to docx.

What kind of issues do you have with the .odt output? I've fixed a lot of issues in the past year, especially regarding validity. What TeX distribution do you use?

In any case, can you please send me a test TeX file that shows the ODT issues?

Best regards, Michal

hcf-n commented 4 years ago

I will give .odt another try and send you a report.

-- I'm on OS X, TeX Live 2020

Best regards, Hans Christian

hcf-n commented 4 years ago

I tried compiling the article I'm currently working on with make4ht --xetex -uf odt --loglevel warning Test2.tex, and using the memoir class gave some errors.

MWE:

\documentclass[11pt, a4paper,article]{memoir}
\usepackage[norsk]{babel}
\usepackage{fontspec}
\begin{document}
\title{Placeholder for title}
\author{Firstname Lastname}
\date{\today}
\maketitle
\section{Heading}
Sentence.\footnote{Footnote.}
\end{document}

make4ht --xetex -uf odt --loglevel warning Test2.tex

[ERROR] htlatex: Compilation errors in the htlatex run
[ERROR] htlatex: Filename Line Message
[ERROR] htlatex: ./Test2.tex 9 Class memoir Error: Font command \tt is not supported.
[ERROR] htlatex: Compilation errors in the htlatex run
[ERROR] htlatex: Filename Line Message
[ERROR] htlatex: ./Test2.tex 9 Class memoir Error: Font command \tt is not supported.
[ERROR] htlatex: Compilation errors in the htlatex run
[ERROR] htlatex: Filename Line Message
[ERROR] htlatex: ./Test2.tex 9 Class memoir Error: Font command \tt is not supported.

I have make4ht version v0.3e

Replacing 'memoir' with the standard 'article' class compiles fine and gives me an .odt file that opens directly in Word 365 (on Mac). The resulting .odt looks very good. You must have done a lot of upgrades since the last time I gave it a try!

My remaining problem is that the .odt styles do not correspond to the built-in styles in LibreOffice or MS Word. What would be the recommended way to achieve this? Is there a way to "map" the styles via a config file?

Now I get:

LaTeX            make4ht ODT
\section{}    -> Heading-2
\subsection{} -> Heading-3
...

What I would like to achieve is something like:

LaTeX            MS Word
\section{}    -> Heading 1
\subsection{} -> Heading 2
...

Best regards Hans Chr

michal-h21 commented 4 years ago

I can see the Memoir issue. It happens only in the ODT output, not in HTML. I've fixed it now, so it should be in TL soon. Regarding styling, you can try to use the odttemplate extension. See https://tex.stackexchange.com/a/464728/2891.

hcf-n commented 4 years ago

Thanks,

The example in the link is a bit complicated for me...

Is there a way to map the style “Heading-2” in the generated file from make4ht to the default Heading1 in Libreoffice, or is the odttemplate extension only a way to modify the style elements make4ht generates?

HC

michal-h21 commented 4 years ago

Oh, I've found that it is possible to change the style names used for sections, using:

\Configure{Heading-2}{Heading 1}
\Configure{Heading-3}{Heading 2}

hcf-n commented 4 years ago

Great, where should I put the two lines?

michal-h21 commented 4 years ago

You can put them in a .cfg file. It should have the following structure:

\Preamble{xhtml}
\begin{document}
\Configure{Heading-2}{Heading 1}
\Configure{Heading-3}{Heading 2}
\EndPreamble

You can require it using the -c option:

 make4ht -c config.cfg -f odt filename.tex

hcf-n commented 4 years ago

I couldn’t get it to work on the first try. I will look into it further tomorrow.

hcf-n commented 4 years ago

I'm not able to make this work. The resulting .odt file does not recognise the styles produced by make4ht as default styles.

michal-h21 commented 4 years ago

I've tried this:

\documentclass[11pt, a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc,url}
\usepackage{textcomp}
\begin{document}
\title{Placeholder for title}
\author{Firstname Lastname}
\date{\today}
\maketitle
\section{sample}
\subsection{subsection}
\subsubsection{subsubsection}
\end{document}

And this config file, config.cfg:

\Preamble{xhtml}
\begin{document}
\Configure{Heading-2}{Heading 1}
\Configure{Heading-3}{Heading 2}
\Configure{Heading-4}{Heading 3}
\EndPreamble

Compiled using:

make4ht -f odt -c config.cfg sample.tex

The attached file seems to contain the right styles: sample.zip (https://github.com/michal-h21/make4ht/files/4818290/sample.zip)
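
One way to check which heading styles ended up in the generated file is to look at the XML inside it. The sketch below assumes the output is named sample.odt and that unzip is installed; list_heading_styles is a helper name made up for this example. Depending on the tool that wrote the file, spaces in style names may appear encoded (for example as _20_) rather than literally.

```shell
#!/bin/sh
# Print each distinct heading start tag (<text:h ...>) found in XML on stdin,
# prefixed with the number of times it occurs. Newlines are turned into spaces
# first, so tags that are broken across lines are still matched.
list_heading_styles() {
  tr '\n' ' ' | grep -o '<text:h [^>]*>' | sort | uniq -c
}

odt=sample.odt   # assumed output name for sample.tex
if [ -f "$odt" ] && command -v unzip >/dev/null 2>&1; then
  # An .odt file is a zip archive; the document body lives in content.xml.
  unzip -p "$odt" content.xml | list_heading_styles
else
  echo "$odt or unzip not found; nothing inspected" >&2
fi
```

If the configuration was picked up, the text:style-name attributes in the listed tags should show the names set with \Configure instead of Heading-2, Heading-3 and so on.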

hcf-n commented 4 years ago

This works on my side too! (I’m not sure why it didn’t work the first time…)

Thank you!
