jgm / pandoc

Universal markup converter
https://pandoc.org

docbook to pdf (latex) conversion produces two copies of the abstract #8815

Closed by varkappadev 1 year ago

varkappadev commented 1 year ago

affected version: 3.1.2

Minimal (non-)working example:

Starting with the following document

= My Title
Me 

[abstract]
--
Brief summary...
--

and converting to docbook using

asciidoctor -b docbook5 "test.adoc" -o "test.docbook"

results in a document containing the following snippet (irrelevant parts removed)

<abstract>
<simpara>Brief summary&#8230;&#8203;</simpara>
</abstract>

Converting this to pdf

pandoc -t pdf -f docbook "test.docbook" -o "test.pdf"

or to latex for debugging:

pandoc -t latex -f docbook "test.docbook" -s -o "test.tex"

results in a document that includes the abstract twice. For example, the latex output contains

\begin{abstract}
Brief summary\ldots\hspace{0pt}
\end{abstract}

\begin{quote}
Brief summary\ldots\hspace{0pt}
\end{quote}

I would have expected only the \begin{abstract}...\end{abstract} part.

pandoc 2.17.1.1 on Debian does not emit the abstract environment but does emit the quote one. I am aware that this is a large version gap and does not narrow things down much, but it may be helpful nonetheless.

This may be related to work addressing metadata processing such as #7747.

jgm commented 1 year ago

Here's the complete docbook:

<?xml version="1.0" encoding="UTF-8"?>
<?asciidoc-toc?>
<?asciidoc-numbered?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
<info>
<title>My Title</title>
<date>2023-05-01</date>
<author>
<personname>
<firstname>Me</firstname>
</personname>
</author>
<authorinitials>M</authorinitials>
</info>
<abstract>
<simpara>Brief summary&#8230;&#8203;</simpara>
</abstract>
</article>

jgm commented 1 year ago

Is the docbook produced by asciidoctor correct? See https://tdg.docbook.org/tdg/5.1/abstract.html

These elements contain abstract: biblioentry, bibliomixed, bibliomset, biblioset, info (db.info), info (db.titleforbidden.info), info (db.titleonly.info), info (db.titleonlyreq.info), info (db.titlereq.info), merge.
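For anyone who wants to check their own files against the list above, here is a rough Python sketch. It assumes DocBook 5 input in the http://docbook.org/ns/docbook namespace; the function name misplaced_abstract_parents and the ALLOWED_PARENTS set are made up for this illustration and are not part of pandoc or asciidoctor. It only looks at the parent element name and is not a substitute for real schema validation.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"

# Parent element names taken from the TDG 5.1 list quoted above
# (all the info variants share the element name "info").
ALLOWED_PARENTS = {
    "biblioentry", "bibliomixed", "bibliomset", "biblioset", "info", "merge",
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def misplaced_abstract_parents(xml_text):
    """Hypothetical helper: return the parent element name of every
    <abstract> whose parent is not in ALLOWED_PARENTS."""
    root = ET.fromstring(xml_text)
    bad = []
    for parent in root.iter():
        for child in parent:
            if (child.tag == f"{{{DB_NS}}}abstract"
                    and local_name(parent.tag) not in ALLOWED_PARENTS):
                bad.append(local_name(parent.tag))
    return bad
```

Run on the docbook posted above, this should report the article element as the offending parent.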

It looks like it isn't valid to put an abstract element directly under article; it should instead go under info, and that is where pandoc expects it. If you move it there, the problem goes away.

So far as I can see, then, this is a bug in asciidoctor.
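Until that is fixed upstream, the workaround described above (moving the abstract under info) could be scripted. The following is only a sketch under some assumptions: the input is DocBook 5 in the http://docbook.org/ns/docbook namespace, the misplaced abstract is a direct child of the root element, and the function name move_abstract_into_info is hypothetical. Note that ElementTree drops processing instructions such as <?asciidoc-toc?> and unused namespace declarations when re-serializing.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"


def move_abstract_into_info(xml_text):
    """Hypothetical workaround: move any <abstract> that is a direct child
    of the root element into the root's <info> element."""
    # Serialize DocBook elements without an "ns0:" prefix.
    ET.register_namespace("", DB_NS)
    root = ET.fromstring(xml_text)

    info = root.find(f"{{{DB_NS}}}info")
    if info is None:
        # Assumption: create an empty <info> if the document has none.
        info = ET.Element(f"{{{DB_NS}}}info")
        root.insert(0, info)

    # findall returns a list, so removing children while looping is safe.
    for abstract in root.findall(f"{{{DB_NS}}}abstract"):
        root.remove(abstract)
        info.append(abstract)

    return ET.tostring(root, encoding="unicode")
```

The returned string can be written back to a file and passed to the same pandoc command used in the original report.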

varkappadev commented 1 year ago

You are right @jgm, this is a bug in asciidoctor (anyone else who runs into this, see https://github.com/asciidoctor/asciidoctor/issues/3602). I hadn't noticed that this placement is not valid in docbook5.

Thanks!