jgm / pandoc

Universal markup converter
https://pandoc.org

Docx writer breaks list item when it contains display math #6638

Open lanbones opened 4 years ago

lanbones commented 4 years ago

Hello,

We are using pandoc 2.10.1 to convert exam papers from Markdown to docx and LaTeX. The problem occurs with Markdown content like this:

1. Display math:
$$x = x + 1$$
rest content for question 1...

2. other content...

After conversion, the PDF looks fine, but the docx writer adds a list number before 'rest content for question 1...'.

In pdf, image

But in docx, image

The commands are simple:

pandoc in.md -o out.docx
pandoc in.md -o out.pdf

lanbones commented 4 years ago

I found that the root cause is fixDisplayMath: it turns the paragraph into a Div, and the docx writer then treats the Div's contents as separate paragraphs. I tried changing it to produce a single Para in which the separated pieces are joined with LineBreak, and that works. Is a Div really necessary for display math?
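To make the two behaviours concrete, here is a minimal, self-contained sketch. It is an assumption-laden toy model, not pandoc's code: the `Inline` and `Block` types are simplified stand-ins for the ones in pandoc-types, and `chunks`, `splitIntoDiv`, and `joinWithLineBreak` are hypothetical names chosen for illustration. Only the grouping idea (runs of display math versus runs of everything else) follows what is described above.

```haskell
import Data.List (groupBy, intercalate)

-- Toy stand-ins for pandoc's Inline and Block types (assumption: heavily
-- simplified; the real definitions live in pandoc-types).
data Inline = Str String | Space | SoftBreak | LineBreak | DisplayMath String
  deriving (Eq, Show)

data Block = Para [Inline] | Div [Block]
  deriving (Eq, Show)

isDisplayMath :: Inline -> Bool
isDisplayMath (DisplayMath _) = True
isDisplayMath _               = False

isSpaceLike :: Inline -> Bool
isSpaceLike x = x `elem` [Space, SoftBreak]

-- Drop whitespace at both ends of a chunk.
trim :: [Inline] -> [Inline]
trim = reverse . dropWhile isSpaceLike . reverse . dropWhile isSpaceLike

-- Split inlines into runs of display math and runs of everything else.
chunks :: [Inline] -> [[Inline]]
chunks = filter (not . null) . map trim . groupBy sameKind
  where sameKind x y = isDisplayMath x == isDisplayMath y

-- Paragraphs that mix display math with other content.
needsFix :: [Inline] -> Bool
needsFix xs = any isDisplayMath xs && not (all isDisplayMath xs)

-- Behaviour described above: one paragraph per chunk, wrapped in a Div.
splitIntoDiv :: Block -> Block
splitIntoDiv (Para xs) | needsFix xs = Div (map Para (chunks xs))
splitIntoDiv b = b

-- Workaround described above: keep one Para, join chunks with LineBreak.
joinWithLineBreak :: Block -> Block
joinWithLineBreak (Para xs) | needsFix xs = Para (intercalate [LineBreak] (chunks xs))
joinWithLineBreak b = b

-- A shortened version of the first list item from the report.
example :: Block
example = Para
  [ Str "Display", Space, Str "math:", SoftBreak
  , DisplayMath "x = x + 1", SoftBreak, Str "rest" ]
```

In this model, `splitIntoDiv example` yields a `Div` holding three `Para` blocks, whereas `joinWithLineBreak example` yields a single `Para`. A writer that emits one numbered list paragraph per `Para` would therefore number the trailing text only in the first case.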

jgm commented 4 years ago

The issue isn't really specific to math. It arises with this too (& this is the underlying cause of the issue with display math):

1. ::: {.math}
   Display math

   and more
   :::

2. another item

lanbones commented 4 years ago

I think our cases are different. In your case, you manually construct a div, which is not a concept in docx (I am not sure...). See:

<ol type="1">
<li><div class="math">
<p>Display math</p>
<p>and more</p>
</div></li>
<li>another item</li>
</ol>

However, in my case, the AST doesn't contain a Div; it is fixDisplayMath that restructures it badly. (I give an HTML version here, because the JSON is too long after formatting.)

<ol type="1">
<li><p>Display math: <span class="math display"><em>x</em> = <em>x</em> + 1</span> rest content for question 1…</p></li>
<li><p>other content…</p></li>
</ol>
{"blocks":[{"t":"OrderedList","c":[[1,{"t":"Decimal"},{"t":"Period"}],[[{"t":"Para","c":[{"t":"Str","c":"Display"},{"t":"Space"},{"t":"Str","c":"math:"},{"t":"SoftBreak"},{"t":"Math","c":[{"t":"DisplayMath"},"x = x + 1"]},{"t":"SoftBreak"},{"t":"Str","c":"rest"},{"t":"Space"},{"t":"Str","c":"content"},{"t":"Space"},{"t":"Str","c":"for"},{"t":"Space"},{"t":"Str","c":"question"},{"t":"Space"},{"t":"Str","c":"1…"}]}],[{"t":"Para","c":[{"t":"Str","c":"other"},{"t":"Space"},{"t":"Str","c":"content…"}]}]]]}],"pandoc-api-version":[1,21],"meta":{}}

I think handling a Div inside a list perfectly is hard for docx. (At least, I don't know how to calculate the indentation for the non-first paragraphs inside a Div.) But is it possible to make fixDisplayMath produce a better structure?

lanbones commented 4 years ago

In my experiment, this change works well for my case with the docx writer, but I am not sure whether it would cause other bugs.

diff --git a/src/Text/Pandoc/Writers/Shared.hs b/src/Text/Pandoc/Writers/Shared.hs
index b399afbf3..392ae98e3 100644
--- a/src/Text/Pandoc/Writers/Shared.hs
+++ b/src/Text/Pandoc/Writers/Shared.hs
@@ -180,17 +180,18 @@ fixDisplayMath :: Block -> Block
 fixDisplayMath (Plain lst)
   | any isDisplayMath lst && not (all isDisplayMath lst) =
     -- chop into several paragraphs so each displaymath is its own
-    Div ("",["math"],[]) $
-       map Plain $
+    Plain $
+       concat . (intersperse [LineBreak]) $
        filter (not . null) $
        map stripLeadingTrailingSpace $
        groupBy (\x y -> (isDisplayMath x && isDisplayMath y) ||
                          not (isDisplayMath x || isDisplayMath y)) lst
 fixDisplayMath (Para lst)
   | any isDisplayMath lst && not (all isDisplayMath lst) =
     -- chop into several paragraphs so each displaymath is its own
-    Div ("",["math"],[]) $
-       map Para $
+    Para $
+       concat . (intersperse [LineBreak]) $
        filter (not . null) $
        map stripLeadingTrailingSpace $
        groupBy (\x y -> (isDisplayMath x && isDisplayMath y) ||

jgm commented 4 years ago

Well, fixDisplayMath creates the kind of structure I illustrate above. My point is that the docx writer doesn't handle this structure well -- which I'd regard as a bug. If this bug were fixed, it would fix your issue too without changes in fixDisplayMath.

lanbones commented 4 years ago

Great, thank you very much. Fixing it your way is harder, but it is the better choice :)