jgm / pandoc

Universal markup converter
https://pandoc.org

Unwanted block quotes in lists when converting ODT generated from DOCX #9505

Open: mathrick opened this issue 7 months ago

mathrick commented 7 months ago

Explain the problem.

This might be related to #8836. It seems that the fixes to pandoc's DOCX reader were not propagated to other affected formats such as ODT, even though that issue does mention .odt in a comment.

The file in question was generated by LibreOffice (using the official Ubuntu 22.0 packages) from a DOCX file (the same source file as in #9504): How-to-Become-a-Best-Selling-Author.odt

```
$ libreoffice --convert-to odt How-to-Become-a-Best-Selling-Author.docx
$ libreoffice --version
LibreOffice 7.3.7.2 30(Build:2)
$ pandoc How-to-Become-a-Best-Selling-Author.odt -o How-to-Become-a-Best-Selling-Author.typ
```

This generates the following list structure:

```typst
In this article you will learn the basics about three things:

- #quote(block: true)[
  Defining your target audience
  ]

- #quote(block: true)[
  Creating book covers, titles, and subtitles
  ]

- #quote(block: true)[
  Working with other authors
  ]
```

Pandoc version?

```
$ pandoc --version
pandoc 3.1.12.1
Features: +server +lua
Scripting engine: Lua 5.4
```

jgm commented 6 months ago

This has nothing to do with Typst specifically; it's an ODT reader issue.

mathrick commented 6 months ago

@jgm: Ah, I wasn't sure about that, since I couldn't immediately see the block quotes in the native format dump, and pandoc doesn't generate them with the Markdown target.

jgm commented 6 months ago

It's there in the native output:

```
, BulletList
    [ [ BlockQuote
          [ Para
              [ Str "Defining"
              , Space
              , Str "your"
              , Space
              , Str "target"
              , Space
              , Str "audience"
              ]
          ]
      ]
    , [ BlockQuote
          [ Para
              [ Str "Creating"
              , Space
              , Str "book"
              , Space
              , Str "covers,"
              , Space
              , Str "titles,"
              , Space
              , Str "and"
              , Space
              , Str "subtitles"
              ]
          ]
      ]
    , [ BlockQuote
          [ Para
              [ Str "Working"
              , Space
              , Str "with"
              , Space
              , Str "other"
              , Space
              , Str "authors"
              ]
          ]
      ]
    ]
```

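To spot this pattern without reading through a long native dump, one option is to inspect pandoc's JSON AST (`pandoc input.odt -t json`), which has the same structure as the native output above. The following is a minimal sketch that counts list items containing a `BlockQuote` directly. The script name `count_quoted_items.py` and the function names are hypothetical, and it only recurses through lists and block quotes (not `Div`s, tables, or notes).

```python
import json
import sys


def list_items(block):
    """Return the items of a BulletList or OrderedList block, else []."""
    if block.get("t") == "BulletList":
        return block["c"]
    if block.get("t") == "OrderedList":
        return block["c"][1]  # OrderedList "c" is [list attributes, items]
    return []


def count_quoted_items(blocks):
    """Count list items (at any list depth) that contain a top-level BlockQuote."""
    count = 0
    for block in blocks:
        for item in list_items(block):
            if any(b.get("t") == "BlockQuote" for b in item):
                count += 1
            count += count_quoted_items(item)  # lists nested inside the item
        if block.get("t") == "BlockQuote":
            count += count_quoted_items(block["c"])  # lists inside a quote
    return count


if __name__ == "__main__":
    doc = json.load(sys.stdin)  # pandoc JSON: {"pandoc-api-version", "meta", "blocks"}
    print(count_quoted_items(doc["blocks"]))
```

A possible invocation, assuming the file from this issue: `pandoc How-to-Become-a-Best-Selling-Author.odt -t json | python3 count_quoted_items.py`. Any non-zero count means some list items were read as block quotes.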
jgm commented 6 months ago

I think this is happening because the list items' paragraphs are indented: the ODT reader uses indentation as a heuristic for deciding when a paragraph is a block quote. It's a similar issue to #8836, but the fix will have to be different because ODT is a different format.
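Until the ODT reader handles this, one possible workaround is to strip the extra quote layer after reading. This is a sketch, not an official pandoc filter: it assumes that every `BlockQuote` sitting directly in a list item is spurious (so it would also unwrap genuine quotes inside lists), and it operates on pandoc's JSON AST in a `-t json` / `-f json` pipeline. The file name `unquote_lists.py` and the function names are hypothetical; a Lua filter would be the more idiomatic pandoc route for the same transformation.

```python
import json
import sys


def unwrap_items(items):
    """Replace each BlockQuote directly inside a list item with its contents."""
    new_items = []
    for item in items:
        blocks = []
        for block in item:
            if block.get("t") == "BlockQuote":
                blocks.extend(walk(block["c"]))  # drop the quote wrapper
            else:
                blocks.extend(walk([block]))  # still fix lists nested deeper
        new_items.append(blocks)
    return new_items


def walk(blocks):
    """Rewrite a list of blocks, recursing only through lists and block quotes."""
    out = []
    for block in blocks:
        t = block.get("t")
        if t == "BulletList":
            block = {"t": t, "c": unwrap_items(block["c"])}
        elif t == "OrderedList":
            attrs, items = block["c"]  # [list attributes, items]
            block = {"t": t, "c": [attrs, unwrap_items(items)]}
        elif t == "BlockQuote":
            # Quotes outside list items are kept, but lists inside them are fixed.
            block = {"t": t, "c": walk(block["c"])}
        out.append(block)
    return out


if __name__ == "__main__":
    doc = json.load(sys.stdin)
    doc["blocks"] = walk(doc["blocks"])
    json.dump(doc, sys.stdout)
```

A possible pipeline for the file in this issue: `pandoc How-to-Become-a-Best-Selling-Author.odt -t json | python3 unquote_lists.py | pandoc -f json -t typst -o How-to-Become-a-Best-Selling-Author.typ`. The script leaves `meta` and `pandoc-api-version` untouched, so the JSON stays readable by the same pandoc version.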