jgm / pandoc

Universal markup converter
https://pandoc.org

Converting a `.docx` exported from LibreOffice does not preserve quotation and preformatted text. #8938

Open StephanMeijer opened 1 year ago

StephanMeijer commented 1 year ago

Pandoc version(s)

pandoc 3.1.4
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /Users/steve/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

Examples and reproduction

Previously we created our documents with Google Docs. We use some filters and a template, which you can see here.

We are running it with:

pandoc -s --quiet --from=docx --to=html \
    --output=test/<test-dir>/none/expected.html --template=src/template.html \
    test/<test-dir>/input.docx
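
To check the generated HTML for the symptoms reported in this issue, one could count the elements that pandoc's HTML writer normally emits for block quotes (`<blockquote>`) and code blocks (`<pre>`). The following is only a minimal sketch: `TagCounter` and `count_tags` are hypothetical helper names, not part of pandoc or of the test setup above.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags for a fixed set of element names."""

    def __init__(self, tags):
        super().__init__()
        self.counts = {tag: 0 for tag in tags}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in self.counts:
            self.counts[tag] += 1


def count_tags(html, tags=("blockquote", "pre")):
    """Return a dict mapping each tag name to its number of occurrences."""
    counter = TagCounter(tags)
    counter.feed(html)
    counter.close()
    return counter.counts
```

Calling `count_tags` on the contents of an `expected.html` produced by the command above and getting zero for both tags would be consistent with the quote and the preformatted text having been converted to plain paragraphs.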

Examples

Example input: test/libreoffice-title-subtitle-headings-image-lists-quote-preformatted-text-docx
Example output (using filters as defined above): test/libreoffice-title-subtitle-headings-image-lists-quote-preformatted-text-docx/none/expected.html

Explanation on examples

The following issues are visible:

jgm commented 1 year ago

In this file, the quote uses the style named "Quotations" and the preformatted text uses the style named "Preformatted Text". We can probably fix this by adding these names to the lists in isCodeDiv and isBlockQuote in src/Text/Pandoc/Readers/Docx.hs (lines 241-249).
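
For anyone who wants to check which style names another `.docx` uses, the paragraph styles can be read from `word/styles.xml` inside the archive. The following is a rough sketch using only the Python standard library. `paragraph_style_names` is a hypothetical helper, and this is not how pandoc itself reads styles.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by the w: prefix in styles.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_style_names(docx_path):
    """Return {styleId: display name} for every paragraph style.

    docx_path may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))

    names = {}
    for style in root.iter(f"{{{W_NS}}}style"):
        # Skip character, table, and numbering styles.
        if style.get(f"{{{W_NS}}}type") != "paragraph":
            continue
        name = style.find(f"{{{W_NS}}}name")
        if name is not None:
            names[style.get(f"{{{W_NS}}}styleId")] = name.get(f"{{{W_NS}}}val")
    return names
```

Running this on the LibreOffice export should list "Quotations" and "Preformatted Text" among the display names, if the file matches the description above.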

StephanMeijer commented 1 year ago

@jgm I really appreciate your help

StephanMeijer commented 1 year ago

@jgm I updated the description with examples that use no filters.