Closed StephanMeijer closed 1 year ago
You're converting from what to what?
What do you mean by "shape format"?
From Docx to HTML.
Shape Format is a Microsoft Word feature that lets users freely position text and images, use WordArt, and so on. It is a feature that probably shouldn't be used.
I'm currently working on a PR for Pandoc to investigate this and, if possible, fix it.
This would probably require some extreme measures in src/Text/Pandoc/Readers/Docx/Parse.hs, as the logic has to be changed: a w:p can also contain further paragraphs, not only runs.
More info can be found in ECMA-376 Part 1
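To illustrate the nesting described above, here is a minimal sketch in Python (not Pandoc's Haskell code, and not what the PR does). It assumes a hand-written, heavily simplified fragment in which a text box's w:txbxContent sits directly inside a run; a real document.xml wraps it in several more drawing elements. The names SAMPLE, own_text and paragraphs are made up for this example.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Simplified, hand-written fragment (an assumption for illustration):
# the outer paragraph holds a run whose text box contains another paragraph.
SAMPLE = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:r><w:t>Outer</w:t></w:r>
    <w:r>
      <w:txbxContent>
        <w:p><w:r><w:t>Inside shape</w:t></w:r></w:p>
      </w:txbxContent>
    </w:r>
  </w:p>
</w:body>
"""


def own_text(p):
    """Return the text of w:t elements that belong to paragraph p itself.

    Text box content is skipped here, because its paragraphs are
    visited separately by paragraphs().
    """
    parts = []

    def walk(el):
        for child in el:
            if child.tag == f"{{{W}}}txbxContent":
                continue  # nested paragraphs are handled on their own
            if child.tag == f"{{{W}}}t":
                parts.append(child.text or "")
            walk(child)

    walk(p)
    return "".join(parts)


def paragraphs(root):
    """Return the text of every w:p in document order, nested ones included."""
    return [own_text(p) for p in root.iter(f"{{{W}}}p")]


print(paragraphs(ET.fromstring(SAMPLE)))
```

A reader that only looks at the runs directly under each top-level w:p would stop at "Outer" here, which matches the symptom reported in this issue.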
Closed by #9223 - I accidentally hit enter before finishing the description of the squashed commit.
@jgm many thanks for merging! I will publish some test cases and possible fixes to my code for VML-based images, probably tomorrow, to make sure those are still supported within the context of Shape Format.
Explain the problem.
Text in Shape Format is not extracted
Example:
Screenshot
document.xml
MsWord.docx
Pandoc version: 3.1.9