jason-fox / fox.jason.passthrough.pandoc

Pandoc DITA-OT Plug-in for extending the available input formats for DITA-OT. Non-DITA input sources can be pre-processed to create valid DITA source.
https://jason-fox.github.io/dita-ot-plugins/pandoc
Apache License 2.0

Cannot convert a Word document to DITA topic #5

Closed raducoravu closed 4 years ago

raducoravu commented 5 years ago

I had success in converting Markdown to DITA using the plugin. I'm attaching a Word Document (DOCX).

Word File with various structures.docx

If I refer to it from a DITA Map:

<topicref format="pandoc" href="Word%20File%20with%20various%20structures.docx" type="topic"/>

the publishing is not able to convert it to DITA:

pandoc.process:
   [pandoc] Processing D:/projects/eXml/frameworks/dita/DITA-OT3.x/plugins/fox.jason.passthrough.pandoc-master/test/input-markdown/markdown.md
[file-rename] Moving 1 file to D:\projects\eXml\frameworks\dita\DITA-OT3.x\plugins\fox.jason.passthrough.pandoc-master\test\input-markdown\temp\html5\oxygen_dita_temp
    [pandoc] Processing D:/projects/eXml/frameworks/dita/DITA-OT3.x/plugins/fox.jason.passthrough.pandoc-master/test/input-markdown/Word File with various structures.docx
   [pandoc] Result: 1
   [pandoc] pandoc: File: openBinaryFile: does not exist (No such file or directory)

Running pandoc from the command line on the same Word document seems to work for me:

pandoc "D:/projects/eXml/frameworks/dita/DITA-OT3.x/plugins/fox.jason.passthrough.pandoc-master/test/input-markdown/Word File with various structures.docx"

jason-fox commented 5 years ago

@raducoravu - does commit 934d6f22d help you?

Specifically, the extra &quot; on lines 41 and 46 of process_pandoc.xml.
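
The log above is consistent with the file name being split on its spaces: pandoc reports that `File` does not exist, which is the second word of `Word File with various structures.docx`. A minimal sketch of the effect, assuming POSIX-style tokenisation (Python's `shlex` stands in here for whatever splits the command line; the plugin itself uses an Ant build file, and the relative path is shortened for illustration):

```python
import shlex

# Shortened, illustrative path; the real one in the log is absolute.
path = "input-markdown/Word File with various structures.docx"

# Without quotes, every space starts a new argument.
unquoted = shlex.split(f"pandoc {path}")

# With quotes around the path, it stays a single argument.
quoted = shlex.split(f'pandoc "{path}"')

print(unquoted)
print(quoted)
```

In the unquoted case `File` becomes an argument of its own, so a tool that treats each argument as an input file will try to open it.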

raducoravu commented 5 years ago

@jason-fox I confirm it works for me 👍

One small thing: in the generated TOC, the title of the Word document contains %20 instead of spaces.

Btw, as pandoc does not support AsciiDoc conversions, I recently worked on a plugin for converting AsciiDoc to DITA:

https://github.com/oxygenxml/dita-asciidoc
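
On the %20 in the TOC title mentioned above: the href in the map is URL-encoded, so a title derived from it needs to be percent-decoded first. A minimal sketch, assuming the title is taken from the file name in the href (`title_from_href` is a hypothetical helper, not part of the plugin):

```python
from pathlib import PurePosixPath
from urllib.parse import unquote


def title_from_href(href: str) -> str:
    """Hypothetical helper: derive a readable title from a topicref href."""
    # Percent-decode first, so "%20" becomes a space.
    decoded = unquote(href)
    # Drop any directory part and the file extension.
    return PurePosixPath(decoded).stem


print(title_from_href("Word%20File%20with%20various%20structures.docx"))
```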

I liked your idea of using Ant build files to do the actual conversion, so in my "dita-asciidoc" plugin the XMLReader implementation class runs an Ant build file, passing it parameters for the input and output files:

https://github.com/oxygenxml/dita-asciidoc/blob/master/com.oxygenxml.ant.parser.dita/src/com/oxygenxml/ant/dita/AntProcessReader.java

So instead of having the custom build.xml as part of the preprocessing stage, the custom build.xml is called for each conversion and is given a parameter for the input file and a parameter for the output file. Your way of doing things is probably faster, though, because all resources are processed from a single build file.
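
The per-file approach described above can be sketched as one Ant invocation per source file, with the input and output passed as properties. This is only an illustration using Ant's command-line `-f` and `-D` options; the property names `input.file` and `output.file`, the helper `ant_command`, and the example file names are all hypothetical, and the linked AntProcessReader class may well start Ant differently:

```python
from typing import List


def ant_command(build_file: str, input_file: str, output_file: str) -> List[str]:
    """Hypothetical helper: argument list for one per-file Ant run."""
    return [
        "ant",
        "-f", build_file,                # build file to execute
        f"-Dinput.file={input_file}",    # hypothetical property name
        f"-Doutput.file={output_file}",  # hypothetical property name
    ]


# One invocation per source file; passing a list (not a single string)
# keeps a path with spaces as one argument.
for source in ["intro.adoc", "user guide.adoc"]:
    print(ant_command("build.xml", source, source.replace(".adoc", ".dita")))
```

Each file pays the start-up cost of a separate run here, which is the trade-off against handling all resources from a single build file.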