jgm / pandoc

Universal markup converter
https://pandoc.org

Support {INCLUDEPICTURE "img.png"} in docx #4492

Open N0rbert opened 6 years ago

N0rbert commented 6 years ago

I have 15 years of experience working with MS Word, and I sometimes add LaTeX-like functionality to it using fields. Now I'm trying to switch to RMarkdown (which is pandoc-driven).

One such method is including graphics (images, photos, figures) by link.
Many non-LaTeX publishers require authors to link figures rather than embed them in the docx document. Linking also reduces the docx file size. In HTML this is <img src="img.png">. The Word equivalent is a field (inserted with Ctrl+F9) containing {INCLUDEPICTURE "img.png"}. It was standard in Word 2003 and is still supported in the current version, Word 2016.
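As a rough illustration of the field text itself: according to Microsoft's field-code documentation, backslashes in an INCLUDEPICTURE path must be doubled, and the `\d` switch tells Word not to store the picture data in the document, so the image stays linked. The helper name `includepicture_instr` is hypothetical and used only for this sketch:

```lua
-- Hypothetical helper: build the instruction text of a Word INCLUDEPICTURE
-- field for a given image path.
local function includepicture_instr(path, link_only)
  -- Word's field syntax requires backslashes in paths to be doubled.
  local escaped = path:gsub("\\", "\\\\")
  local instr = ' INCLUDEPICTURE "' .. escaped .. '"'
  if link_only then
    -- \d: do not store the graphics data in the document (keep it linked).
    instr = instr .. " \\d"
  end
  return instr .. " "
end
```

Forward slashes, as in "img.png" or "images/Example.png", pass through unchanged.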

If possible, please add a configuration option for the command line and the YAML header to trigger this behavior. It should be a boolean:

jkr commented 6 years ago

I doubt this will be implemented any time soon. But you can take care of this yourself with a Lua filter that replaces images with raw XML. Something like the following:

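-- Replace each image with a Word INCLUDEPICTURE field that links to the
-- image file instead of embedding it (docx output only).
local function xml_escape(s)
   -- Escape characters that would otherwise break the raw XML.
   return (s:gsub("[&<>]", { ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;" }))
end

function Image(elt)
   if FORMAT ~= "docx" then
      return nil  -- leave images unchanged for all other output formats
   end
   local rawxml = [[<w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText xml:space="preserve"> INCLUDEPICTURE "]] ..
      xml_escape(elt.src) ..
      [[" </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r>]]
   return pandoc.RawInline("openxml", rawxml)
end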

I'm not entirely sure what you want the end result to look like, but hopefully this gets you started. If you upload a (minimal) working document, we could probably help clarify further.

N0rbert commented 6 years ago

@jkr thank you very much! This Lua filter does exactly what I want. I just tested it with pandoc 2.1.3 on a real document with 90 external images from an img subfolder. Pandoc's extensibility is great!

jkr commented 6 years ago

Great! Closing, in that case.

sfadschm commented 3 years ago

Thanks for re-opening @jgm! I want to add another use case for supporting IncludePicture. I usually create .docx reports where images are linked just as Norbert described. Now I want to use pandoc to convert the docx file to a different format (Markdown, in my case). When I do, all linked images are silently skipped and produce no output. It would be nice to have them rendered the same way as normal (embedded) images:

// embedded image
![](media/image1.emf){width="3.0in" height="2.25in"}

// linked image
// docx: { INCLUDEPICTURE "images/Decays.emf" \d }
![](images/Decays.emf){width="4.9in" height="3.78in"}

jgm commented 3 years ago

OK, so your use case is for the docx reader (not the writer). I'll add that tag. @jkr, would it be feasible to parse these fields in docx files into regular Image elements?

sfadschm commented 3 years ago

@jgm Yes, correct. Should I open a separate request for this, since the original request here is about the docx writer? Unfortunately, I'm not familiar with Haskell, but from what I can see in the docx reader, parsing this should not be too different from the existing parsing of reference fields. After all, the IncludePicture field only contains a text link to an image in a folder.
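To illustrate how little the field carries, here is a minimal sketch of pulling the target path out of an INCLUDEPICTURE instruction. It assumes the instruction text has already been collected from the field's w:instrText runs; `includepicture_target` is a hypothetical name, and the pandoc reader itself is written in Haskell, so this only shows the logic:

```lua
-- Hypothetical helper: return the image path from an INCLUDEPICTURE field
-- instruction such as ' INCLUDEPICTURE "images/Decays.emf" \d ',
-- or nil if the instruction is some other field.
local function includepicture_target(instr)
  local name, rest = instr:match("^%s*(%a+)%s+(.*)$")
  -- Field names are matched case-insensitively (IncludePicture also works).
  if not name or name:upper() ~= "INCLUDEPICTURE" then
    return nil
  end
  -- The path is usually quoted; fall back to the first unquoted word.
  local target = rest:match('^"([^"]*)"') or rest:match("^(%S+)")
  if not target then
    return nil
  end
  -- Undo the doubled backslashes that Word's field syntax requires.
  return (target:gsub("\\\\", "\\"))
end
```

Switches such as `\d` after the path are simply ignored here.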

jgm commented 3 years ago

Let's leave this open; I've changed the title to make it more general. As far as the writer goes, we may want to leave things as they are (allowing use of a filter as described above), though we could also consider extending the use of the data-external attribute to this case.

sfadschm commented 3 years ago

Agreed. Regarding the original request, I agree with N0rbert that a new command-line option for docx export might be the smoothest solution. Using data-external might be an alternative, but is that attribute available in all readers or just in raw HTML?

sfadschm commented 3 years ago

I just noticed another flavor to the topic (since we are talking about it 😆 ):

There are two ways to insert linked images in MS Word:

  1. Using the IncludePicture field code.
  2. Using the menu via Insert > Image > (small down arrow) > Link Image

The first one is more customizable and preferable, and it should be used for the docx writer (especially as it intrinsically supports relative image file paths). In the reader, however, I think both cases should be covered while we are at it.

I tried different combinations and found that the XML generated by MS Word differs. Here are the results:

Load a .png image via IncludePicture

// In 'word/document.xml'
<w:pict w14:anchorId="3811E5E3">
  <v:shape id="_x0000_i1027" type="#_x0000_t75" style="width:3in;height:165.8pt">
    <v:imagedata r:id="rId10"/>
  </v:shape>
</w:pict>

// In 'word/_rels/document.xml.rels'
<Relationship Id="rId10" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="images/Example.png" TargetMode="External"/>
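The r:id on v:imagedata only becomes a file path through this .rels entry. A minimal sketch of that lookup, assuming the .rels file is available as a string and uses double-quoted attributes; it uses Lua patterns rather than a real XML parser, and `external_image_targets` is a hypothetical name:

```lua
-- Relationship type shown in the .rels examples in this thread.
local IMAGE_REL_TYPE =
  "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"

-- Hypothetical helper: map relationship Ids to the targets of external
-- (linked, not embedded) images in word/_rels/document.xml.rels.
local function external_image_targets(rels_xml)
  local targets = {}
  -- Each entry is an element like <Relationship Id=".." Type=".." ... />;
  -- the %s after the name skips the enclosing <Relationships> element.
  for tag in rels_xml:gmatch("<Relationship%s([^>]*)>") do
    local attrs = {}
    for key, value in tag:gmatch('([%w:]+)="([^"]*)"') do
      attrs[key] = value
    end
    if attrs.Type == IMAGE_REL_TYPE and attrs.TargetMode == "External" then
      targets[attrs.Id] = attrs.Target
    end
  end
  return targets
end
```

Embedded images have no TargetMode="External" and are therefore left out.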

Load a .emf image via IncludePicture

// In 'word/document.xml'
<w:pict w14:anchorId="09715146">
  <v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f">
    <v:stroke joinstyle="miter"/>
    <v:formulas>
      <v:f eqn="if lineDrawn pixelLineWidth 0"/>
      <v:f eqn="sum @0 1 0"/>
      <v:f eqn="sum 0 0 @1"/>
      <v:f eqn="prod @2 1 2"/>
      <v:f eqn="prod @3 21600 pixelWidth"/>
      <v:f eqn="prod @3 21600 pixelHeight"/>
      <v:f eqn="sum @0 0 1"/>
      <v:f eqn="prod @6 1 2"/>
      <v:f eqn="prod @7 21600 pixelWidth"/>
      <v:f eqn="sum @8 21600 0"/>
      <v:f eqn="prod @7 21600 pixelHeight"/>
      <v:f eqn="sum @10 21600 0"/>
    </v:formulas>
    <v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
    <o:lock v:ext="edit" aspectratio="t"/>
  </v:shapetype>
  <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:453.25pt;height:170.2pt">
    <v:imagedata r:id="rId7"/>
  </v:shape>
</w:pict>

// In 'word/_rels/document.xml.rels'
<Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="images/Example.emf" TargetMode="External"/>
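For the width/height attributes wanted in the Markdown output, a w:pict picture stores its size in the VML style attribute of v:shape, e.g. style="width:453.25pt;height:170.2pt" above. A minimal sketch of converting that to inch strings, assuming only pt, in, cm and mm occur (72 pt = 1 in); `vml_size` and `vml_length_to_inches` are hypothetical helpers, not part of pandoc:

```lua
-- Points per unit for the CSS-style lengths handled here.
local POINTS_PER_UNIT = { pt = 1, ["in"] = 72, cm = 72 / 2.54, mm = 72 / 25.4 }

-- Convert one length such as "165.8pt" to an inch string such as "2.30in";
-- returns nil for unknown units or malformed numbers.
local function vml_length_to_inches(value)
  local number, unit = value:match("^%s*([%d%.]+)%s*(%a+)%s*$")
  local factor = unit and POINTS_PER_UNIT[unit]
  if not factor or not tonumber(number) then
    return nil
  end
  return string.format("%.2fin", tonumber(number) * factor / 72)
end

-- Extract width and height from a VML style string ("key:value;key:value").
local function vml_size(style)
  local props = {}
  for key, value in style:gmatch("([%w%-]+)%s*:%s*([^;]+)") do
    props[key] = value
  end
  local width = props.width and vml_length_to_inches(props.width)
  local height = props.height and vml_length_to_inches(props.height)
  return width, height
end
```

Two decimals is an arbitrary choice for the sketch; a real reader might format sizes differently.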

Load a .emf image via Insert > Image

// In 'word/document.xml'
<w:drawing>
  <wp:inline distT="0" distB="0" distL="0" distR="0" wp14:anchorId="0163938D" wp14:editId="3EA59907">
    <wp:extent cx="5760085" cy="2168525"/>
    <wp:effectExtent l="0" t="0" r="0" b="0"/>
    <wp:docPr id="1" name="Example.emf"/>
    <wp:cNvGraphicFramePr>
      <a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" noChangeAspect="1"/>
    </wp:cNvGraphicFramePr>
    <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
      <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
        <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
          <pic:nvPicPr>
            <pic:cNvPr id="1" name="Example.emf"/>
            <pic:cNvPicPr/>
          </pic:nvPicPr>
          <pic:blipFill>
            <a:blip r:link="rId5"/>
            <a:stretch>
              <a:fillRect/>
            </a:stretch>
          </pic:blipFill>
          <pic:spPr>
            <a:xfrm>
              <a:off x="0" y="0"/>
              <a:ext cx="5760085" cy="2168525"/>
            </a:xfrm>
            <a:prstGeom prst="rect">
              <a:avLst/>
            </a:prstGeom>
          </pic:spPr>
        </pic:pic>
      </a:graphicData>
    </a:graphic>
  </wp:inline>
</w:drawing>

// In 'word/_rels/document.xml.rels'
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="file:///full_path_to_images_folder\images\Example.emf" TargetMode="External"/>

So the first way produces a w:pict element (with v:imagedata r:id), while the second way generates a w:drawing element (with a:blip r:link).
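A reader would therefore have to look in both places for the relationship id. A minimal pattern-based sketch, assuming the document.xml text is available as a string; `linked_image_ids` is a hypothetical name. Presumably a v:imagedata r:id can also point at an embedded image, so each id would still need to be checked against TargetMode="External" in the .rels file:

```lua
-- Hypothetical helper: collect relationship ids of pictures in a
-- word/document.xml string, in both forms shown above.
-- Results are grouped by kind (pict first), not kept in document order.
local function linked_image_ids(document_xml)
  local ids = {}
  -- VML picture, as produced by an INCLUDEPICTURE field: <v:imagedata r:id="..."/>
  for id in document_xml:gmatch('<v:imagedata[^>]-%sr:id="([^"]+)"') do
    ids[#ids + 1] = { kind = "pict", id = id }
  end
  -- DrawingML picture linked via the Insert menu: <a:blip r:link="..."/>
  -- (embedded DrawingML pictures use r:embed instead and are skipped).
  for id in document_xml:gmatch('<a:blip[^>]-%sr:link="([^"]+)"') do
    ids[#ids + 1] = { kind = "drawing", id = id }
  end
  return ids
end
```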

jgm commented 3 years ago

A data-external attribute can be added in formats that support attributes. (So, Markdown as well as HTML and some others.)
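On the writer side, that suggests an opt-in variant of the filter posted earlier: only images marked in Markdown as, for example, `![](img.png){data-external="1"}` become linked INCLUDEPICTURE fields, and all other images are embedded as usual. This is a sketch, not pandoc behavior; adding the `\d` switch is an assumption based on Word's field documentation (it keeps the picture data out of the document):

```lua
-- Escape characters that would otherwise break the raw XML.
local function xml_escape(s)
  return (s:gsub("[&<>]", { ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;" }))
end

-- Turn only images carrying a data-external attribute into linked
-- INCLUDEPICTURE fields in docx output; everything else is untouched.
function Image(elt)
  if FORMAT ~= "docx" or not elt.attributes["data-external"] then
    return nil
  end
  local instr = ' INCLUDEPICTURE "' .. xml_escape(elt.src) .. '" \\d '
  return pandoc.RawInline("openxml",
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>' ..
    '<w:r><w:instrText xml:space="preserve">' .. instr .. '</w:instrText></w:r>' ..
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>')
end
```

As with the original filter, Word may need the fields updated (F9) before the pictures appear.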

jkr commented 3 years ago

Sure, IncludePicture should be doable on the parser side. I'll need a day to familiarize myself with this. (Anything that actually parses drawing info, as in one of the emf examples above, seems like a bridge too far.) But the XML in the other examples seems workable enough.

sfadschm commented 3 years ago

Sounds fine to me. I honestly don't know why Word behaves differently here; however, the behavior is described in the Office Open XML documentation, so it seems to be intentional. The w:pict cases cover the field code results, so they should be enough for this request.