jgm / pandoc

Universal markup converter
https://pandoc.org

Option to Link Images Rather Than Embed Them For ODT #9815

Closed: iandol closed this 4 months ago

iandol commented 5 months ago

Describe your proposed improvement and the problem it solves.

For many formatting workflows, editors or publishers prefer not to embed figures. ODT allows you to easily embed or link images, and in fact the opendocument writer already supports linking:

pandoc -t opendocument

![](placeholder.png)

<text:p text:style-name="Text_20_body">
<draw:frame draw:name="img1">
<draw:image xlink:href="placeholder.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" />
</draw:frame>
</text:p>

But the odt writer forces the image to be embedded, so the same markdown becomes something like:

<text:p text:style-name="Standard">
 <draw:frame draw:style-name="fr3" draw:name="Image1" text:anchor-type="as-char" svg:width="4.704cm" svg:height="1.552cm" draw:z-index="0">
  <draw:image draw:mime-type="image/png">
    <office:binary-data>iVBORw0KGgoAAAANSUhEUgAAAMgAAABCCAMAAAAlrWkSAAABUFBMVEXL//jK/vfK/fbJ/fbJ
       /PXI/PXI+/TH+/TH+vPG+fLG+PHF+PHF9/HF9/DE9/DE9u/D9u/D9e/D9e7D9O7C9O3C8+3C
       8+zB8uvA8uvA8evA8eq/8Oq/8Om/7+m+7+m+7+i+7ui97ue97ee97ea87Oa87OW76+W76+S7
       6uS66uS66uO66eO56eK56OK46OG45+G35uC35t+25d+25N61492049204ty04duz4duz4dqz
       4Nqy4Nqx3tiw3dew3dav3Nau29Wu2tSt2dOs2NKr19Gr1tCq1c+p1c+p1M+o082n0s2n0syn
       0cym0cum0Mul0Mqlz8qlz8mkzsmkzsikzcijzcijzcejzMeizMaiy8WhysWhysSgycSgycOf
       yMOfyMKfx8Kex8GexsGdxsCdxcCdxb+cxL+cxL6cw76bw76bwr2awr2awryawbyZwbuZwLud
       UsQ6AAACBElEQVR42u3VA7J0OxgF0N38r23btm3bNs+ef/ElaVSzjPPdlzWJhf8ny7Isy7Ks
       0nn8DYdOCf6CHvIA8sH/RLIb8s1SefRDuvxfajOQbpvGTz5ka2XYFkTz3jGiBZJNMerWC7my
       v6i8rFCbgFzr1AaCr1S+siFVA7UzYJDaGgTKbJ7cuKNRCeCc2sfBfG+JB1LkdcxsPzNqGUoV
       o35OlwbLfXC34t75/XfG+ciAtsI4zuXKSG0AbuVZZ5IRGJmfTPRaBNfybDLe/YoHIb1nP4zz
       VgYX8+4yzLlaGa0PIoanpG/+4JNhnxVwNd8hjRUfUmtyqH3VwOUCJzSGkFLOI7WfBrhe8IJG
       P1LIuqP22wIBMq5odCNJxjU1px0iZN5Qc9qQ4N8ljR4IkX1P7bcJcYLnNAYgRt4jte8AYi3T
       GIYgBS9UfhFnkdoYRBmmcos4Y1QuIcsMlT3E6aLyAllWqCwjTiU1H0Q5pDINzT+8UAQtk1oR
       RHmI5J4z905yuwnKD5UmSOJxqNShet1hyPWQD7dUBiFJDrWJc8Z4m72gMgNJapjWKiTpZVqH
       kGSSjNpuzJ59Y9QjJFli2NdiIRTf0DXDHA8E2aXxMBZERNMOQ3IhyA2Vo04PYhUtfVGphSDf
       /F0tR5J/449kH+TIeJ3JREqe7uNpyOHzIb0gLMuyLMuyLFn+A7VPEV2OysTaAAAAAElFTkSu
       QmCC
    </office:binary-data>
   </draw:image>
  </draw:frame>
</text:p>

It would be great if there were a command-line option to link images instead (i.e., keep the opendocument behavior for odt). That way we could generate ODT files with linked figures. The same technically applies to DOCX (Word does allow linking, though the syntax is much more complex).

Describe alternatives you've considered.

I imagine a Lua filter could do this; would that be a viable workaround?

iandol commented 5 months ago

The code that generates the opendocument figure output is here:

https://github.com/jgm/pandoc/blob/main/src/Text/Pandoc/Writers/OpenDocument.hs#L647

But I can't work out where the ODT writer changes this.

iandol commented 4 months ago

There is already --embed-resources=true|false, and I wonder if this could be triggered like so:

pandoc -t odt --embed-resources=false -o out.odt in.md

This would trigger images-as-links. This issue is about ODT because I think the change should be simple there, but DOCX also supports images-as-links (with more complex OpenXML changes needed).

jgm commented 4 months ago

I think it might be confusing if --embed-resources had a default of true for odt but a default of false otherwise...

ptram commented 4 months ago

> I think it might be confusing if --embed-resources had a default of true for odt but a default of false otherwise...

Do I understand correctly, from the message from @iandol above, that OpenDocument is actually written with embedding set to false, and ODT and DOCX with it set to true?

Paolo

iandol commented 4 months ago

> I think it might be confusing if --embed-resources had a default of true for odt but a default of false otherwise...

Right, if used, this would need to be clearly documented. The alternative is a new command-line option, which would probably sound similar (--embed-images or maybe --link-images) and would, I imagine, add a bit more maintenance.

> Do I understand correctly, from the message from @iandol above, that OpenDocument is actually written with embedding set to false, and ODT and DOCX with it set to true?

I don't think this is explicitly controlled. The opendocument writer uses links (effectively embedding set to false, though not through any global switch); the ODT writer somehow changes this, and the DOCX writer always embeds. Having looked at the writers, I (knowing no Haskell) couldn't see an easy way to implement it.

ptram commented 4 months ago

I will add a reason for implementing this feature (linked/embedded images): while embedding may produce easier-to-handle ODT or DOCX files, it would also prevent them from being used as an intermediate file format for going from Markdown to a page layout program.

Programs like InDesign, Affinity Publisher, QuarkXPress, and Scribus can all import RTF or DOCX. Unfortunately, they cannot import Markdown, and apparently there is no way to get their developers to make Markdown compatibility a priority. Hence the importance of Pandoc in the process. If image links and names were preserved in the conversion, the original aim of the Markdown project would also be preserved.

A page layout program as the last step before generating a PDF or ePub file is very useful, since many details can be fine-tuned in ways that are not possible when writing the output for LaTeX or Typst. Pandoc would allow a smooth integration between the ease of authoring offered by Markdown and the fine control over details and high-quality typographic output offered by page layout programs.

ptram commented 4 months ago

Incidentally: I'm finishing a PDF project that started in Markdown and had to be completed in Word, because I couldn't finish it in a page layout program for lack of a reasonable way to get from Markdown into one. I hate Word, I hate the world! Nobody dare talk to me today!

jgm commented 4 months ago

It would be easy enough to modify the ODT writer to optionally skip the step that embeds the images. (That is the transformPicMath function, though presumably we still want the math part of it.)

The difficulty is figuring out what should trigger this. As I mentioned, it would be weird to make --embed-resources=false trigger it, because false is the default. In addition, --embed-resources is just for HTML.

One could add another option I suppose.

iandol commented 4 months ago

How about --link-images=true|false? I thought about --embed-images but it sounds confusingly similar to --embed-resources...

jgm commented 4 months ago

I think --link-images makes sense. I suppose that at first we could implement this for ODT only -- maybe it's also possible for docx.

iandol commented 4 months ago

Right, ODT appears straightforward as usual, and LibreOffice can convert to DOCX for anyone who needs DOCX.

While I think DOCX is low priority, out of curiosity I generated a minimal DOCX with a linked image to demonstrate the desired output. In word/document.xml, the inline linked image is encoded by this baroque XML:

<w:r w:rsidR="00E10F1E">
    <w:rPr>
        <w:noProof/>
    </w:rPr>
    <w:drawing>
        <wp:inline distT="0" distB="0" distL="0" distR="0">
            <wp:extent cx="1270000" cy="419100"/>
            <wp:effectExtent l="0" t="0" r="0" b="0"/>
            <wp:docPr id="744031760" name="placeholder.png"/>
            <wp:cNvGraphicFramePr>
                <a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" noChangeAspect="1"/>
            </wp:cNvGraphicFramePr>
            <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
                <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                    <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
                        <pic:nvPicPr>
                            <pic:cNvPr id="744031760" name="placeholder.png"/>
                            <pic:cNvPicPr/>
                        </pic:nvPicPr>
                        <pic:blipFill>
                            <a:blip r:link="rId4"/>
                            <a:stretch>
                                <a:fillRect/>
                            </a:stretch>
                        </pic:blipFill>
                        <pic:spPr>
                            <a:xfrm>
                                <a:off x="0" y="0"/>
                                <a:ext cx="1270000" cy="419100"/>
                            </a:xfrm>
                            <a:prstGeom prst="rect">
                                <a:avLst/>
                            </a:prstGeom>
                        </pic:spPr>
                    </pic:pic>
                </a:graphicData>
            </a:graphic>
        </wp:inline>
    </w:drawing>
</w:r>

The link to the file on disk is stored in word/_rels/document.xml.rels under Id="rId4":

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings" Target="webSettings.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/>
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
    <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
    <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml"/>
    <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="file:////Users/ian/placeholder.png" TargetMode="External"/>
</Relationships>
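
For anyone experimenting with a DOCX equivalent later, here is a minimal Python sketch (standard library only) that builds a relationships part like the one above, with the image relationship marked TargetMode="External" so the picture is not stored inside the .docx zip. The function name external_image_rels and the rule "absolute POSIX path becomes a file:// URI, anything else is written as-is" are illustrative assumptions, not pandoc code; a real writer would also need the matching <a:blip r:link="rId4"/> in document.xml.

```python
import xml.etree.ElementTree as ET
import posixpath
from urllib.parse import quote

# Namespace and relationship type copied from the rels file above.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
IMAGE_TYPE = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/image")


def external_image_rels(rel_id: str, image_path: str) -> str:
    """Hypothetical helper: a Relationships part with one external image."""
    # Serialize RELS_NS as the default namespace (no "ns0:" prefixes).
    ET.register_namespace("", RELS_NS)
    root = ET.Element(f"{{{RELS_NS}}}Relationships")
    # Assumption: absolute paths become file:// URIs; relative ones are kept.
    if posixpath.isabs(image_path):
        target = "file://" + quote(image_path)
    else:
        target = image_path
    ET.SubElement(root, f"{{{RELS_NS}}}Relationship", {
        "Id": rel_id,
        "Type": IMAGE_TYPE,
        "Target": target,
        # "External" tells the consumer the target lives outside the package.
        "TargetMode": "External",
    })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(external_image_rels("rId4", "/Users/ian/placeholder.png"))
```

This produces the three-slash file:///Users/ian/placeholder.png form that LibreOffice writes; whether Word prefers that or its own four-slash variant is not settled in this thread.
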
iandol commented 4 months ago

Sorry, I forgot to attach the docx:

link.docx

placeholder.png

Note that Word uses an absolute path, whereas LibreOffice uses a relative path. I will test whether a relative path works if I edit the XML manually...


EDIT: using Target="file:///placeholder.png" seemed to work (I got a warning on opening, probably because I had edited the file manually), and saving to a new Word document kept the relative path.

jgm commented 4 months ago

Here's a patch that would add --link-images.

diff --git a/src/Text/Pandoc/App/CommandLineOptions.hs b/src/Text/Pandoc/App/CommandLineOptions.hs
index c3abe1ba1..c50ec6208 100644
--- a/src/Text/Pandoc/App/CommandLineOptions.hs
+++ b/src/Text/Pandoc/App/CommandLineOptions.hs
@@ -601,6 +601,14 @@ options =
                   "true|false")
                  "" -- "Make slide shows include all the needed js and css"

+    , Option "" ["link-images"] -- maybe True (\argStr -> argStr == "true") arg
+                 (OptArg
+                  (\arg opt -> do
+                        boolValue <- readBoolFromOptArg "--link-images" arg
+                        return opt { optLinkImages =  boolValue })
+                  "true|false")
+                 "" -- "Link images in ODT rather than embedding them"
+
     , Option "" ["request-header"]
                  (ReqArg
                   (\arg opt -> do
diff --git a/src/Text/Pandoc/App/Opt.hs b/src/Text/Pandoc/App/Opt.hs
index c1f16279c..b6050f117 100644
--- a/src/Text/Pandoc/App/Opt.hs
+++ b/src/Text/Pandoc/App/Opt.hs
@@ -119,6 +119,7 @@ data Opt = Opt
     , optIncremental           :: Bool    -- ^ Use incremental lists in Slidy/Slideous/S5
     , optSelfContained         :: Bool    -- ^ Make HTML accessible offline (deprecated)
     , optEmbedResources        :: Bool    -- ^ Make HTML accessible offline
+    , optLinkImages            :: Bool    -- ^ Link ODT images rather than embedding
     , optHtmlQTags             :: Bool    -- ^ Use <q> tags in HTML
     , optHighlightStyle        :: Maybe Text -- ^ Style to use for highlighted code
     , optSyntaxDefinitions     :: [FilePath]  -- ^ xml syntax defs to load
@@ -201,6 +202,7 @@ instance FromJSON Opt where
        <*> o .:? "incremental" .!= optIncremental defaultOpts
        <*> o .:? "self-contained" .!= optSelfContained defaultOpts
        <*> o .:? "embed-resources" .!= optEmbedResources defaultOpts
+       <*> o .:? "link-images" .!= optLinkImages defaultOpts
        <*> o .:? "html-q-tags" .!= optHtmlQTags defaultOpts
        <*> o .:? "highlight-style"
        <*> o .:? "syntax-definitions" .!= optSyntaxDefinitions defaultOpts
@@ -526,6 +528,8 @@ doOpt (k,v) = do
       parseJSON v >>= \x -> return (\o -> o{ optSelfContained = x })
     "embed-resources" ->
       parseJSON v >>= \x -> return (\o -> o{ optEmbedResources = x })
+    "link-images" ->
+      parseJSON v >>= \x -> return (\o -> o{ optLinkImages = x })
     "html-q-tags" ->
       parseJSON v >>= \x -> return (\o -> o{ optHtmlQTags = x })
     "highlight-style" ->
@@ -738,6 +742,7 @@ defaultOpts = Opt
     , optIncremental           = False
     , optSelfContained         = False
     , optEmbedResources        = False
+    , optLinkImages            = False
     , optHtmlQTags             = False
     , optHighlightStyle        = Just "pygments"
     , optSyntaxDefinitions     = []
diff --git a/src/Text/Pandoc/App/OutputSettings.hs b/src/Text/Pandoc/App/OutputSettings.hs
index d08cb626b..11d813e5e 100644
--- a/src/Text/Pandoc/App/OutputSettings.hs
+++ b/src/Text/Pandoc/App/OutputSettings.hs
@@ -262,6 +262,7 @@ optToOutputSettings scriptingEngine opts = do
         , writerReferenceDoc     = optReferenceDoc opts
         , writerSyntaxMap        = syntaxMap
         , writerPreferAscii      = optAscii opts
+        , writerLinkImages       = optLinkImages opts
         }
   return $ OutputSettings
     { outputFormat = format
diff --git a/src/Text/Pandoc/Options.hs b/src/Text/Pandoc/Options.hs
index 20aec2624..e4ff56b77 100644
--- a/src/Text/Pandoc/Options.hs
+++ b/src/Text/Pandoc/Options.hs
@@ -325,6 +325,7 @@ data WriterOptions = WriterOptions
   , writerReferenceLocation :: ReferenceLocation    -- ^ Location of footnotes and references for writing markdown
   , writerSyntaxMap         :: SyntaxMap
   , writerPreferAscii       :: Bool           -- ^ Prefer ASCII representations of characters when possible
+  , writerLinkImages        :: Bool           -- ^ Use links rather than embedding ODT images
   } deriving (Show, Data, Typeable, Generic)

 instance Default WriterOptions where
@@ -363,6 +364,7 @@ instance Default WriterOptions where
                       , writerReferenceLocation = EndOfDocument
                       , writerSyntaxMap        = defaultSyntaxMap
                       , writerPreferAscii      = False
+                      , writerLinkImages       = False
                       }

 instance HasSyntaxExtensions WriterOptions where
diff --git a/src/Text/Pandoc/Writers/ODT.hs b/src/Text/Pandoc/Writers/ODT.hs
index 8464a01e0..8eec979d9 100644
--- a/src/Text/Pandoc/Writers/ODT.hs
+++ b/src/Text/Pandoc/Writers/ODT.hs
@@ -272,15 +272,19 @@ transformPicMath opts (Image attr@(id', cls, _) lab (src,t)) = catchError
                               Just dim         -> Just $ Inch $ inInch opts dim
                               Nothing          -> Nothing
        let  newattr = (id', cls, dims)
-       entries <- gets stEntries
-       let extension = maybe (takeExtension $ takeWhile (/='?') $ T.unpack src) T.unpack
-                           (mbMimeType >>= extensionFromMimeType)
-       let newsrc = "Pictures/" ++ show (length entries) <.> extension
-       let toLazy = B.fromChunks . (:[])
-       epochtime <- floor `fmap` lift P.getPOSIXTime
-       let entry = toEntry newsrc epochtime $ toLazy img
-       modify $ \st -> st{ stEntries = entry : entries }
-       return $ Image newattr lab (T.pack newsrc, t))
+       src' <- if writerLinkImages opts
+                  then return src
+                  else do
+                    entries <- gets stEntries
+                    let extension = maybe (takeExtension $ takeWhile (/='?') $ T.unpack src) T.unpack
+                                        (mbMimeType >>= extensionFromMimeType)
+                    let newsrc = "Pictures/" ++ show (length entries) <.> extension
+                    let toLazy = B.fromChunks . (:[])
+                    epochtime <- floor `fmap` lift P.getPOSIXTime
+                    let entry = toEntry newsrc epochtime $ toLazy img
+                    modify $ \st -> st{ stEntries = entry : entries }
+                    return $ T.pack newsrc
+       return $ Image newattr lab (src', t))
    (\e -> do
        report $ CouldNotFetchResource src $ T.pack (show e)
        return $ Emph lab)

However, it doesn't work (at least, LibreOffice raises an error and does not display the image). Have you actually tested ODTs with linked images?

iandol commented 4 months ago

Yes, ODT definitely supports links. Here is a linked doc (same placeholder.png as above):

link.odt

link.fodt.zip

Saved as both ODT and flat FODT, with the image flanked by "Pre." and "Post." paragraphs:

[screenshot]

The GUI shows an absolute path, but it is saved slightly differently in the ODT (../placeholder.png) and the FODT (placeholder.png):

ODT:

<text:p text:style-name="Standard">
<draw:frame draw:style-name="fr1" draw:name="Image1" text:anchor-type="as-char" svg:width="4.759cm" style:rel-width="28%" svg:height="1.549cm" style:rel-height="scale" draw:z-index="0">
<draw:image xlink:href="../placeholder.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" draw:filter-name="&lt;All images&gt;" draw:mime-type="image/png"/>
</draw:frame>
</text:p>

FODT:

<text:p text:style-name="Standard">
<draw:frame draw:style-name="fr1" draw:name="Image1" text:anchor-type="as-char" svg:width="4.759cm" style:rel-width="28%" svg:height="1.549cm" style:rel-height="scale" draw:z-index="0">
<draw:image xlink:href="placeholder.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" draw:filter-name="&lt;All images&gt;" draw:mime-type="image/png"/>
</draw:frame>
</text:p>

I wonder whether something else is required in the document. OK, here I take a Pandoc-generated ODT:

pandoc.odt

<office:body>
  <office:text>
    <text:p text:style-name="Text_20_body">Pre.</text:p>
    <text:p text:style-name="Text_20_body">Post.</text:p>
  </office:text>
</office:body>

Then I open it and add a linked image (pandoc+link.odt):

<office:body>
  <office:text>
    <text:sequence-decls>
      <text:sequence-decl text:display-outline-level="1" text:separation-character="." text:name="Illustration"/>
      <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
      <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
      <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
      <text:sequence-decl text:display-outline-level="0" text:name="Figure"/>
    </text:sequence-decls>
    <text:p text:style-name="P2">Pre.</text:p>
    <text:p text:style-name="P2">
      <draw:frame draw:style-name="fr1" draw:name="Image1" text:anchor-type="as-char" svg:width="16.51cm" style:rel-width="100%" svg:height="5.447cm" style:rel-height="scale" draw:z-index="0">
        <draw:image xlink:href="../placeholder.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" draw:filter-name="&lt;All images&gt;" draw:mime-type="image/png"/>
        </draw:frame>
    </text:p>
    <text:p text:style-name="Text_20_body">Post.</text:p>
  </office:text>
</office:body>

<text:sequence-decls> gets added to office:body. I will try several ablation experiments to see what causes the ODT to fail to load.


Note: importing a linked image in LO wraps it in a caption box and floats it; I am manually removing the caption box and unfloating the image (making it inline) to simplify the test case. I will need to test a captioned image later on...

iandol commented 4 months ago

Here's another comparison. I generated an ODT with image with Pandoc:

pandoc.odt.zip

<office:body>
<office:text>
<text:p text:style-name="Text_20_body">Pre.</text:p>
<text:p text:style-name="Text_20_body">
<draw:frame draw:name="img1" svg:width="200.0pt" svg:height="66.0pt">
<draw:image xlink:href="Pictures/0.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" />
</draw:frame>
</text:p>
<text:p text:style-name="Text_20_body">Post.</text:p>
</office:text>
</office:body>

I duplicated it and then converted the image to a link (see the screenshot above: you can add a link filename, which turns an embedded image into a linked one):

pandoc+link.odt.zip

<office:body>
<office:text>
<text:sequence-decls><text:sequence-decl text:display-outline-level="0" text:name="Illustration"/><text:sequence-decl text:display-outline-level="0" text:name="Table"/><text:sequence-decl text:display-outline-level="0" text:name="Text"/><text:sequence-decl text:display-outline-level="0" text:name="Drawing"/><text:sequence-decl text:display-outline-level="0" text:name="Figure"/></text:sequence-decls>
<text:p text:style-name="Text_20_body">Pre.</text:p>
<text:p text:style-name="Text_20_body">
<draw:frame draw:style-name="fr1" draw:name="img1" text:anchor-type="as-char" svg:width="7.056cm" svg:height="2.328cm" draw:z-index="0">
<draw:image xlink:href="../placeholder.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" draw:filter-name="&lt;All images&gt;" draw:mime-type="image/png"/></draw:frame>
</text:p>
<text:p text:style-name="Text_20_body">Post.</text:p>
</office:text>
</office:body>

The untouched Pandoc ODT contains a META-INF/manifest.xml that points to the Pictures/0.png image inside the ODT:

<?xml version="1.0" encoding="utf-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.3">
  <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/" manifest:version="1.3" />
  <manifest:file-entry manifest:media-type="application/xml" manifest:full-path="content.xml" />
  <manifest:file-entry manifest:media-type="image/png" manifest:full-path="Pictures/0.png" />
  <manifest:file-entry manifest:media-type="application/rdf+xml" manifest:full-path="manifest.rdf" />
  <manifest:file-entry manifest:media-type="application/xml" manifest:full-path="styles.xml" />
  <manifest:file-entry manifest:media-type="application/xml" manifest:full-path="meta.xml" />
</manifest:manifest>
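
To check which images an ODT actually carries inside its zip, one can read META-INF/manifest.xml and list the entries under Pictures/. A minimal sketch with zipfile and ElementTree; the function name embedded_pictures is hypothetical, and it assumes embedded images live under Pictures/ as in pandoc's output above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def embedded_pictures(odt_bytes: bytes) -> list[str]:
    """Return manifest paths of images stored inside the ODT package."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        root = ET.fromstring(zf.read("META-INF/manifest.xml"))
    paths = []
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path", "")
        # Assumption: embedded pictures are kept in the Pictures/ folder.
        if path.startswith("Pictures/"):
            paths.append(path)
    return paths
```

For the manifest shown above this would report Pictures/0.png; an ODT whose images are all linked should report nothing.
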

If you can upload a non-working ODT I can have a better look.

iandol commented 4 months ago

Here is an even more minimal file. I generated an ODT with Pandoc, duplicated it and edited it as follows:

  1. Delete Pictures folder.
  2. Remove <manifest:file-entry manifest:media-type="image/png" manifest:full-path="Pictures/0.png" /> from manifest.xml
  3. Edit the href in content.xml to xlink:href="../placeholder.png"

This produces a working ODT with a linked image (placeholder.png from above in the same folder):

pandoc+tweak.odt

Now, if I remove ../ from the href, LibreOffice complains:

[screenshot of the LibreOffice error]

So it seems the link needs a ../ prefix to point to the parent folder. I assume LO treats the zip root as ./, which would explain this.
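
The three manual steps above can be scripted. Below is a minimal Python sketch (standard library only), assuming the caller already knows which internal picture corresponds to which file on disk; the function name link_images and its hrefs mapping are hypothetical, not part of pandoc. It uses plain text substitution rather than full XML parsing, which should be enough for pandoc's own output but may miss differently formatted markup.

```python
import io
import re
import zipfile


def link_images(odt_bytes: bytes, hrefs: dict[str, str]) -> bytes:
    """Rewrite an ODT so the images in `hrefs` are linked, not embedded.

    `hrefs` maps an internal path (e.g. "Pictures/0.png") to the external
    href to use instead (e.g. "../placeholder.png").
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as src:
        with zipfile.ZipFile(out, "w") as dst:
            for info in src.infolist():
                # Step 1: drop the embedded image files themselves.
                if info.filename in hrefs:
                    continue
                data = src.read(info.filename)
                if info.filename == "META-INF/manifest.xml":
                    # Step 2: remove the manifest entries for those images.
                    text = data.decode("utf-8")
                    for internal in hrefs:
                        pattern = (r'\s*<manifest:file-entry[^>]*'
                                   r'manifest:full-path="'
                                   + re.escape(internal) + r'"[^>]*/>')
                        text = re.sub(pattern, "", text)
                    data = text.encode("utf-8")
                elif info.filename == "content.xml":
                    # Step 3: point xlink:href at the external file.
                    text = data.decode("utf-8")
                    for internal, external in hrefs.items():
                        text = text.replace(f'xlink:href="{internal}"',
                                            f'xlink:href="{external}"')
                    data = text.encode("utf-8")
                # Reusing each ZipInfo keeps entry order and compression, so
                # an uncompressed "mimetype" entry stays first, as ODF expects.
                dst.writestr(info, data)
    return out.getvalue()
```

For the file above one would call link_images(data, {"Pictures/0.png": "../placeholder.png"}) on the bytes of the Pandoc ODT and save the result next to placeholder.png.
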

jgm commented 4 months ago

OK, that was what I was missing: we have to put ../ in front of the relative path in the ODT.
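
As a rough illustration of the rule, here is a small Python helper that turns an image source into the href an ODT would need. The name odt_link_href is hypothetical, and leaving URLs and absolute paths untouched is an assumption; the thread only establishes the ../ prefix for relative paths.

```python
import posixpath
from urllib.parse import urlparse


def odt_link_href(src: str) -> str:
    """Hypothetical helper: map an image source to an ODT xlink:href.

    LibreOffice appears to resolve relative hrefs against the package
    (zip) root, so a path relative to the document's folder needs "../".
    """
    # Assumption: URLs (anything with a scheme) and absolute POSIX paths
    # are passed through unchanged.
    if urlparse(src).scheme or posixpath.isabs(src):
        return src
    return "../" + src
```

So placeholder.png becomes ../placeholder.png and img/a.png becomes ../img/a.png, while https://example.com/a.png is left as-is.
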

ptram commented 4 months ago

I tried Pandoc 3.2.1 on my Intel Mac, and indeed the image path and name are included in the DOCX file. I did my conversion through Quarto 1.6.1.

However, Word for Mac continues to embed an image with its own RTF name. When imported into InDesign or Affinity Publisher, only the embedded image is considered.

I don't know if this is still something that can be solved on the Pandoc side, or it is something inside Word or the page layout programs that are importing it.

jgm commented 4 months ago

@ptram this patch only affects ODT, not DOCX.

iandol commented 4 months ago

> I tried Pandoc 3.2.1 on my Intel Mac, and indeed the image path and name are included in the DOCX file. I did my conversion through Quarto 1.6.1.

Also, this is only testable in a nightly build (see https://github.com/jgm/pandoc/actions/workflows/nightly.yml, for example https://github.com/jgm/pandoc/actions/runs/9790558535/artifacts/1667189311); it hasn't made it into a release yet.

This is a simple test with an image called placeholder.png in the same folder, converted from pandoc to ODT:

out.odt placeholder

Then saved with LO as DOCX:

out.docx

At least when saved from LO, the DOCX zip does not embed the image, and the relationship marks it as external:

<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="file:///Users/ian/placeholder.png" TargetMode="External"/>

Word treats it as an absolute path; I assume it adjusts based on the loading location, but I can't test how that imports.

LibreOffice is better at a bunch of stuff and often makes a better intermediate than Word itself.

ptram commented 4 months ago

Apologies for making the usual mess.

In Word for Mac, the "out.docx" file above opens like this:

[screenshot]

How the file path translates I can't say, since I've yet to discover a way to show it in Word (I read that this feature may have been removed in recent years for privacy reasons). I'm not even able to see a file name and path inside the DOCX file when examining it as raw text.

jgm commented 4 months ago

@ptram So, the issue is that you are using --link-images to create an ODT, then converting the resulting ODT to docx, and getting the wrong link path. I think I understand why. When we add the image as a link to the ODT, we have to prefix the path with ../ (so the link points to ../placeholder.png, not ./placeholder.png). I assume the docx fails because it doesn't find the image at ../placeholder.png (it resolves the path relative to the working directory). You could test this by moving placeholder.png up to ../ and seeing whether the docx then works.
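
To make this hypothesis concrete, here is a small Python sketch that resolves the same href against two different bases. It assumes, as guessed above rather than confirmed, that LibreOffice resolves relative hrefs as if the ODT package were a folder, while the DOCX consumer resolves them against the document's own folder; the directory names are made up for illustration.

```python
import posixpath


def resolve(base: str, href: str) -> str:
    """Resolve a relative href against a base directory, POSIX-style."""
    return posixpath.normpath(posixpath.join(base, href))


# Hypothetical layout: out.odt and placeholder.png both live in doc_dir.
doc_dir = "/Users/ian/project"
href = "../placeholder.png"  # what --link-images writes into content.xml

# Treating the package as a folder inside doc_dir, "../" climbs back out
# of the package into doc_dir, where the image actually is.
print(resolve(posixpath.join(doc_dir, "out.odt"), href))
# -> /Users/ian/project/placeholder.png

# Resolving the same href against doc_dir itself climbs one level too far.
print(resolve(doc_dir, href))
# -> /Users/ian/placeholder.png
```

If that is what happens, moving placeholder.png up one folder, as suggested, should make the converted docx find it.
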