Closed by miraks31 3 years ago
@jkr what do you think, is this feasible?
Hi @jkr,
A fix for this bug would be much appreciated. How can I help?
Sorry, I think this is out of scope for us. The image IS there, but it's actually not referred to by anything else in the docx, as far as I can see.
Hi jgm,
In the file word\document.xml, you can find the object that references the image:
<v:imagedata r:id="rId5" o:title=""/>
In the file word\_rels\document.xml.rels, you can find the corresponding relationship:
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.emf"/>
Finally, the image itself is at:
media/image1.emf
So the link between the object and its associated image does exist.
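The chain described above (r:id on v:imagedata, then the matching Relationship, then the media file) can be sketched in Python. This is only an illustration, not pandoc's code (pandoc is written in Haskell); the function name `imagedata_targets` is hypothetical, and the sketch assumes the relationships part sits at word/_rels/document.xml.rels inside the archive and that targets are relative to word/.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces for r:id attributes, VML elements, and package relationships.
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
V_NS = "urn:schemas-microsoft-com:vml"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def imagedata_targets(docx_bytes):
    """Map each v:imagedata r:id in word/document.xml to its archive path.

    Hypothetical helper: `docx_bytes` is the raw content of a .docx file.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        document = ET.fromstring(zf.read("word/document.xml"))
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))

    # Relationship Id -> Target, e.g. "rId5" -> "media/image1.emf".
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship")
    }

    resolved = {}
    for node in document.iter(f"{{{V_NS}}}imagedata"):
        rel_id = node.get(f"{{{R_NS}}}id")
        if rel_id in targets:
            # Assumption: targets are relative to the word/ directory.
            resolved[rel_id] = posixpath.normpath(
                posixpath.join("word", targets[rel_id])
            )
    return resolved
```

For the attached file this should yield an entry for rId5 pointing at word/media/image1.emf, assuming the archive matches the snippets quoted above.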
In Word documents, pictures are most often added by copy/paste from other applications (e.g. Visio), and are stored not as a picture but as an object with an associated picture.
It would be great if pandoc could extract those pictures too.
I think all the information needed to do this is there.
Thank you again for the work you did.
Hm, not sure how I missed that! Here's the XML:
<w:object w:dxaOrig="9735" w:dyaOrig="5850">
<v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f">
<v:stroke joinstyle="miter"/>
<v:formulas>
<v:f eqn="if lineDrawn pixelLineWidth 0"/>
<v:f eqn="sum @0 1 0"/>
<v:f eqn="sum 0 0 @1"/>
<v:f eqn="prod @2 1 2"/>
<v:f eqn="prod @3 21600 pixelWidth"/>
<v:f eqn="prod @3 21600 pixelHeight"/>
<v:f eqn="sum @0 0 1"/>
<v:f eqn="prod @6 1 2"/>
<v:f eqn="prod @7 21600 pixelWidth"/>
<v:f eqn="sum @8 21600 0"/>
<v:f eqn="prod @7 21600 pixelHeight"/>
<v:f eqn="sum @10 21600 0"/>
</v:formulas>
<v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
<o:lock v:ext="edit" aspectratio="t"/>
</v:shapetype>
<v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:488.1pt;height:293pt" o:ole="">
<v:imagedata r:id="rId5" o:title=""/>
</v:shape>
<o:OLEObject Type="Embed" ProgID="Visio.Drawing.11" ShapeID="_x0000_i1025" DrawAspect="Content" ObjectID="_1591516258" r:id="rId6"/>
</w:object>
I don't actually understand what this does. Is the image with id rId5 just a bitmap version of the whole drawn object, or is it part of the object? If the former, I guess we can look in w:object for v:shape and get the imagedata.
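The suggestion of looking inside w:object for v:shape and taking its imagedata could look roughly like the Python sketch below. It is not the fix that was eventually committed to pandoc; the name `object_preview_ids` is hypothetical, and treating the image as a rendered preview of the OLE object is an assumption, not something the XML itself states.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI for the elements that appear inside w:object.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "v": "urn:schemas-microsoft-com:vml",
    "o": "urn:schemas-microsoft-com:office:office",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}


def object_preview_ids(document_xml):
    """Return (ProgID, image r:id) pairs for every w:object that has
    a v:shape/v:imagedata child. Hypothetical helper for illustration."""
    root = ET.fromstring(document_xml)
    pairs = []
    for obj in root.iter(f"{{{NS['w']}}}object"):
        imagedata = obj.find("v:shape/v:imagedata", NS)
        if imagedata is None:
            continue  # object without an associated picture
        ole = obj.find("o:OLEObject", NS)
        prog_id = ole.get("ProgID") if ole is not None else None
        pairs.append((prog_id, imagedata.get(f"{{{NS['r']}}}id")))
    return pairs
```

Run on the XML above (wrapped in a root element that declares the w, v, o and r namespaces), this would pair Visio.Drawing.11 with rId5, which can then be resolved through the relationships file.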
Is this resolved? I am facing the same problem.
When an issue is open, it means that it has not yet been resolved.
This issue has been resolved by the commit above.
Hi,
I use pandoc 2.1.1 on Windows and Linux. When I try to convert the attached docx file (issue_object_as_image.docx), the image is not extracted.
I think this is because it is not a plain image but an object displayed as an image. However, the image is present in the media directory inside the docx (I checked by changing the extension to .zip and extracting all files), so I hope pandoc can extract this kind of image too.
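The manual check (rename to .zip and look in the media directory) can also be done without renaming, since a docx is a zip archive. A minimal Python sketch, where the helper name `list_media` is hypothetical:

```python
import io
import zipfile


def list_media(docx_bytes):
    """List the files stored under word/media/ in a .docx archive.

    Hypothetical helper: `docx_bytes` is the raw content of the file.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        )
```

For example, `list_media(open("issue_object_as_image.docx", "rb").read())` returns the archive paths of the embedded media files.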
To reproduce this issue:
pandoc.exe -s --from docx-simple_tables-multiline_tables-grid_tables+pipe_tables --to commonmark+pipe_tables issue_object_as_image.docx -o issue_object_as_image.md --extract-media media --file-scope --wrap=none --atx-headers
Result: The image is not extracted, and the reference to the image is missing from the Markdown file.
Expected result: The image is extracted, and the reference to the image is present in the Markdown file.
Thank you for this great tool. Regards.