Open gshmu opened 7 years ago
Pandoc's internal document model has width and height attributes for images, but no conception of cropping. So, not much we can do to respect the cropping specified in the docx. @jkr do you agree?
The AST has generic Attributes (see https://hackage.haskell.org/package/pandoc-types-1.17.0.5/docs/Text-Pandoc-Definition.html#t:Inline ). One could generate key-value-pairs while parsing those attributes and then use that if the target allows that.
In Markdown it could look like this (beware: ugly, but docx is ugly ;p ):
![description](/path/to/img){crop-width=2.68 crop-height=1.18 crop-left=4.78 crop-top=10.29}
It would get ignored in most output-formats, but a writer (or filter) could match on those attributes.
But this would of course only work if the whole image is embedded in the docx and not just the cropped part.
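To make the "match on those attributes" idea concrete, here is a minimal sketch of how a writer or filter might pull the crop values out of an image's key-value pairs. The Crop record and cropFromAttrs are hypothetical names, not part of pandoc, and the pairs are modelled as plain String tuples (newer pandoc-types versions use Text instead, so real code would need adapting).

```haskell
import Text.Read (readMaybe)

-- Hypothetical record for the four crop-* attributes from the example above.
data Crop = Crop
  { cropWidth  :: Double
  , cropHeight :: Double
  , cropLeft   :: Double
  , cropTop    :: Double
  } deriving (Show, Eq)

-- Look up all four keys in an attribute list; the result is Nothing if any
-- key is missing or its value does not parse as a number.
cropFromAttrs :: [(String, String)] -> Maybe Crop
cropFromAttrs kvs =
  Crop <$> num "crop-width"
       <*> num "crop-height"
       <*> num "crop-left"
       <*> num "crop-top"
  where
    num key = lookup key kvs >>= readMaybe
```

A writer that does not understand cropping would simply never call this, which matches the "ignored in most output-formats" behaviour described above.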
@mhohai: what does it output now? Just the cropped area of the image? Could you post an example file and describe the output you get and the output you want?
I mean, we could do anything with attributes if we wanted to, but each special case would probably have to be written back into the writer code too (since we want markdown to be exportable). My quick take is that this would only be worthwhile if we thought it was worth aiming at numerous high-use output formats. (I believe with html, we'd need CSS for cropping, right? trim or clip option with \includegraphics in LaTeX?) Anyway, pending the specifics from @mhohai, I guess I'd vote for: doesn't seem worth it now. But file it away and if we get other requests maybe revisit?
@jkr The output currently contains the full image. Getting the cropped image, or at least the crop attributes, would be better.
When the docx reader extracts the image from the docx file, it could also just crop it right there... (not sure whether we have an image manipulation lib already as a dependency though)
We have JuicyPixels; I don't see cropping functions there, but you could no doubt write one, since it has functions to convert images into vectors of pixels, and functions to generate images from vectors of pixels. So, although it seems a bit crazy at first, I think pandoc could actually do this, and it might be preferable to the other options, as we wouldn't have to support the cropping attributes in writers.
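As a rough illustration of why writing a crop by hand is not hard once pixels are addressable by position, here is a toy sketch that crops a row-major grid of pixels. It deliberately uses plain lists rather than JuicyPixels types, and cropGrid is a hypothetical name; a real version would work on the library's Image type instead.

```haskell
-- A toy image: a list of rows, top row first, each row a list of pixels.
type Grid px = [[px]]

-- Keep a w-by-h window whose top-left corner is at column x, row y.
-- Windows that run past the edge are silently truncated.
cropGrid :: Int -> Int -> Int -> Int -> Grid px -> Grid px
cropGrid x y w h = map (take w . drop x) . take h . drop y
```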
JuicyPixels-extra has a crop function...
I agree that it would be good if Pandoc cropped images on output. At the moment I'm working with Word documents of illustrated lectures with scanned slides, all of which have been cropped in the document!
I had a look at the code, this is all very unfamiliar territory for me (been about 20 years since I did any Haskell!), but would the idea be to crop the image when constructing the Drawing in elemToParPart' in Parse.hs?
At least in the file I'm working with it seems to be quite complicated, as the crop is expressed thus:
<pic:blipFill>
<a:blip r:embed="rId3"></a:blip>
<a:srcRect l="9471" t="6145" r="16090" b="4797" />
<a:stretch>
<a:fillRect />
</a:stretch>
</pic:blipFill>
I guess that it would be possible to assume the fill, and only parse the a:srcRect element to get the crop, as a start. And then what, crop the element there and then in elemToParPart', producing a new binary string to put in the Drawing? (Again, inefficient in general, but it might be reasonable to bet in a first cut that any given crop will likely only be used once.)
So, the steps would seem to be:
- Decode the binary string into an Image.
- Find the srcRect element.
- Read its l, r, t and b attributes.
- Compute r - l and t - b to get the width and height.
- Call Codec.Picture.Extra.crop.
- Encode the Image back into a binary string.
And then continue constructing the Drawing.
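The arithmetic in the middle of those steps can be sketched on its own, without touching pandoc or JuicyPixels. This is a sketch under one important assumption: that the l, t, r and b values of a:srcRect are insets from the respective edges, expressed in thousandths of a percent of the image size (my reading of the DrawingML schema, not something confirmed in this thread). Under that reading the cropped width is the full width minus the left and right insets, rather than r - l. SrcRect, CropBox, insetPx and cropBox are hypothetical names.

```haskell
-- Hypothetical holder for the four a:srcRect attributes
-- (assumed unit: 1/1000 of a percent, measured inward from each edge).
data SrcRect = SrcRect { srcL, srcT, srcR, srcB :: Int }
  deriving (Show, Eq)

-- Hypothetical pixel rectangle: top-left corner plus size.
data CropBox = CropBox { boxX, boxY, boxW, boxH :: Int }
  deriving (Show, Eq)

-- Convert an inset in 1/1000 percent into pixels, rounding to nearest.
insetPx :: Int -> Int -> Int
insetPx total inset = (total * inset + 50000) `div` 100000

-- Compute the crop rectangle for an image of the given pixel width and
-- height. Negative insets and crops that leave no pixels are rejected
-- rather than guessed at.
cropBox :: Int -> Int -> SrcRect -> Maybe CropBox
cropBox w h (SrcRect l t r b)
  | any (< 0) [l, t, r, b] = Nothing
  | cw <= 0 || ch <= 0     = Nothing
  | otherwise              = Just (CropBox x y cw ch)
  where
    x  = insetPx w l
    y  = insetPx h t
    cw = w - x - insetPx w r
    ch = h - y - insetPx h b
```

For a 1000 by 1000 pixel image and the srcRect values quoted above (l="9471" t="6145" r="16090" b="4797"), cropBox yields Just (CropBox 95 61 744 891). The four fields would then be handed to whatever cropping function is used, in whatever argument order it expects.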
Phew, quite a lot of impedance matching. Would all the steps above go in a helper function in Parse.hs?
One possibility would be to add parameters to Drawing for the crop attributes and whatever else you need. Then the actual cropping could be done in T.P.Readers.Docx, parPartToInlines', right before the image is inserted into the MediaBag using insertMedia.
A helper function for cropping could be put in T.P.Image.
I think that at the moment since this would essentially be a "relearn Haskell" exercise for me, i.e. a considerable amount of time getting to grips with a language I've forgotten, plus a bunch of code and APIs I've never seen before, and since the images in the documents I'm working on need to be "remastered" anyway (they too are decades old!), I will try asking for the new images to be pre-cropped and avoid needing this functionality in the first place. Thanks for your feedback anyway; perhaps the above will prove useful for someone who either Haskells more fluently than I or has more cropped images in documents they really want to process.
docx to markdown full size image of cropped