jgm / pandoc

Universal markup converter
https://pandoc.org

Implement ODT support for implicit_figures #2401

Closed: iandol closed this issue 3 months ago

iandol commented 8 years ago

Hi, the documentation for implicit_figures says "This feature is not yet implemented for RTF, OpenDocument, or ODT". MultiMarkdown does support captions for block-level figures, for example with this MMD:

![1—A medieval illustration of the ventricular theory of sensory perception, in which sense information (apart from touch in this case) is transferred into the *sensus communis* of the first of the 3 supposed ventricles for initial processing. This version is contained in the *Margarita Philosophica* {Reisch, 1504, #69055}](MargaritaPhilosophica.jpg)

generating the following FODT:

<text:p>
<draw:frame text:anchor-type="as-char" draw:z-index="0" draw:style-name="fr1" svg:width="95%">
<draw:text-box>
<text:p>
<draw:frame text:anchor-type="as-char" draw:z-index="1" >
<draw:image xlink:href="MargaritaPhilosophica.jpg" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" draw:filter-name="&lt;All formats&gt;"/>
</draw:frame>
</text:p>
<text:p>Figure <text:sequence text:name="Figure" text:formula="ooow:Figure+1" style:num-format="1"> Update Fields to calculate numbers</text:sequence>: 1—A medieval illustration of the ventricular theory of sensory perception, in which sense information (apart from touch in this case) is transferred into the <text:span text:style-name="MMD-Italic">sensus communis</text:span> of the first of the 3 supposed ventricles for initial processing. This version is contained in the <text:span text:style-name="MMD-Italic">Margarita Philosophica</text:span> {Reisch, 1504, #69055}
</text:p>
</draw:text-box>
</draw:frame>
</text:p>

Will implicit_figures be supported for ODT at some point, and is there a rough timeline? Thanks for an excellent tool!

jgm commented 8 years ago

No timeline. If someone wants to write the code (and adjust tests accordingly) and submit a PR, I'd certainly consider merging it.

iandol commented 8 years ago

Thanks for the info!

jgm commented 8 years ago

If you want to reopen this, the code sample you pasted in might be helpful if someone wants to add this feature.

iandol commented 8 years ago

OK, I wish I could help, but I really wouldn't know where to start; plus, Haskell looks like Klingon to my poor biologist/procedural-programmer eyes. I had a look at Text.Pandoc.Writers.ODT.hs and see there is a blockToOpenDocument function, but how to wrangle that to do the conversion...

Thanks John!

jgm commented 8 years ago

This will probably look more straightforward (and this is what needs changing):

https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Writers/OpenDocument.hs

See line 347, definition of 'figure'.

iandol commented 8 years ago

Also this is the code in MMD that handles images with optional captions:

https://github.com/fletcher/MultiMarkdown-4/blob/master/odf.c#L463

lierdakil commented 8 years ago

Uh... sorry, but what exactly is this about? #376, #2070?

lierdakil commented 8 years ago

I guess the docs should have been updated, but other than that, I believe pandoc supports implicit_figures with OpenDocument/ODT output.

iandol commented 8 years ago

@lierdakil, hm: going from MMD to ODT with the example above, no image captions are generated. Perhaps this fails only for MMD input, though I think the syntax is the same. If I use:

![This is the caption](Beast_mmd/eyes.png)  

I get no caption even as a subsequent paragraph:

<text:p text:style-name="First_20_paragraph"><draw:frame draw:name="img1" svg:width="397pt" svg:height="400pt"><draw:image xlink:href="Pictures/0.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" /></draw:frame></text:p>

using the following command line: pandoc test.md --from markdown_mmd -o testPD.odt (pandoc 1.15.0.6 on OS X 10.11.1).

iandol commented 8 years ago

Right: if I omit --from markdown_mmd, I get the caption, so for some reason the caption is being dropped for MMD input, even though MMD itself supports it...

lierdakil commented 8 years ago

Try explicitly enabling it: '-f markdown_mmd+implicit_figures'

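The '+implicit_figures' suffix works because pandoc resolves a format name to a default set of extensions and then applies any '+ext'/'-ext' modifiers on top of it. The Haskell sketch below is a toy model of that resolution, not pandoc's code: the extension names are real pandoc extensions, but the default sets are a deliberately tiny, hypothetical subset, and parseFormatSpec and resolveExtensions are illustrative names.

```haskell
import qualified Data.Set as Set

-- Toy model of pandoc's "FORMAT+ext-ext" command-line syntax.
-- The extension names are real, but these default sets are a small,
-- hypothetical subset for illustration, NOT pandoc's actual defaults.
type Extensions = Set.Set String

-- Hypothetical, abbreviated default extensions per input format.
defaultExtensions :: String -> Extensions
defaultExtensions "markdown"     = Set.fromList ["implicit_figures", "subscript", "superscript"]
defaultExtensions "markdown_mmd" = Set.fromList ["mmd_title_block"]
defaultExtensions _              = Set.empty

-- Split e.g. "markdown_mmd+implicit_figures" into the base format name and
-- a list of (enable?, extension) modifiers, in the order they appear.
parseFormatSpec :: String -> (String, [(Bool, String)])
parseFormatSpec spec = (base, modifiers rest)
  where
    isSign c = c == '+' || c == '-'
    (base, rest) = break isSign spec
    -- After each break, the remainder is either empty or starts with a sign.
    modifiers (sign : xs) =
      let (ext, more) = break isSign xs
       in (sign == '+', ext) : modifiers more
    modifiers [] = []

-- Start from the format's defaults, then apply each modifier left to right,
-- so a later "-ext" undoes an earlier "+ext".
resolveExtensions :: String -> Extensions
resolveExtensions spec = foldl apply (defaultExtensions base) mods
  where
    (base, mods) = parseFormatSpec spec
    apply exts (True, ext)  = Set.insert ext exts
    apply exts (False, ext) = Set.delete ext exts
```

In this model, plain markdown_mmd never has implicit_figures, matching the behaviour reported above; either passing +implicit_figures or adding the extension to the format's defaults makes captions appear.
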
iandol commented 8 years ago

Yes, that works, @lierdakil, thank you. So the question is whether there is a reason the extension is not enabled by default for MMD (since both support the same syntax for the same feature); if there is, then this can be closed.

MMD also wraps the figure and its caption in a frame, which is slightly cleaner structurally, and adds a figure-number sequence field (automatic numbering that can be cross-referenced). The XML seems pretty straightforward for this. It's no great issue, and I could probably even hack this myself, but I wonder if there are reasons against (1) using a frame and (2) adding numbering. The argument against (2) is that it wouldn't apply across output formats (does HTML even support auto-numbering?). But wrapping the figure in a frame is what LibreOffice does by default, I think.

lierdakil commented 8 years ago

Automatic numbering is not something we can replicate in other output formats, at least not at the moment. There is pandoc-crossref, but it will insert figure numbers as plain text in ODT output.

As for the frame, I briefly considered implementing it, but the XML turned out to be much less straightforward than I felt was worth it, especially considering the different rendering implementations in OpenOffice, LibreOffice and MS Word.

lierdakil commented 8 years ago

So, @jgm, do you suppose we could add Ext_implicit_figures to multimarkdownExtensions? I don't think this is a new MMD feature, so I'm not exactly sure why it isn't included.

iandol commented 8 years ago

Thanks, @lierdakil; using a frame is no big issue. And I understand the problem with auto-numbering, though pandoc does have other features that only some formats support. But that is for another issue.

I also notice subscript and superscript extensions have to be explicitly enabled, and again these are things MMD supports too...

jgm commented 8 years ago

It may be that some features were added to MMD since I added the markdown_mmd option to pandoc. It will be an easy change to add these.

iandol commented 8 years ago

FYI, here is the documentation for subscript and superscript support in MMD:

http://fletcher.github.io/MultiMarkdown-4/MMD_Users_Guide.html#superscriptsandsubscripts

jgm commented 8 years ago

Subscripts and superscripts work differently in MMD. You can do e^2 and a~1, where in pandoc you need to do e^2^ or a~1~. Still, since the pandoc-style ones WILL work in MMD, enabling these options seems fine.

jgm commented 8 years ago

@iandol I think the numbering is a bit problematic, without some mechanism for localization -- we don't want to bake in the word "Figure" as the XML above does. But putting the whole thing in a frame seems worth doing and shouldn't be too complex. @lierdakil what difficulties did you encounter? I don't think we need to worry about other formats. It would be good to do this in Word too, but I see no reason not to do it in ODT even without doing it in Word.

lierdakil commented 8 years ago

@jgm, it was a while ago, so the details are somewhat fuzzy. What I can remember right off the bat is that frame dimensions were messed up between OO and LO due to their different rendering strategies, and the only way I was able to make it work in both was to set frame dimensions in pixels. I'm no ODF expert, though, so I might have missed an obvious solution.

P.S. When I was talking about Word, I meant its ODT renderer, not docx.

ghost commented 8 years ago

Even with images as they currently are (without nested frames), I often have to resize them manually in ODT (LibreOffice), so I don't think that should be a showstopper. @lierdakil, perhaps whatever code you wrote before is worth trying again in the latest releases of OpenOffice.org and LibreOffice, if you still have it.

As for hardcoding "Figure", is there anything wrong with a writer-specific option?
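One way to avoid hard-coding "Figure", in the spirit of the writer-specific option suggested here, is to pass the visible label in as a parameter. This is a hypothetical sketch: figureLabel, captionParagraph and the small label table are illustrative, not pandoc functions. It only reproduces the shape of the text:sequence markup from the MMD output quoted at the top of this issue.

```haskell
-- Escape XML special characters in text content and attribute values.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '<'  = "&lt;"
    esc '>'  = "&gt;"
    esc '&'  = "&amp;"
    esc '"'  = "&quot;"
    esc '\'' = "&apos;"
    esc c    = [c]

-- Hypothetical lookup from a language tag (e.g. "de-DE") to a figure label.
-- A real writer might take the label from a writer option or a translation
-- table instead; these few entries are illustrative only.
figureLabel :: String -> String
figureLabel lang = case takeWhile (/= '-') lang of
  "de" -> "Abbildung"
  "es" -> "Figura"
  "sv" -> "Figur"
  _    -> "Figure"

-- Build a numbered caption paragraph shaped like the MMD output, but with the
-- visible label passed in rather than hard-coded. The sequence variable keeps
-- the name "Figure", as in that output; n is written as the field's display value.
captionParagraph :: String -> Int -> String -> String
captionParagraph label n caption =
  "<text:p>" ++ escapeXml label ++ " "
    ++ "<text:sequence text:name=\"Figure\" text:formula=\"ooow:Figure+1\""
    ++ " style:num-format=\"1\">" ++ show n ++ "</text:sequence>: "
    ++ escapeXml caption ++ "</text:p>"
```

For example, captionParagraph (figureLabel "de-DE") 1 "A & B" yields a paragraph starting with "Abbildung" and an escaped "A &amp; B" caption. Separating the visible label from the sequence name is a design choice of this sketch, not something the thread settles.
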

hubertp-lshift commented 7 years ago

@jgm Fixed by https://github.com/jgm/pandoc/pull/3165?

lierdakil commented 7 years ago

@hubertp-lshift, I believe #3165 is for the ODT reader. This issue is about ODT writer output, so no, probably not.

jgm commented 7 years ago

This is a confusing thread. If I'm not mistaken, the only outstanding issue here is whether ODT figures can be put into a frame?

ghost commented 7 years ago

@jgm I think it all boils down to nobody having all three of time, skill and interest to do it. ODT figures definitely can be put in frames, as I've used Perl to post-process pandoc output to that effect in the past.

iandol commented 7 years ago

@jgm: yes, that is the only outstanding issue, and it probably applies to DOCX as well as ODT unless something has changed (I didn't see anything in the changelog).

jgm commented 5 years ago

Figure numbers have been dealt with now in commit ecd4d5b8d8cfda6a2cd8d8fb631e0d7c79bee363.

We should work on putting figures and captions in proper frames rather than paragraphs. @pyssling any interest?

pyssling commented 5 years ago

I'm looking at it. It should be doable.

I assume the point is to limit the width of the caption to the width of the figure, which is useful if you want to place the figure at the side of a page with text wrapping around it, or similar?

pyssling commented 5 years ago

@jgm I've had a good look now. This isn't, strictly speaking, something I need right now; maybe later. It would be rather invasive. Basically, we need to split dimension setting into two parts when there is an outer and an inner frame (the outer one contains the caption and the figure, the inner one only the figure).
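To make that split concrete, here is a hypothetical sketch of the two-frame layout: an outer frame holding a text box with the image paragraph and the caption, and an inner frame holding only the image. It follows the element structure of the MMD output quoted at the top of this issue; figureFrames, Dims and the use of points are assumptions for illustration, not pandoc's writer code.

```haskell
-- Image size in points (an assumption: a real writer would carry whatever
-- units it computed for the image).
data Dims = Dims { widthPt :: Double, heightPt :: Double }

-- Minimal XML escaping for attribute values and text content.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Outer frame: image + caption. Inner frame: image only.
-- The dimensions are split: the outer frame gets only the width (mirroring the
-- MMD output, which sets no height there), so the caption is constrained to the
-- image's width; the inner frame gets both width and height.
figureFrames :: Dims -> FilePath -> String -> String
figureFrames dims href caption = unlines
  [ "<draw:frame text:anchor-type=\"as-char\" svg:width=\"" ++ w ++ "\">"
  , "  <draw:text-box>"
  , "    <text:p>"
  , "      <draw:frame text:anchor-type=\"as-char\" svg:width=\"" ++ w ++ "\" svg:height=\"" ++ h ++ "\">"
  , "        <draw:image xlink:href=\"" ++ escapeXml href ++ "\" xlink:type=\"simple\" xlink:show=\"embed\" xlink:actuate=\"onLoad\"/>"
  , "      </draw:frame>"
  , "    </text:p>"
  , "    <text:p>" ++ escapeXml caption ++ "</text:p>"
  , "  </draw:text-box>"
  , "</draw:frame>"
  ]
  where
    w = show (widthPt dims) ++ "pt"
    h = show (heightPt dims) ++ "pt"
```

The awkward part described next is that, in the current code, the real image size is only known later, so both the outer width and the inner width/height would have to be patched in after this XML is generated.
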

This is made complicated by the fact that we post-process dimensions in the transformPicMath function in ODT.hs. This makes things awkward, to say the least.

Do you know why it's done this way? Maybe because this is where we actually push the image into the file and can therefore get the real dimensions?

jgm commented 5 years ago

Originally the OpenDocument writer was pure -- could not do IO, including looking at images -- and all the IO operations were put into the ODT writer. Now that we have the PandocMonad interface, we could do it differently and read image dimensions in the OpenDocument writer. If that would make it easier to support proper frames for figures, I'd say it would be worth it.
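A rough sketch of that design, under stated assumptions: MonadImageSize is a hypothetical one-method class standing in for "the writer may look at images"; it is not PandocMonad's actual API. Because the writer function is polymorphic in the monad, it can compute frame dimensions while rendering, and a pure table-backed instance keeps it testable without IO.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.Map as Map

-- Image size in points (illustrative).
data Dims = Dims Double Double deriving (Eq, Show)

-- Hypothetical capability class: "this writer may ask for an image's size".
-- It stands in for a monadic interface such as PandocMonad; it is NOT pandoc's API.
class Monad m => MonadImageSize m where
  imageSize :: FilePath -> m (Maybe Dims)

-- A pure instance backed by a lookup table: a computation is just a function
-- from the table to a result. A real instance would read the image to get its size.
instance MonadImageSize ((->) (Map.Map FilePath Dims)) where
  imageSize = Map.lookup

-- The writer asks for the size while it renders the frame, so no later pass
-- has to patch dimensions into already-generated XML.
frameOpenTag :: MonadImageSize m => FilePath -> m String
frameOpenTag src = do
  mdims <- imageSize src
  pure $ case mdims of
    Just (Dims w h) ->
      "<draw:frame svg:width=\"" ++ show w ++ "pt\" svg:height=\"" ++ show h ++ "pt\">"
    Nothing -> "<draw:frame>"
```

For instance, frameOpenTag "a.png" (Map.fromList [("a.png", Dims 10 20)]) gives an opening tag with svg:width="10.0pt" and svg:height="20.0pt", while an unknown image falls back to a bare <draw:frame>.
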

pyssling commented 5 years ago

This would definitely make it easier. Also easier for anyone reading the code to figure out what's going on. I'll look at the other writers and see if I can figure out how this would work.

iandol commented 3 months ago

This is old and I think not relevant anymore, time to close...