Pandoc doesn’t handle <figcaption> before <img> in <figure>

codingisacopingstrategy commented 5 years ago

Thanks 1000 times for Pandoc!

In HTML, within a <figure>, the order of the <figcaption> and the <img> element can matter, at least for presentation purposes. However, I’m having trouble getting Pandoc to output the <figcaption> first.

I ran into this problem while generating ePubs from HTML input. It’s most easily reproduced by taking HTML input and outputting it as HTML.

Running pandoc -f html -t html on the following snippet:

<figure class="epub-only">
    <figcaption>God (1917) by Baroness Elsa von Freytag-Loringhoven, photographed by Morton Schamberg. CC0 1.0, courtesy of The Metropolitan Museum of Art, New York.</figcaption>
    <img src="images/elsa.jpg">
</figure>

Creates:

<figure>
<img src="images/elsa.jpg" alt="God (1917) by Baroness Elsa von Freytag-Loringhoven, photographed by Morton Schamberg. CC0 1.0, courtesy of The Metropolitan Museum of Art, New York." /><figcaption>God (1917) by Baroness Elsa von Freytag-Loringhoven, photographed by Morton Schamberg. CC0 1.0, courtesy of The Metropolitan Museum of Art, New York.</figcaption>
</figure>

Is there a way around this or is this inherent to the way Pandoc serialises the HTML?

jgm commented 5 years ago

Currently figures are just represented internally as a Para(graph) containing a single Image with a title beginning with 'fig:'.

Nothing in this representation tells you whether the caption came before or after the image in the source.

Actually this is a bit of a hack; we need a proper Figure element, and there's an issue in place for that (#3177). You might comment there if you think it's important to represent the order of the caption.
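
To make the representation described above concrete, here is a minimal Python sketch of such a figure written out in the shape of Pandoc's JSON AST from that period. The exact JSON layout (an Image holding attributes, caption inlines, and a URL/title pair) is my understanding of the serialization and should be treated as an assumption; the helper names `implicit_figure` and `is_implicit_figure` are hypothetical and not part of Pandoc.

```python
def implicit_figure(src, caption_words):
    """Build a Para containing a single Image whose title starts with 'fig:'.

    Assumed JSON shape: Image content is [attr, caption inlines, [url, title]].
    """
    inlines = []
    for i, word in enumerate(caption_words):
        if i:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    attr = ["", [], []]      # identifier, classes, key-value pairs
    target = [src, "fig:"]   # url, title (the 'fig:' prefix marks a figure)
    return {"t": "Para", "c": [{"t": "Image", "c": [attr, inlines, target]}]}


def is_implicit_figure(block):
    """Return True if the block matches the Para/Image/'fig:' convention."""
    if block.get("t") != "Para":
        return False
    content = block["c"]
    if len(content) != 1 or content[0].get("t") != "Image":
        return False
    _attr, _caption, (_url, title) = content[0]["c"]
    return title.startswith("fig:")


figure = implicit_figure("images/elsa.jpg", ["God", "(1917)"])
print(is_implicit_figure(figure))
```

The node has three slots only: attributes, caption inlines, and the image target. None of them can record whether the caption preceded or followed the image, so a writer has to pick one fixed order.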

codingisacopingstrategy commented 5 years ago

Thanks for your reaction! OK I’ll take the time to read through that discussion thread. Generally speaking, I would say this feature is a ‘nice to have’ if it fits in with planned adaptations of the AST. Otherwise, I imagine it’s too much of an edge case to merit changing things around just for this scenario.

jgm commented 5 years ago

Closing because there's nothing we can do about this with current architecture.
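
Outside Pandoc itself, one possible stopgap is to post-process the generated HTML and move each <figcaption> in front of its <img>. The following is only a sketch under stated assumptions: it is not a Pandoc feature, the function name `caption_first` is hypothetical, and the regular expression assumes the exact output shape shown earlier in this thread (an <img> immediately followed by a <figcaption> inside a <figure>).

```python
import re

# Matches: opening <figure>, then an <img>, then a <figcaption> element.
# Assumes the output shape shown in this thread; not a general HTML parser.
FIGURE_RE = re.compile(
    r"(<figure\b[^>]*>\s*)(<img\b[^>]*>)\s*(<figcaption\b[^>]*>.*?</figcaption>)",
    re.DOTALL,
)


def caption_first(html):
    """Reorder every matching figure so the caption precedes the image."""
    return FIGURE_RE.sub(r"\1\3\n\2", html)


sample = (
    '<figure>\n'
    '<img src="images/elsa.jpg" alt="God (1917)" />'
    '<figcaption>God (1917)</figcaption>\n'
    '</figure>'
)
print(caption_first(sample))
```

Note that this reorders every figure it matches, not only those whose source had the caption first, because that information is already lost by the time the HTML is written. For anything beyond simple output a real HTML parser would be more robust than a regular expression.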

jgm/pandoc #5456