jgm / pandoc

Universal markup converter
https://pandoc.org

Support of figure label in JATS #7168

Open ehapmgs opened 3 years ago

ehapmgs commented 3 years ago

I am trying to convert JATS to DOCX and I have noticed the labels of the figures are missing.

For example, the output for the following JATS is missing the content of the <label> element:

<fig fig-type="figure">
<label>Figure 1</label>
<caption>
<p>Some caption</p>
</caption>
<graphic/>
</fig>

Is it possible to add support for that?

pandoc version: 2.13

tarleb commented 2 months ago

I think it would make sense to add the content of <label> to the figure caption. This might be unwanted when the target format labels figures automatically, so we may want to hide this behind an extension.

The same should be done for tables.
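Until something like this exists in the reader, one possible workaround is to preprocess the JATS before handing it to pandoc. The following is a minimal sketch, not pandoc's behavior: the function name `merge_labels_into_captions` and the ". " separator are assumptions made for illustration, and it only handles a `<label>` that is a direct child of `<fig>` or `<table-wrap>` with a `<caption><p>` sibling.

```python
import xml.etree.ElementTree as ET


def merge_labels_into_captions(jats_xml: str) -> str:
    """Prepend each <label> to the first caption paragraph of its <fig> or <table-wrap>."""
    root = ET.fromstring(jats_xml)
    for tag in ("fig", "table-wrap"):
        # Materialize the list first so removing children does not disturb iteration.
        for elem in list(root.iter(tag)):
            label = elem.find("label")
            para = elem.find("caption/p")
            if label is None or para is None:
                continue
            # itertext() flattens any inline markup inside the label to plain text.
            label_text = "".join(label.itertext()).strip()
            if not label_text:
                continue
            # Assumption: join label and caption with ". "; adjust to taste.
            para.text = f"{label_text}. {para.text or ''}"
            elem.remove(label)
    return ET.tostring(root, encoding="unicode")
```

The result can then be piped into pandoc as usual. Note that this flattens markup inside the label, so styled labels lose their formatting.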

jgm commented 2 months ago

I don't think we should add the label. This is generally added automatically in most formats that support captions.

tarleb commented 2 months ago

Ok, that's true. So if we were to handle the element, then a "label" attribute would probably be the better choice.

jgm commented 2 months ago

Using a label attribute would be a bit dangerous, because label is an HTML attribute name, so it wouldn't get sanitized to data-label. Could use data-label I suppose. But I'm not convinced we should handle the element at all.
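To illustrate the concern, here is a simplified model of the kind of sanitizing described above. It is a sketch only: `KNOWN_HTML_ATTRIBUTES` is a tiny hypothetical subset and `sanitize_attribute_name` is not pandoc's actual code.

```python
# Tiny illustrative subset of HTML attribute names; the real list is much longer.
KNOWN_HTML_ATTRIBUTES = {"class", "id", "href", "label", "lang", "src", "style", "title"}


def sanitize_attribute_name(name: str) -> str:
    """Prefix names that are not recognized HTML attributes with 'data-'."""
    if name.startswith("data-") or name in KNOWN_HTML_ATTRIBUTES:
        return name
    return "data-" + name


for name in ("label", "caption-label", "data-label"):
    print(name, "->", sanitize_attribute_name(name))
```

In this model `label` passes through unchanged because it is a recognized HTML attribute name, whereas a name such as `caption-label` would be rewritten to `data-caption-label`.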

tarleb commented 2 months ago

Or maybe caption-label.

My personal interest here is to have label support in the writer: I have a filter that generates these labels (primarily for HTML), and it would be nice if there was a way to have the JATS writer use that information in a semantically correct way. I could of course write another filter to generate and patch the XML semi-manually, but I'd like to avoid that if possible.

Reader support for labels would certainly be useful when converting to HTML.

bpj commented 2 months ago

How do these labels differ from captions and id attributes? Since they seem to be elements, can they contain styled text?

I have a similar use case with glosses — translations/classifications/etymologies attached to words (called lemmas) in texts. I "encode" them as a span inside a span:

[parole['word' f.sg. \< [parabola]{.smallcaps} 'parable']{.gloss}]{.lemma}

For HTML I use CSS to underline the lemma (preferably with a dotted underline) and make the gloss a styled pop-up which appears when hovering over the lemma; alternatively I render the gloss as a margin note, with a filter which prepends the lemma in bold to the gloss, or just display it after the lemma in parentheses, or not at all on mobile. For LaTeX I use a filter which turns the gloss into a margin note, again with the lemma prepended in bold (and using the marginnote package rather than \marginpar to avoid memory issues!). The main fragility is that the CSS and the filters alike depend on the gloss span being the last child of the lemma span.

Perhaps something similar could be done in the caption for this issue.
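The "prepend the lemma in bold to the gloss" step can be sketched as a transformation on pandoc's JSON AST, where a span is serialized as `{"t": "Span", "c": [[id, classes, key-values], inlines]}`. This is not the actual filter from the comment above (which is not shown in the thread); `prepend_lemma_to_gloss` is a hypothetical name, and the sketch assumes, as the comment does, that the gloss span is the last child of the lemma span.

```python
import copy


def has_class(inline, cls):
    """True if the inline is a Span carrying the given class."""
    return inline.get("t") == "Span" and cls in inline["c"][0][1]


def prepend_lemma_to_gloss(lemma_span):
    """Return a lemma span whose trailing gloss starts with the lemma in bold.

    Anything that is not a .lemma span ending in a .gloss span is returned unchanged.
    """
    if not has_class(lemma_span, "lemma"):
        return lemma_span
    attr, children = lemma_span["c"]
    if not children or not has_class(children[-1], "gloss"):
        return lemma_span
    lemma_inlines = children[:-1]
    gloss_attr, gloss_inlines = children[-1]["c"]
    # Copy the lemma inlines so the bold copy shares no nodes with the original.
    bold_lemma = {"t": "Strong", "c": copy.deepcopy(lemma_inlines)}
    new_gloss = {
        "t": "Span",
        "c": [gloss_attr, [bold_lemma, {"t": "Space"}] + gloss_inlines],
    }
    return {"t": "Span", "c": [attr, lemma_inlines + [new_gloss]]}
```

A caption label could be treated the same way: a span with a dedicated class inside the caption that writers or filters may style, move, or drop.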

(Less relevant to this issue is that I also have a filter which will locate spans with the .lemma class and fetch the gloss from a table in metadata — typically loaded with --metadata-file — keyed on the stringified content of the lemma span — provided the last child isn't a gloss span already. It can even keep a sentinel variable which will be true if the same lemma has already been encountered in the current section, in which case it will have the span stripped instead of having a gloss attached!)
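The lookup described in this aside might look roughly like the following sketch over pandoc's JSON AST. All names here are hypothetical, the gloss table is simplified to a plain dict of strings (real metadata values would be inlines), and the caller is assumed to reset `seen` at each section boundary.

```python
def stringify(inlines):
    """Flatten a list of pandoc JSON inlines to plain text (common cases only)."""
    parts = []
    for inline in inlines:
        tag, content = inline.get("t"), inline.get("c")
        if tag == "Str":
            parts.append(content)
        elif tag in ("Space", "SoftBreak"):
            parts.append(" ")
        elif tag in ("Emph", "Strong", "SmallCaps"):
            parts.append(stringify(content))
        elif tag == "Span":
            parts.append(stringify(content[1]))
    return "".join(parts)


def is_span_with_class(inline, cls):
    return inline.get("t") == "Span" and cls in inline["c"][0][1]


def attach_gloss(lemma_span, glosses, seen):
    """Return the list of inlines that should replace a .lemma span."""
    attr, children = lemma_span["c"]
    # An explicit gloss already present wins over the table.
    if children and is_span_with_class(children[-1], "gloss"):
        return [lemma_span]
    key = stringify(children)
    # Lemma already glossed in this section: strip the span, keep its content.
    if key in seen:
        return children
    if key not in glosses:
        return [lemma_span]
    seen.add(key)
    gloss = {"t": "Span", "c": [["", ["gloss"], []], [{"t": "Str", "c": glosses[key]}]]}
    return [{"t": "Span", "c": [attr, children + [gloss]]}]
```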

tarleb commented 2 months ago

> How do these labels differ from captions and id attributes? Since they seem to be elements, can they contain styled text?

My understanding of labels is that they usually contain the element name and the number, e.g. "Table 1" or "Fig. 4". They are generally presented as part of the caption. HTML doesn't have separate markup for these labels; they are just part of the caption.

In JATS the label can contain markup.