Short captions for use in list-of?

gtuckerkellogg commented 6 years ago

I don't see any way to get a short caption into the list of figures or tables when the caption at the figure or table itself is very long. I was thinking of something like this for cross-referenced items with captions.

![This is a very long caption that would be awkward in a list of
figures but really needs to be used as the caption at the figure
location](img/interesting_fig.png){#fig:interesting shortcaption="This is interesting"}

Not all writers could support it, of course, but those that could should. A question about this type of feature has popped up on pandoc-discuss from time to time.

lierdakil commented 6 years ago

To be honest, I have my doubts about the usefulness of such a feature. Frankly, the only reason pandoc-crossref even has \listofsomething commands is that some editors (I mean, journal editors, like, people who edit journals -- mostly those who don't use or understand LaTeX for some reason) insist that figure/table/etc captions must be provided separately. And in that context, short captions don't really make any sense.

For books etc, that need indices, the only output format that could possibly offer something meaningful is LaTeX-powered PDF. In all other output formats, there's no information about pagination, and so a proper index is just impossible.

gtuckerkellogg commented 6 years ago

I take your point. Here's why I raised it, for what it's worth: dissertations.

I've been encouraging my students to use pandoc for their dissertations, and the university requires \listofsomething for, well, everything. I think the usefulness would be that it would only be triggered when it was needed, so people submitting to journals could leave it out, while students submitting dissertations could include it.

Maybe this is a moment where I commit to learning enough Haskell to try to help.

tolot27 commented 6 years ago

BTW: docx also supports short captions in listofsomething.

lierdakil commented 6 years ago

@tolot27, FWIW, pandoc doesn't really support docx all that well, so \listofsomething is generated by pandoc-crossref as a simple list. As always, contributions are welcome if you want to dive into the mess that is OOXML. I personally have neither time nor motivation to do that.

@gtuckerkellogg, contributions are always welcome. But this is going to be rather tricky even for figures (best option being making relevant changes to Pandoc's LaTeX writer, otherwise we would have to reimplement significant portions of that in pandoc-crossref). And for other objects, like tables and listings, this is going to be... let's just say I get a bit of a headache while thinking about it.

gtuckerkellogg commented 6 years ago

@lierdakil, would this be a possible summer of Haskell project to add to the current thread at pandoc-discuss?

lierdakil commented 6 years ago

For figures, no, not really -- that's relatively trivial, frankly, since images can already have arbitrary attributes, and all that needs to be done is choosing a name for the "short caption" attribute and modifying the relevant writers (LaTeX comes to mind, not sure about docx/odt) to respect it.

Now that I think about it, I believe code blocks (aka listings) in Pandoc Markdown can also have arbitrary attributes, so those should also be relatively straightforward (although maybe less so, because listings are decidedly less standard than figures).

I should mention though that when I say "arbitrary attributes" what I actually mean is "arbitrarily named string-valued attributes". The problem is with "string-valued" -- that means no formatting, which includes no math. Your call whether that sounds acceptable or not. I suspect a proposal to extend the AST just for "short captions" won't go too well, and that's the only way to have formatting in those too.

For tables, it's tricky primarily because Pandoc's AST doesn't support arbitrary attributes on those at the moment (or any attributes, for that matter), and there's been a lot of back and forth about whether it should. This one, maybe together with the other two, sounds like something resembling a SoC project, but given that back and forth, I have my doubts. In any case, it's not my decision; maybe try asking on pandoc-discuss or on https://github.com/jgm/pandoc whether that's something they would be interested in. There's also the matter of syntax for specifying table attributes (there isn't any widely adopted one atm), and while {...} seems obvious, the question of where that should go with tables is a bit of a can of worms.

I think that's all the objects in Pandoc's AST that can even have captions, actually. So if you were thinking about lists of something else, that's going to be another thing that would have to be implemented.

Sorry if my grammar is particularly bad, it's mostly due to lack of sleep.

gtuckerkellogg commented 6 years ago

Here's a Lua filter that seems to work except for the crossref. I'm not sure how to get the #fig info out of the data. I'm probably missing some other aspects of the data structure too (e.g., I've dropped all other attributes).

local utils = require 'pandoc.utils'

-- Rewrite a paragraph that holds a single image as a raw LaTeX figure whose
-- \caption carries both the short and the long caption.
function Para (figure)
   if FORMAT == "latex" then
      if #figure.content == 1 and figure.content[1].t == 'Image' then
         local img = figure.content[1]
         if img.caption and img.attributes['short-caption'] then
            local short = img.attributes['short-caption']
            -- stringify reduces the long caption to plain text
            local long = utils.stringify(img.caption)
            return { pandoc.Plain {
                     pandoc.RawInline('tex',"\\begin{figure}\n"),
                     pandoc.RawInline('tex','\\centering\n'),
                     pandoc.RawInline('tex',string.format("\\includegraphics{%s}\n",img.src)),
                     pandoc.RawInline('tex',string.format('\\caption[%s]{%s}\n',short,long)),
                     pandoc.RawInline('tex',"\\end{figure}\n")
            }}
         end
      end
   end
end

lierdakil commented 6 years ago

Well, that's nice and all, but that has exactly the problem I was talking about here:

best option being making relevant changes to Pandoc's LaTeX writer, otherwise we would have to reimplement significant portions of that in pandoc-crossref

You're basically reimplementing parts of Pandoc's LaTeX writer here (somewhat poorly, as you admit yourself). It works as a band-aid, but it's not a good long-term solution, not by a long shot. So I would prefer avoiding doing something like that in pandoc-crossref if at all possible -- I do have to consider maintenance cost, and it goes right up the more such hacks are used.

Whether or not this filter works with pandoc-crossref depends entirely on the order of invocation. If it runs before pandoc-crossref, the latter won't be able to guess that the jumble of RawInlines is actually a figure. So run pandoc-crossref before it, and it should work, more or less.

Some general tips if you want to pursue this filter angle further:

Check if the figure's title starts with fig: -- that's how Pandoc itself differentiates between images and figures, and without the implicit_figures extension, not all Para [Image] are considered figures.
Not sure if Lua filters support RawBlock, but if they do, that's probably what you want to go with there, instead of a jumble of RawInlines in a Para.
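
For illustration, here is a rough sketch of how the two tips above might be combined. It is not what pandoc-crossref does, and it rests on several assumptions: a pandoc 2.x-style AST in which an implicit figure is a Para holding one Image whose title starts with fig:, the attribute name short-caption used earlier in this thread, and a helper named figure_latex that is made up for this example. As far as I know, pandoc 3.x represents figures with a separate Figure block, so this would need adapting there. The long caption is still flattened to plain text and is not LaTeX-escaped.

```lua
-- Hypothetical helper: build the LaTeX for one figure from plain strings.
-- Pure string work, so it can be exercised without pandoc.
local function figure_latex(src, short, long, label)
  local lines = {
    '\\begin{figure}',
    '\\centering',
    string.format('\\includegraphics{%s}', src),
    string.format('\\caption[%s]{%s}', short, long),
  }
  -- Keep the identifier so \ref can still resolve (assumes it is LaTeX-safe).
  if label and label ~= '' then
    table.insert(lines, string.format('\\label{%s}', label))
  end
  table.insert(lines, '\\end{figure}')
  return table.concat(lines, '\n')
end

function Para(para)
  if FORMAT ~= 'latex' then return nil end
  if #para.content ~= 1 then return nil end
  local img = para.content[1]
  if img.t ~= 'Image' then return nil end
  -- Assumption: pandoc 2.x marks implicit figures with a "fig:" title prefix.
  if img.title:sub(1, 4) ~= 'fig:' then return nil end
  local short = img.attributes['short-caption']
  if not short then return nil end
  -- stringify drops all formatting from the long caption.
  local long = pandoc.utils.stringify(img.caption)
  -- One RawBlock replaces the paragraph instead of several RawInlines.
  return pandoc.RawBlock('latex',
    figure_latex(img.src, short, long, img.identifier))
end
```

Returning nil leaves the paragraph untouched, so images without a short-caption attribute still go through Pandoc's own LaTeX writer.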

josineto commented 5 years ago

In ODT output, if a figure/table caption contains a hard line break (using \), only the portion before that break is used when lists are generated (in LibreOffice). This does not solve anything at all in this issue, but can help as a crude workaround.

joelostblom commented 4 years ago

This lua-filter is working well for me together with pandoc-crossref. It allows you to set a short caption inside the {}:

![Caption](file.ext){#fig:label short-caption='my short caption'}

then compile with --lua-filter short-captions.lua.

lierdakil / pandoc-crossref

Short captions for use in list-of? #162