jgm / pandoc

Universal markup converter
https://pandoc.org

Use markdown image title text when generating list of figures #7915

[Open] LunkRat opened this issue 2 years ago

LunkRat commented 2 years ago

Currently if pandoc is called with --lof or configured with a format.yml containing:

---
lof: yes

when converting a markdown source, each figure is listed in the LoF using its markdown alt text. As others have pointed out in a separate issue, this is a problem: the alt text is also used as the image caption, and the text wanted for a caption is very often longer than the short identifying text needed in a list of figures.

Fortunately, markdown offers a title text attribute for images, which affords a practical and natural solution. The proposal here is to use the markdown image title when generating the list of figures. Usage would look like this:

![My image **caption** can be long with _formatting_.](figures/my_figure.jpg "Figure title for LoF entry")

The title attribute is plain text only (no formatting, math, etc.), which makes it a natural fit for an LoF entry, where you don't want those anyway.

This solution allows for a semantic separation between image caption (markdown alt text) and LoF entry text (markdown title text).
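Until something like this lands in pandoc itself, the proposal can be approximated with a small Lua filter. The following is a minimal sketch, not pandoc's own behavior, and it assumes pandoc 3.1 or later: an implicit figure is a `Figure` block whose first block is a `Plain` holding a single `Image`, and the LaTeX writer emits a figure's short caption as `\caption[short]{long}`. The file name `title-short-caption.lua` and the helper `figure_image` are hypothetical. A non-empty image title is copied into the figure's short caption; figures without a title are left alone, so their LoF entry falls back to the full caption (the fallback behavior discussed further down).

```lua
-- title-short-caption.lua (hypothetical file name)
-- Sketch: use an implicit figure's image title as its LoF entry.
-- Assumes pandoc >= 3.1 (Figure blocks, short captions honored by
-- the LaTeX writer).

-- Hypothetical helper: return the Image inside an implicit figure, or nil.
local function figure_image(fig)
  local first = fig.content[1]
  if first and first.t == 'Plain' and #first.content == 1 then
    local img = first.content[1]
    if img.t == 'Image' then
      return img
    end
  end
  return nil
end

function Figure(fig)
  local img = figure_image(fig)
  -- Only act when the title is non-empty and no short caption is set yet;
  -- otherwise LaTeX keeps using the long caption in the LoF.
  if img and img.title ~= '' and not fig.caption.short then
    -- The title is a plain string, so it becomes plain inlines
    -- (no emphasis or math, matching the limitation discussed here).
    fig.caption.short = pandoc.Inlines(img.title)
    return fig
  end
  return nil
end
```

Run it with `--lua-filter title-short-caption.lua` (and `lof: yes` set as above); with the image line from the proposal, the LoF entry should then read "Figure title for LoF entry" while the figure keeps its long, formatted caption.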

jgm commented 2 years ago

I guess my main question is whether it's too much of a limitation for this to be confined to plain string content (no formatting or math, unless the math is plain-text Unicode). If not, then this does seem a nice solution until we get fancier figure support worked out.

LunkRat commented 2 years ago

My understanding is that a list of figures in a document should contain only plain text strings to identify each figure. In my opinion being restricted to plain text is a virtue in this case.

jgm commented 2 years ago

I can imagine wanting short figure titles like:

LunkRat commented 2 years ago

@jgm You make a fair point. Does the need for italics or math in LoF titles outweigh the need to keep figure captions (alt text) independent of LoF titles, which is what this issue is trying to solve? I argue that giving up formatting in LoF titles is a small price to pay for being able to write figure captions that are multiple sentences long without essentially breaking the LoF by making it unrecognizable.

One solution could be to have a conditional that uses the plain text image title attribute if present, otherwise fall back to using the alt text. I don't like this from a usability perspective, but it would allow those folks who want to use formatting in LoF titles to continue to do so using the alt text (would also make this change more backwards compatible).

Thoughts?

tarleb commented 2 years ago

Not a full solution, but for the time being you can use the "short-captions" Lua filter at https://github.com/pandoc/lua-filters/tree/master/short-captions. It only works when going from Markdown to LaTeX and has a few other limitations, but it allows for math in the short caption.

See also #3177, which may have to be completed first.

LunkRat commented 2 years ago

I tried the Lua filter and it works; however, when I combine it with pandoc-fignos from the pandoc-xnos package, the syntax for the short-captions Lua filter breaks the @fig:[name] in-text figure reference:

![My long alt text figure caption](figures/psychophysics_stimuli.png){#fig:stimuli width=75% short-caption="Psychophysics stimuli"}

The above results in an error from pandoc-fignos when I reference the figure by name with @fig:stimuli:

pandoc-fignos: Bad reference: @fig:stimuli

So while the Lua filter does give the desired behavior, it breaks other desired functionality.

tarleb commented 2 years ago

Make sure that pandoc-xnos runs before the Lua filter. Filters are run in the order in which they appear on the command line.

I wrote an updated, shorter filter that uses new pandoc features and might give better results in some cases:

if FORMAT ~= "latex" then return end

function Para (para)
  if #para.content ~= 1 then return end
  local img = para.content[1]
  if not img or img.t ~= 'Image' or #img.caption == 0
     or img.title:sub(1,4) ~= 'fig:'
     or not img.attributes['short-caption'] then
    return nil
  end

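  local short_caption = pandoc.write(
    pandoc.read(img.attributes['short-caption']), FORMAT
  ):gsub('^%s*', ''):gsub('%s*$', '')  -- trim, removing surrounding whitespace
  -- '%' is special in gsub replacement strings (LaTeX output may contain \%),
  -- so double it before splicing the short caption in below
  short_caption = short_caption:gsub('%%', '%%%%')

  local figure = pandoc.write(pandoc.Pandoc{para}, FORMAT)
  return pandoc.RawBlock(
    'latex',
    -- braces protect a ']' in the short caption; the outer parentheses
    -- keep only gsub's result string and drop its match count
    (figure:gsub('\n\\caption', '\n\\caption[{' .. short_caption .. '}]'))
  )
end

LunkRat commented 2 years ago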

Thank you for the suggestion, @tarleb. I mistakenly thought I had tried both orders, but I tried again and got your Lua filters to work with pandoc-xnos by ordering my command so that the pandoc-xnos filter runs before the Lua filter. I am using the newer, shorter Lua filter you posted on this issue and it works beautifully, so this does indeed solve my immediate need.

I still think it is worth implementing the original issue idea into Pandoc, for two reasons:

  1. It would not require any extra filter to be called.
  2. It would not require extra syntax above standard markdown. Using the Lua filter approach, documents will get littered with short-caption="[...]" which has no meaning or use outside of the specific Lua filter.

So I'm happy to be all fixed up but I still argue that this issue should be implemented as specified. Thank you @tarleb and @jgm for your attention and help!

tarleb commented 2 years ago

Thanks for the feedback, happy to hear that it works. I think we all agree that this should be implemented and become a part of pandoc. It will be easy once support for figures has been improved, and I hope to do that soon.

jgm commented 2 years ago

The reason I'm hesitant to implement this suggestion now is that the plain string limitation seems like a problem. (And it wouldn't be right to parse it as markdown in the writer, because we don't know that the source was markdown.)

LunkRat commented 2 years ago

@tarleb is there a comparable technique available for something like short-caption that could work for Table captions? I'm hoping to clean up my LoT but I see the statement about lack of support for table captions in the Limitations section of the short-caption lua filter README. If you know of any workarounds for this problem please let me know.

tarleb commented 2 years ago

I'm not aware of anything. You could try with commonmark_x instead of the classic Markdown parser and use the attributes extension.

LunkRat commented 2 years ago

I'm using https://github.com/pandoc/lua-filters/tree/master/short-captions for images, works great. However, I still don't have a solution for markdown tables.

I am able to get a short caption for LoT if I use a raw latex table with this syntax:

\caption[My short LoT caption]{My longer caption which appears in the body table caption but not in the LoT.}

Would be great to find a solution for markdown tables, even if it is a workaround/hack and ugly.
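One possible hack for markdown tables, combining the raw `\caption[short]{long}` form above with the string-rewriting approach of tarleb's earlier figure filter: wrap the table in a fenced div that carries the short caption, render the table to LaTeX inside the filter, and splice the optional argument into the first `\caption{`. This is a sketch under several assumptions: the `short-caption` div attribute and the helper `with_short_caption` are hypothetical, the short text is inserted as raw LaTeX (so characters such as `&` or `_` must be escaped by hand), and it assumes the LaTeX writer emits the table's caption as a `\caption{...}` command. Because the table ends up as a raw block, pandoc may also stop detecting that the document contains tables and omit the longtable/booktabs setup from its template; if LaTeX complains, load those packages yourself, for example via `header-includes`.

```lua
-- table-short-caption-div.lua (hypothetical file name)
-- Hypothetical markup:
--   ::: {short-caption="Short LoT entry"}
--   | A | B |
--   |---|---|
--   | 1 | 2 |
--
--   : A long caption that should not appear in the LoT.
--   :::

-- Insert "[{short}]" after the first \caption in a LaTeX string.
local function with_short_caption(latex, short)
  -- Braces keep a ']' in the short text from ending the optional argument.
  local opt = '[{' .. short .. '}]'
  -- '%' is special in gsub replacement strings, so double it.
  local repl = '\\caption' .. opt:gsub('%%', '%%%%') .. '{'
  -- Parentheses drop gsub's second return value (the match count).
  return (latex:gsub('\\caption{', repl, 1))
end

function Div(div)
  local short = div.attributes['short-caption']
  if not short or FORMAT ~= 'latex' then
    return nil
  end
  local out = pandoc.List()
  for _, block in ipairs(div.content) do
    if block.t == 'Table' then
      -- Render just this table to LaTeX, then patch its caption.
      local latex = pandoc.write(pandoc.Pandoc({ block }), 'latex')
      out:insert(pandoc.RawBlock('latex', with_short_caption(latex, short)))
    else
      out:insert(block)
    end
  end
  -- Returning a list replaces the div with its (rewritten) contents.
  return out
end
```

The string surgery is the fragile part, which is why it is isolated in `with_short_caption`; if a future pandoc honors table short captions natively, setting `caption.short` on the `Table` would be the cleaner route.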

jpcirrus commented 1 year ago

@LunkRat have you had a look at the table-short-captions Lua filter? I've not used it so not sure if it will do what you're looking for.

jpcirrus commented 1 year ago

@tarleb until pandoc 3.0 I had been successfully using your figure short caption filter (thank you), but since upgrading, short captions are ignored. I assume this is due to the support for "complex figures" added in pandoc 3.0. If so, is it still possible to get figure short captions in LaTeX output by merely amending your filter?

tarleb commented 1 year ago

There are two issues here. The first is that the filter needs updating; it could now be as short as:

function Figure (fig)
  local short = fig.attributes['short-caption']
  if short and not fig.caption.short then
    fig.caption.short = pandoc.utils.blocks_to_inlines(
      pandoc.read(short, 'markdown')
    )
  end
  return fig
end

However, this won't work yet as there is a second problem: the LaTeX writer currently ignores short captions. This must be fixed, too.

I'll see to it.

jpcirrus commented 1 year ago

Thank you @tarleb . Appreciated.

jpcirrus commented 1 year ago

I have just upgraded to pandoc 3.1 and tried converting to LaTeX with this updated filter, but the short caption is still not inserted into the \caption command, so it is missing from the list of figures. When converting to JSON I can see short-caption in the output, but I don't know enough to work out what the issue could be.

jpcirrus commented 1 year ago

After reading the Lua filters manual and many thanks to @wlupton's logging module I have got the filter working after amending it to:

PANDOC_VERSION:must_be_at_least '3.1'

if FORMAT:match 'latex' then
  function Figure(f)
    local short = f.content[1].content[1].attributes['short-caption']
    if short and not f.caption.short then
      f.caption.short = pandoc.Inlines(short)
    end
    return f
  end
end

prakaa commented 1 year ago

Thanks @jpcirrus and @tarleb. I had updated the short-captions filter myself but then came across this issue; @jpcirrus's version above is much more succinct, thanks for sharing!

Just confirming: with pandoc 3.1+, this code used as a filter, together with table-short-captions, lets me use short captions in both the list of figures and the list of tables.

I might reference this issue in a few repos where others may be looking for a similar fix.

jpcirrus commented 1 year ago

@prakaa I can confirm that the above code, used as a filter, outputs figure short captions, but I have no need for table short captions, so I don't know about those. Why don't you give it a go and let us know?

prakaa commented 1 year ago

@jpcirrus clarifying what I meant above:

leowill01 commented 11 months ago

Regarding @jpcirrus's pandoc 3.1 filter above: you just saved my ability to render my dissertation revisions. MANY THANKS

EDIT: this could not render markdown or LaTeX expressions in the short captions in the LoF, so after some tinkering with GPT, here is a modified version that supports them as well:

PANDOC_VERSION:must_be_at_least '3.1'

if FORMAT:match 'latex' then
  function Figure(f)
    local short = f.content[1].content[1].attributes['short-caption']
    if short and short ~= '' and not f.caption.short then
      -- Parse the short caption as Markdown so emphasis, $math$ and raw
      -- LaTeX are kept as inlines; the LaTeX writer then renders them.
      local blocks = pandoc.read(short, 'markdown').blocks
      f.caption.short = pandoc.utils.blocks_to_inlines(blocks)
    end
    return f
  end
end