lierdakil / pandoc-crossref

Pandoc filter for cross-references
https://lierdakil.github.io/pandoc-crossref/
GNU General Public License v2.0

Listing crossrefs when converting from LaTeX to HTML #319

Open leenamurgai opened 3 years ago

leenamurgai commented 3 years ago

I am converting from LaTeX to HTML5 and trying to use pandoc-crossref version 2.14.02 for cross references to figures, tables, listings and equations. I managed to get it to work for figures and tables. For listings I'm seeing some inconsistent behaviour.

Input LaTeX:

\begin{lstlisting}[caption={caption_text},
                   label=lst:label]
code
\end{lstlisting}

Output HTML5:

<div id="lst:label" class="listing">
<p>Listing chap_num. lst:label: caption_text</p>
<pre label="lst:label"><code>code</code></pre>
</div>

Notice that instead of the value associated with the label (say, value(lst:label)), the caption uses the label itself, lst:label. I have set chapters: true in my yml, hence the "chap_num." prefix. I tried setting

lstLabel: "arabic" / "roman"
setLabelAttribute: true / false

in the yml. None of these made any difference to any of the cross refs (to figures, tables or listings).

In addition, when I make a reference to my listing in LaTeX with \ref{lst:label}, I get

<a href="#lst:label" data-reference-type="ref" data-reference="lst:label">[lst:label]</a>

instead of

<a href="#lst:label" data-reference-type="ref" data-reference="lst:label">chap_num.value(lst:label)</a>

This time the reference is missing chap_num. as well as using the label instead of its value.

This looks like a bug to me but please let me know if I've missed something in the user guide.

lierdakil commented 3 years ago

Not in the user guide, but see https://github.com/lierdakil/pandoc-crossref/issues/250 wrt using latex as input format. I really need to make some sort of FAQ for this thing.

As for the incorrect listing formatting, it's an unfortunate interplay between how Pandoc handles LaTeX and what pandoc-crossref expects as input. Long story short, pandoc-crossref uses the label attribute to override the generated label with a custom one, which collides with Pandoc preserving the original LaTeX label (used for cross-referencing) in that same attribute. The simplest option is to filter that attribute out, i.e. use a Lua filter like this:

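-- Drop the label attribute that Pandoc's LaTeX reader copies from \label,
-- so that pandoc-crossref does not print it in place of the listing number.
function CodeBlock(b)
    -- Only touch blocks whose label attribute merely repeats the identifier.
    if b.attributes.label == b.identifier then
        b.attributes.label = nil
        return b
    end
end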

(you can just append that to the filter code outlined in #250) and run that before pandoc-crossref, e.g.

pandoc -L/path/to/above/filter.lua -Fpandoc-crossref input.tex -o output.md

... or something like that

leenamurgai commented 3 years ago

Thanks for your reply on this. It's been super helpful!

Ah, I see. At first I did not think I needed to read the syntax section of your documentation, because I thought it only related to Markdown input. I came back to it because of this discussion, which made me realise others had made it work, at least for tables and figures. An extra section in your manual for people with LaTeX input would hopefully mean fewer repeat issues and discussions like this over bonus functionality :).

I tried the filters you mentioned in #250 and above. For others who might find it useful, I edited the Link function so it only affects listings and equations (and thus avoids interaction with citeproc).

function Link(el)
  if el.attributes["reference-type"]=="ref" and (el.attributes["reference"]:sub(1,#"eq:")=="eq:" or el.attributes["reference"]:sub(1,#"lst:")=="lst:") then
    local citations = {}
    for cit in el.attributes["reference"]:gmatch('[^,]+') do
      citations[#citations+1] = pandoc.Citation(cit, "NormalCitation")
    end
    return pandoc.Cite("", citations)
  end
end

There's one last remaining issue: equation references have the unwanted prefix "eq.". I tried setting eqPrefix: "" (setting lstPrefix: "" worked for listings), but it had no impact.

Thanks again for your help with this!

lierdakil commented 3 years ago

> I tried setting eqPrefix: ""

It's a bit of a historical inconsistency: the variable is called eqnPrefix. Backwards compatibility doesn't really let me rename it, and adding a synonym would complicate things more than it would help, I fear.

You also might want to set eqnPrefixTemplate to something like ($$i$$) to avoid an unnecessary nbsp and add the traditional parentheses.
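
For readers following along, a minimal sketch of that suggestion as a metadata fragment. It only uses the variable name and template given in this comment; where the fragment is loaded from is left open.

```yaml
# pandoc-crossref metadata fragment (illustrative sketch)
# Note the spelling: eqnPrefixTemplate / eqnPrefix, not eqPrefix.
# $$i$$ stands for the reference number; the parentheses are literal text.
eqnPrefixTemplate: ($$i$$)
```

With this template a reference should render as just the parenthesised number, with no prefix in front of it.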

leenamurgai commented 3 years ago

I'm glad I asked, no doubt it would have taken me a while to spot that autopilot mode error. In the end I just set

eqnPrefixTemplate: $$i$$
lstPrefixTemplate: $$i$$

since I realised setting the prefix to an empty string was still prepending with an empty space before the reference. The brackets are in the LaTeX, so no need for those.

On the naming, I imagine that it would be a smidge more convenient (when pattern matching) to keep all the prefixes the same length. Also I think eqn is the more commonly used abbreviation [1], so if you were so inclined, the synonym route sounds good - change the equation labelling convention rather than the variable name but recognise both. In that case it would be backwards compatible and enable the new (more consistent) naming convention. Just a nudge, in case it works ;).

Thanks again for your help!

[1] At least this is the case in applied mathematics circles in the UK - it's the prefix I was using for my equation labels in the LaTeX before I switched to eq for pandoc-crossref.

lierdakil commented 3 years ago

> it would be a smidge more convenient (when pattern matching)

Not really, no. However, adding a prefix synonym is doable. I'll think about it.

fedeinthemix commented 3 years ago

@leenamurgai would you mind sharing your complete YAML file? I've tried setting eqnPrefixTemplate: $$i$$, but the output contains $$i$$ (not replaced by a number).

leenamurgai commented 3 years ago

Sure @fedeinthemix. It's short:

chapters: true           # Prepend numbering with "chap#."
eqnPrefixTemplate: $$i$$ # https://github.com/lierdakil/pandoc-crossref/issues/319
lstPrefixTemplate: $$i$$ # 
tableEqns: true          # Typeset eqns and eqn #s in tbl instead of embedding #s into eqns

Did you follow the discussion about the lua filters you need? From #250 and above?

fedeinthemix commented 3 years ago

@leenamurgai thanks for sharing!

I've followed the discussion, but I have zero experience with Lua. I've created a file named texref.lua:

function Link(el)
  if el.attributes["reference-type"]=="ref" and (el.attributes["reference"]:sub(1,#"eq:")=="eq:" or el.attributes["reference"]:sub(1,#"lst:")=="lst:") then
    local citations = {}
    for cit in el.attributes["reference"]:gmatch('[^,]+') do
      citations[#citations+1] = pandoc.Citation(cit, "NormalCitation")
    end
    return pandoc.Cite("", citations)
  end
end

function Math(el)
  if el.mathtype == "DisplayMath" then
    local label = nil
    el.text = el.text:gsub("\\label{[^}]+}", function(w) label=w:sub(8,-2); return ""; end)
    if label ~= nil then
      return pandoc.Span(el, {id=label})
    end
  end
end

function CodeBlock(b)
    if b.attributes.label == b.identifier then
        b.attributes.label = nil;
        return b;
    end
end

and call pandoc as follows: pandoc -s -f latex -t html5 --metadata-file=pandoc.yaml --lua-filter=texref.lua --filter pandoc-crossref --resource-path=images:. --katex -o test2.html test2.tex.

At the moment I don't have a lua interpreter on my system, but according to the manual pandoc should have one integrated.

leenamurgai commented 3 years ago

Yup, I've found myself having to learn some lua (and html and css).

Did you make sure to use the LaTeX label name prefix convention:

Figures:             \label{fig:id}
Tables:              \label{tbl:id}
Code Listings:       \label{lst:id}
Equations:           \label{eq:id}

If it's not that, I found at some point that putting my pandoc options in a yaml file fixed something I couldn't get to work, though I don't recall what it was now.

If none of those work, there's also the pandoc-discuss Google Group.

Best of luck.

lierdakil commented 3 years ago

> @leenamurgai would you mind sharing your complete YAML file? I've tried setting eqnPrefixTemplate: $$i$$, but the output contains $$i$$ (not replaced by a number).

If the output contains the literal string $$i$$, that likely means the variable body wasn't parsed as Markdown. This happens with pandoc's --defaults, but it should work with --metadata-file. That said, the behaviour of the Markdown parser for --metadata-file was extended in Pandoc 2.7, so check your Pandoc version just in case: you want >=2.7.

fedeinthemix commented 3 years ago

I enter Eq. references as, e.g., \ref{eq:charge}. The pandoc version that I'm using is 2.14.1. I will try using a YAML file. Thanks.

fedeinthemix commented 3 years ago

@lierdakil after your guess that the metadata variable body may not be parsed as markdown I checked how pandoc parses them. Currently metadata is parsed as basic markdown with only the extensions enabled by the input format, see https://github.com/jgm/pandoc/issues/6832

For latex no markdown extensions are enabled. I tried adding the tex_math_dollars extension by pandoc -f latex+tex_math_dollars ..., but the extension is not compatible with the latex reader.

This seems to pose a fundamental limitation in assuming $$i$$ (and similar templates) being always parsed as pandoc markdown.

As a temporary workaround what do you think of suppressing the extra space added when eqnPrefix and other templates are set to ""?

lierdakil commented 3 years ago

@fedeinthemix just use -McrossrefYaml=yoursettingsfile.yaml, it's a dirty hack, but it works.
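
A minimal sketch of what such a settings file might contain, assuming the same options shared earlier in this thread. The file name crossref-settings.yaml is hypothetical; it would be passed as -McrossrefYaml=crossref-settings.yaml alongside the --lua-filter and --filter options already in use.

```yaml
# crossref-settings.yaml (hypothetical file name)
# Read by pandoc-crossref itself via the crossrefYaml metadata variable,
# so the $$i$$ templates do not depend on the LaTeX reader's extensions.
chapters: true            # prepend the chapter number
eqnPrefixTemplate: $$i$$  # equation references: number only
lstPrefixTemplate: $$i$$  # listing references: number only
```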

fedeinthemix commented 3 years ago

Indeed it works! Thank you very much for your patience and help!

leenamurgai commented 3 years ago

Interesting, I literally put everything into a config/pandoc.yml including

metadata:
  crossrefYaml: config/pandoc-crossref.yml

and just call pandoc --defaults config/pandoc.yml.

fedeinthemix commented 3 years ago

I've done something similar as well. The key thing to make it work (for me) is not to use metadata-files: to import the YAML file with the pandoc-crossref options (or the --metadata-file option), but to do it indirectly through crossrefYaml:.
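
Putting the pieces of this thread together, a defaults file along these lines may be a reasonable starting point. This is a sketch only: the paths and file names (texref.lua, test2.tex, config/pandoc-crossref.yml) are taken from or modelled on the examples above and will differ per project.

```yaml
# config/pandoc.yml: run with `pandoc --defaults config/pandoc.yml`
from: latex
to: html5
standalone: true
filters:
  - texref.lua        # Lua filter from this thread; listed first so it runs before pandoc-crossref
  - pandoc-crossref
metadata:
  # pandoc-crossref options live in their own file, loaded indirectly
  crossrefYaml: config/pandoc-crossref.yml
input-files:
  - test2.tex
output-file: test2.html
```

The pandoc-crossref options themselves (chapters, eqnPrefixTemplate, and so on) stay in the file named by crossrefYaml rather than in this defaults file, which is the indirection described above.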