lierdakil / pandoc-crossref

Pandoc filter for cross-references
https://lierdakil.github.io/pandoc-crossref/
GNU General Public License v2.0

crossrefYaml=yamlFilepath not working #286

Closed anlin01 closed 3 years ago

anlin01 commented 3 years ago

Hello, first of all thank you for this wonderful product. I am new to pandoc and pandoc-crossref, and for my thesis which I am writing in LaTeX I need to create a docx-document for an exchange.

I read in #250 and #257 that pandoc-crossref is not made for LaTeX input, and that it is better to go via Markdown. I can do that. However, so far pandoc-crossref seems to work fine.

I discovered that crossrefYaml=yamlFilepath is not working. #242 may describe a similar issue.

When using pandoc in the terminal:

pandoc -s -f latex -F pandoc-crossref --citeproc --bibliography=bibfile.bib --csl=cslfile.csl -o output.docx -t docx latexfile.tex

pandoc-crossref.yaml is used automatically and taken from the directory from where pandoc is started.
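For readers unfamiliar with the settings file, here is a minimal sketch of what a pandoc-crossref.yaml might contain. The keys shown (figureTitle, tableTitle, figPrefix, tblPrefix) are pandoc-crossref metadata options; the values and the demo directory name are placeholders chosen for illustration, not taken from this thread.

```shell
# Create a throwaway directory (hypothetical name) holding a minimal settings file.
workdir="${TMPDIR:-/tmp}/crossref-demo"
mkdir -p "$workdir"

# Quoted 'EOF' keeps the content literal (no shell expansion inside the file).
cat > "$workdir/pandoc-crossref.yaml" <<'EOF'
figureTitle: "Figure"
tableTitle: "Table"
figPrefix: "fig."
tblPrefix: "tbl."
EOF

# As described above, the file is found automatically when pandoc is
# started from the directory that contains it.
cat "$workdir/pandoc-crossref.yaml"
```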

Since I am using LyX and the export to docx is executed from there by the converter command:

pandoc -s -f latex -F pandoc-crossref --citeproc --bibliography=bibfile.bib --csl=cslfile.csl -o $$o -t docx $$i

I need to specify the location of the files, which works fine except for pandoc-crossref.yaml:

pandoc -s -f latex -F pandoc-crossref crossrefYaml=/home/username/pathdirectory/latextest/converttest/testordner/pandoc-crossref.yaml --citeproc --bibliography=bibfile.bib --csl=cslfile.csl -o output.docx -t docx latexfile.tex

does not work. The error message is: „pandoc: crossrefYaml=/home/username/pathdirectory/pandoc-crossref.yaml: openBinaryFile: does not exist (No such file or directory)“

Did I make a mistake, or is there an issue / bug that prevents a path for crossrefYaml from being accepted?

I am using pandoc 2.11.2, compiled with pandoc-types 1.22, texmath 0.12.0.3, skylighting 0.10.0.3, citeproc 0.2, ipynb 0.1.0.1

pandoc-crossref v0.3.8.4 git commit UNKNOWN (UNKNOWN) built with Pandoc v2.11.2, pandoc-types v1.22 and GHC 8.10.1

Thank you, Andre

K4zuki commented 3 years ago

I think you are just missing -M before crossrefYaml=, so try:

pandoc -s -f latex -F pandoc-crossref \
  -M crossrefYaml=/home/username/pathdirectory/latextest/converttest/testordner/pandoc-crossref.yaml \
  --citeproc --bibliography=bibfile.bib --csl=cslfile.csl -o output.docx -t docx latexfile.tex

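Some background on why this matters: pandoc treats every argument that is not an option as an input file, so a bare crossrefYaml=... is opened as a file, which matches the openBinaryFile error reported above. -M (short for --metadata) passes it as a metadata value instead. A small sketch of a wrapper that could serve as a converter command follows; the function name tex2docx and the variables PANDOC and CROSSREF_YAML are hypothetical and not part of pandoc, pandoc-crossref, or LyX.

```shell
# Hypothetical wrapper: tex2docx INPUT.tex OUTPUT.docx
# PANDOC can be overridden (e.g. PANDOC=echo) to inspect the arguments
# without actually running pandoc.
PANDOC="${PANDOC:-pandoc}"
CROSSREF_YAML="${CROSSREF_YAML:-$HOME/pandoc-crossref.yaml}"  # placeholder path

tex2docx() {
    # -M hands crossrefYaml to pandoc as metadata, not as an input file.
    "$PANDOC" -s -f latex -F pandoc-crossref \
        -M "crossrefYaml=$CROSSREF_YAML" \
        --citeproc --bibliography=bibfile.bib --csl=cslfile.csl \
        -o "$2" -t docx "$1"
}
```

Calling it as `PANDOC=echo tex2docx latexfile.tex output.docx` prints the assembled arguments, which is a cheap way to check quoting before wiring it into an editor.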
anlin01 commented 3 years ago

Thank you very much for your answer, it works fine. Sorry for bothering you - I am still working through LaTeX and conversion. So thank you, it is solved.

If I may ask a question to clarify: as you mention in #250 and #257, pandoc-crossref is not made for LaTeX as input; the principal aim is to add cross-references to Markdown. So you suggest in #250 doing a conversion from LaTeX to Markdown, or directly to docx, with your (as I understood, far from perfect) Lua filter. I will try. As far as I can see, the cross-references (I have only simple text with figures and tables) seem to work even without the filter, with LaTeX as input. Maybe that is because I am starting with a simple document.

My question: Is it correct that there is no proper way existing to create a docx-document from LaTeX?

lierdakil commented 3 years ago

Is it correct that there is no proper way existing to create a docx-document from LaTeX?

Depends on what exactly you mean by "proper". Pandoc will handle simple LaTeX documents on its own -- it will also do its best to replace \refs with the corresponding figure/table number, so if you don't need "advanced" features of pandoc-crossref you don't need to bother with it at all. Additionally employing pandoc-crossref with the Lua filter you mentioned gives you more control over output formatting, which can be rather helpful in some cases. You could also use said Lua filter without pandoc-crossref to convert LaTeX to pandoc-crossref compatible Markdown if you wish.

In general though, LaTeX, unlike most markup languages, is Turing-complete as long as you allow for infinite depth of recursion. As a direct consequence, converting arbitrary LaTeX to any other format requires actually executing its macros, which is not something the comparatively simple parser framework Pandoc employs can do. Hence, Pandoc can only handle a small subset of all LaTeX documents. Pandoc-crossref won't help here either, since it doesn't do anything special with LaTeX input.

In any case, whether Pandoc (with or without pandoc-crossref) will work for your particular use case is something you can only figure out by experimentation (well, either that, or being intimately familiar with Pandoc's LaTeX parser limitations, which I'm guessing you are not).

As a side note, in case you need to produce readable but not necessarily "editable" Word documents from LaTeX, TeX4ht will probably work for more complex LaTeX documents than Pandoc. However, from my -- admittedly limited -- experience, it's way harder to use efficiently or even effectively. That is, you can get good-looking results with TeX4ht, but it often can be hard to get any meaningful results at all. Of course, YMMV.

anlin01 commented 3 years ago

Thank you again for your detailed answer.

Yes, it is only possible to figure out solutions by experimentation.

What I discovered: figures and tables are only numbered if a label is attached to the floating object in which the figure or table is embedded. It does not matter whether the label is actually used for a reference; only the presence of a label on the floating object decides whether the number of a figure or table is printed. The label must also follow the syntax "fig:.." or "tbl:.." exactly. If the label does not follow this syntax, the table is not numbered.

Furthermore, a label for the cross-reference must be defined on a separate line, like this:

\section{name_of_section}
\label{sec:name_of_label}

The following one-line definition is not possible:

\section{name_of_section\label{sec:name_of_label}}

That is fine. It is just important to know.
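A minimal sketch of a test document for this kind of experimentation, written out from the shell. The file name labels-test.tex, the labels, and the image name plot.png are placeholders made up for illustration; plot.png does not need to exist to see how numbering behaves, though pandoc will likely warn about the missing image.

```shell
# Write a small LaTeX file with one labelled section, figure, and table.
# Quoted 'EOF' keeps the backslashes literal.
cat > labels-test.tex <<'EOF'
\documentclass{article}
\usepackage{graphicx}
\begin{document}

\section{Results}\label{sec:results}

See Figure~\ref{fig:plot} and Table~\ref{tbl:data}.

\begin{figure}
  \includegraphics{plot.png}
  \caption{A labelled figure}
  \label{fig:plot}
\end{figure}

\begin{table}
  \caption{A labelled table}
  \label{tbl:data}
  \begin{tabular}{ll}
    a & b \\
    1 & 2 \\
  \end{tabular}
\end{table}

\end{document}
EOF

# Convert and inspect the numbering (requires pandoc and pandoc-crossref on PATH):
#   pandoc -s -f latex -F pandoc-crossref -o labels-test.docx labels-test.tex
```

Removing one \label at a time and converting again is a quick way to confirm which labels affect the numbering.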

If I may ask you three more things - if possible:

1. The numbering and description of a figure are printed below the figure, while the numbering and description of a table are printed above the table. Is it possible to control this output format (the position)?

2. I realized that the levels of sections (section, subsection, subsubsection) are assigned to the levels of sections in the docx-document, which means they are assigned to the patterns / format-templates of docx (LibreOffice, TextMaker or MS Word). This is very useful since they are printed in the right predefined format (when a docx-template is given). Is it possible to assign the table-numbering and table-description-name to a pattern / format-template of docx?

3. Can the list of tables and list of figures be printed by using pandoc? I guess this question does not belong to pandoc-crossref, so it might be wrong to ask it here.

Thank you, Andre

lierdakil commented 3 years ago

The following one-line-definition is not possible: \section{name_of_section\label{sec:name_of_label}}

Uh... actually, not entirely correct? The issue isn't how many lines there are, the issue is how Pandoc interprets labels inside the section definition (as opposed to immediately after). Also, it's nothing a filter couldn't correct, albeit practically speaking it might end up being a little convoluted. Anyway,

\section{section text}\label{sec:1}

will work exactly the same as

\section{section text}
\label{sec:1}

Furthermore, while putting labels inline might sometimes work, it's not the intended use, so it will occasionally break. TL;DR: even if we ignore Pandoc's quirks, it's a good idea to avoid doing \section{...\label{...}}, and always prefer \section{...}\label{...} instead.
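For existing sources that already use the inline form, a pre-processing step is one option. Below is a minimal sketch using sed, not a Pandoc filter; the function name fix_section_labels is hypothetical. It only handles simple cases: unstarred \section, \subsection, and \subsubsection written on one line, with no nested braces in the title.

```shell
# Move a \label{...} from inside the section title to directly after it.
# Limitations: one-line commands only, no nested braces, no starred forms.
fix_section_labels() {
    sed -E 's/\\(section|subsection|subsubsection)\{([^{}]*)\\label\{([^{}]*)\}\}/\\\1{\2}\\label{\3}/g'
}

printf '%s\n' '\section{name_of_section\label{sec:name_of_label}}' | fix_section_labels
# prints: \section{name_of_section}\label{sec:name_of_label}
```

It reads from standard input, so `fix_section_labels < latexfile.tex > fixed.tex` would produce a cleaned copy while leaving the original untouched.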

Is it possible to control this output format (the position)?

If it is, I am not aware of it. IIRC we kind of collectively decided that if LaTeX places table captions above tables by default, then so should we. See https://github.com/jgm/pandoc/issues/1641

Is it possible to assign the table-numbering and table-description-name to a pattern / format-template of docx?

As long as you're using the default template or something derived from the default template, caption paragraphs should have Table Caption/Image Caption paragraph style assigned. But that's about it, no additional run styles are applied. But I guess you could (at least in theory) make a macro in Word to post-process that if you really needed to?
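One way to change how those caption paragraph styles look is a custom reference document. A sketch follows, assuming pandoc 2.x or later, where --print-default-data-file and --reference-doc are available; the file name custom-reference.docx, the function names, and the PANDOC variable are placeholders for illustration.

```shell
# PANDOC can be overridden (e.g. PANDOC=echo) to inspect the arguments only.
PANDOC="${PANDOC:-pandoc}"

# Step 1: dump pandoc's built-in reference.docx so its styles can be edited.
make_reference_doc() {
    "$PANDOC" -o custom-reference.docx --print-default-data-file reference.docx
}

# Step 2: after editing the caption styles in Word or LibreOffice,
# convert using the edited file as the style source.
convert_with_reference() {
    "$PANDOC" -s -f latex -F pandoc-crossref \
        --reference-doc=custom-reference.docx \
        -o "$2" -t docx "$1"
}
```

Only the style definitions of the reference document are used; its text content is ignored, so the edits belong in the style dialog rather than in the document body.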

Can the list of tables and list of figures be printed by using pandoc?

Using pandoc alone, no. With pandoc-crossref, yes, but without references to page numbers (which you might expect). Page numbers are a bit of an issue, because those don't actually exist until Word renders the document. So pandoc-crossref can't even begin to guess those. Now, it's not entirely impossible to achieve, but I really, really dislike fiddling with raw OOXML and testing results against multiple Word versions because frankly Word is kinda broken.

Anyway, here's what you can expect: [image]

See the docs for how to use this.
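A minimal sketch of what this can look like with Markdown input, assuming the raw LaTeX commands \listoffigures and \listoftables that the pandoc-crossref documentation describes for generating these lists. Whether the same commands survive Pandoc's LaTeX reader is not covered here and would need testing. The file name lists-demo.md, the labels, and plot.png are placeholders.

```shell
# Write a small Markdown file that requests both lists and defines
# one labelled figure and one labelled table.
cat > lists-demo.md <<'EOF'
\listoffigures

\listoftables

![A labelled figure](plot.png){#fig:plot}

| a | b |
|---|---|
| 1 | 2 |

: A labelled table {#tbl:data}
EOF

# Convert (requires pandoc and pandoc-crossref on PATH):
#   pandoc -s -F pandoc-crossref -o lists-demo.docx lists-demo.md
```

As noted above, the resulting lists carry figure and table numbers but no page numbers.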

anlin01 commented 3 years ago

Thank you for your helpful explanation!