GeraldLoeffler opened this issue 11 years ago
+++ blindmelon [Dec 04 15 08:36 ]:
@lierdakil Just weighing in quickly on the matter of references. Reference attributes would make it nice and easy to refer to, e.g., the section a table is in rather than the table itself, or the controversial pageref. Best to make the deep changes now so that it is just a matter of making changes in the readers/writers later on.
A Div container already has an id and arbitrary key/value attributes. So, why isn't that enough?
The id tells you unambiguously which element you mean to refer to. Providing a way to get a section number or page number corresponding to that element is another matter -- but I don't see how a dedicated Figure type would make a difference for that.
Sorry, @blindmelon, I see that you were talking about a separate Reference element rather than a Figure element. I think it would be great to have the power of LaTeX's labels and references. But it would require some deeper changes -- for example, section numbering would have to be integral to the document model rather than (as now) just added by writers.
Anyway, it still strikes me that the issue of a dedicated Figure element and the issue of a dedicated Reference element are separate issues (either could be done independently of the other).
@jgm, not sure I follow. Why? pandoc-crossref makes do without that well enough. Sure, it could be better, but it's not a prerequisite, I think.
@lierdakil - sure, I suppose the filter (or whatever) can always reconstruct the section numbering. But one issue is this. If it's not part of the AST that the sections are numbered, then we might end up with references to numbered sections even though section numbers aren't printed. That's awkward, at least!
+++ aaren [Dec 04 15 08:25 ]:
Having Figure Attr [Block] [Block] does feel a bit redundant when we already have Div Attr [Block]. Why not just treat the first Para as the caption? I suppose the Figure caption can have completely arbitrary content (Figures all the way down!), rather than just Para.
One way around this would be to treat the first Para or BlockQuote as the caption. If you wanted multiple blocks, you could put it in a BlockQuote (and the surrounding blockquote would not be part of the caption).
Assuming something like the colon syntax for divs,
::::::: {.figure}
> This is my caption.
>
> It has two paragraphs.
:::::::
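To make the "first Para or BlockQuote is the caption" rule concrete, here is a minimal Haskell sketch. It uses a simplified, hypothetical Block type rather than the real pandoc-types definitions, and splitCaption is an illustrative name, not a pandoc function.

```haskell
-- A simplified stand-in for pandoc's Block type; hypothetical, not pandoc-types.
data Block
  = Para String
  | BlockQuote [Block]
  | Image FilePath
  deriving (Eq, Show)

-- Split the contents of a .figure div into (caption, body).
-- The first Para or BlockQuote becomes the caption. For a BlockQuote,
-- only its contents count as caption; the quote wrapper itself is dropped.
splitCaption :: [Block] -> ([Block], [Block])
splitCaption blocks =
  case break isCaption blocks of
    (before, c : after) -> (unwrap c, before ++ after)
    (_, [])             -> ([], blocks)
  where
    isCaption (Para _)       = True
    isCaption (BlockQuote _) = True
    isCaption _              = False
    unwrap (BlockQuote bs) = bs
    unwrap b               = [b]
```

For example, splitCaption [BlockQuote [Para "This is my caption.", Para "It has two paragraphs."], Image "myimage.png"] returns the two paragraphs as the caption and the image as the body (myimage.png is a made-up file name). A single-paragraph caption needs no quote wrapper at all.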
@jgm isn't that (using numbering or not) the user's problem? If there is no numbering, then either the user puts a link over [whatever text they want to use](#ref) or a filter does something clever and puts in 'the figure above' or similar.
@jgm, another option would be playing with the HTML5 figure/figcaption convention:
:::::::{.figure}
:::::::::::::{.caption}
Arbitrary block elements
:::::::::::::
![Figures all the way](img.png)
![Etc](img2.jpg)
:::::::
This way, it doesn't matter where the caption is placed (before, after, or in the middle of things); it's still a caption.
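A rough sketch of the .caption-class variant, again with a simplified, hypothetical Block type (not pandoc-types). Because the caption is found by class rather than by position, it can sit anywhere inside the figure div. figureParts is an illustrative name.

```haskell
-- Simplified, hypothetical block type: a Div carries its list of classes.
data Block
  = Div [String] [Block]
  | Para String
  | Image FilePath
  deriving (Eq, Show)

-- For a .figure div, collect the contents of every .caption div, wherever
-- it sits (before, after, or between the images); the rest is the body.
-- Returns Nothing for anything that is not a .figure div.
figureParts :: Block -> Maybe ([Block], [Block])
figureParts (Div classes content)
  | "figure" `elem` classes =
      Just ( concat [bs | Div cs bs <- content, "caption" `elem` cs]
           , filter (not . isCaptionDiv) content )
  where
    isCaptionDiv (Div cs _) = "caption" `elem` cs
    isCaptionDiv _          = False
figureParts _ = Nothing
```

One design consequence: if several .caption divs appear, this sketch simply concatenates them; a real implementation would have to decide whether that should be an error instead.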
@jgm I see what you mean. I am mostly working with odt and LaTeX output (Markdown input, otherwise what's the point?), both of which can do the numbering side of things. My sections, for example, are numbered using styles in my reference odt. At the end of the day, I think the writer needs to deal with inappropriate reference types in the input in whatever way is most suitable for the output format.
If we think about references, there are many kinds of numbered things one might want to refer to:
see Example (14)
see Equation 15
see footnote 6
see Table 3.1
see Figure 5
see Section 3.2.1
see Code Sample 14 on p. 13
Currently pandoc supports only one of these -- the first, through numbered example lists. Ideally, we could support all or most of them. The questions we need to ask are:
What would be good Markdown syntax, both for the references and for the labels? Presumably it would make most sense for the labels to be identifiers, which can already be applied to most anything (either directly or through a Span or Div). But it's less clear what the references should look like. How do we mark whether a reference is to a figure, a table, an enclosing section, etc.? (We want to avoid using English words, and we want to avoid looking like Perl; it should look natural and readable in plain text.)
@jgm [...] {} and brackets []. So from my point of view, pseudo-English identifiers make about as much sense as anything else. With pandoc-crossref, I went with prefixes to identify item classes, e.g. fig: for figures, eq: for equations, etc.
Mind you, that's not required -- in general, you know what element is referenced, so you generally know what kind of element that is. [...] as shown in [@fig:one_figure; @fig:other_figure; @fig:related_figure]. Cons being it could clash with citation identifiers. I would argue that a separate type with similar semantics would be preferable, but citations actually do work fine. At least that's my take on it.
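A small Haskell sketch of the prefix convention lierdakil describes for pandoc-crossref: a label such as fig:one_figure is classified by the text before the first colon, and anything without a known prefix is left alone as an ordinary citation (which is exactly where a clash with citation identifiers would show up). Only fig: and eq: come from the thread; the tbl: and sec: entries, and the names kindOf and refLabels, are assumptions for illustration.

```haskell
-- Kinds of numbered things a reference might point to.
data Kind = Figure | Equation | Table | Section
  deriving (Eq, Show)

-- Classify a label by its prefix. "fig" and "eq" follow the thread;
-- "tbl" and "sec" are assumed extras. No known prefix: treat as a citation.
kindOf :: String -> Maybe Kind
kindOf label =
  case break (== ':') label of
    (prefix, ':' : _) -> lookup prefix prefixes
    _                 -> Nothing
  where
    prefixes = [("fig", Figure), ("eq", Equation), ("tbl", Table), ("sec", Section)]

-- Split the inside of a citation-style group such as
-- "@fig:one_figure; @fig:other_figure" into bare labels.
refLabels :: String -> [String]
refLabels = map (dropWhile (== '@')) . words . map (\c -> if c == ';' then ' ' else c)
```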
I currently deal with tables and figures by post-processing the output odt (using sed, Python and Perl, all glued together by zsh). The syntax I use in Markdown, which I pinched from this thread I think, is {#identifier} either after the table caption or before the image alt (it is what fit the syntax best, although I'm not a fan of the image alt being the caption, tbh). When I want to reference the table or figure I use [#identifier]. I tend to prefix my table identifiers with "T" and figures with "F", which has the advantage of making them stand out if my post-processing goes wonky. It isn't really necessary.
I am strongly opposed to pandoc doing itself what the output format can already do. If the output format has a way of dealing with numbering, it should be used so that the document can be worked on later by non-pandoc users without torturing them. For that reason, I think most of the work naturally goes to the writer. For output formats that don't have numbering (plain-text-ish formats, for example), the writer can just grab a number from an iterator (if they are called that in Haskell) and possibly transform it according to convention/user requirements.
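A minimal sketch of the "grab a number from an iterator" fallback for formats without native numbering: one counter per element kind, advanced in document order. The kind strings and numberPerKind are hypothetical, and Data.Map from the containers package (shipped with GHC) stands in for whatever state a real writer would keep.

```haskell
import qualified Data.Map.Strict as Map

-- Number labelled elements per kind, in document order.
-- Input: (kind, identifier) pairs in the order they occur.
-- Output: identifier -> number within its kind.
numberPerKind :: [(String, String)] -> Map.Map String Int
numberPerKind = snd . foldl step (Map.empty, Map.empty)
  where
    step (counters, numbers) (kind, ident) =
      let n = Map.findWithDefault 0 kind counters + 1
      in (Map.insert kind n counters, Map.insert ident n numbers)
```

With identifiers prefixed "T" and "F" as in the previous comment, [("figure","F1"), ("table","T1"), ("figure","F2")] would map F1 to 1, T1 to 1 and F2 to 2. How the number is then rendered (arabic, roman, per chapter) is the "transform it according to convention" step, left out here.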
One way around this would be to treat the first Para or BlockQuote as the caption. If you wanted multiple blocks, you could put it in a BlockQuote (and the surrounding blockquote would not be part of the caption).
I think that's an interesting idea. But it raises the question: isn't that similar to the hack we already employ for image figures?
Reminder: there are filter-based solutions that can be used while this issue gets worked out. The following implement numbering and references using the syntax advocated by @scaramouche1:
They are Python-based and easy to use. Alternatives are provided above by @aaren and @lierdakil.
Note: pandoc-fignos has been updated to work with the new figure attributes syntax that will appear in pandoc 1.16.
Now that pandoc 1.16 is out, is this a bug, or does it point to my misunderstanding of the new link_attributes extension?
Converting ![My caption](myfigure.png){#fig:myfigure} from Markdown to LaTeX, I would have expected
\begin{figure}[htbp]
\centering
\includegraphics{myfigure.png}
\caption{My caption}
\label{fig:myfigure}
\end{figure}
but instead the figure id/label is ignored.
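For reference, the expected output above can be produced by a tiny helper like the following. figureToLatex is a hypothetical illustration of the target shape, not pandoc's LaTeX writer, and it does no escaping of LaTeX special characters in the caption or file name.

```haskell
-- Render an image with a caption and an optional identifier as the
-- LaTeX figure environment shown above. The \label line is emitted
-- only when an identifier is present.
figureToLatex :: FilePath -> String -> Maybe String -> String
figureToLatex src caption mident = unlines $
  [ "\\begin{figure}[htbp]"
  , "\\centering"
  , "\\includegraphics{" ++ src ++ "}"
  , "\\caption{" ++ caption ++ "}"
  ] ++
  [ "\\label{" ++ ident ++ "}" | Just ident <- [mident] ] ++
  [ "\\end{figure}" ]
```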
@beinvest good point, must have overlooked that back in the day when I did the image sizes ;) fixed in https://github.com/jgm/pandoc/pull/2637
@mb21 Thanks for the help and your work!!
So, I mused on this for a bit, and here are some questions and ideas, in no particular order:
[#id] seems nice, and the semantics should be basically the same as Citation. It should probably be possible to reuse the code that parses citations for references, with little effort. [...] {#someid ref-type=figure}). Classes are a tougher sell IMO, although the syntax is a little cleaner.
Just a contribution for a workaround while this issue is open: http://tex.stackexchange.com/questions/139106/referencing-tables-in-pandoc
@lierdakil pandoc-crossref works great! Thanks for this work!
I'm adding the pandoc-2.0 milestone so we at least think about whether to add some of these features to standard pandoc. (I'm using pandoc-crossref now and it works very well indeed.)
It would be nice, though, if either pandoc or pandoc-crossref supported the auto-identifiers.
@ibutra, not sure what you mean by 'auto-identifiers' exactly.
In the manual, the second entry, auto_identifiers, is what I mean: basically the identifier pandoc assigns by default, if none is given manually, for use in references.
@lierdakil I think @ibutra is referring to how Pandoc can auto generate section reference tags from the heading text (crossref already supports it for headings, with caveats).
I can see the appeal, for example I end up following the pattern:
![Plot text](../fig/plot_filename){#fig:plot_filename}
It could be an idea to generate a tag fig:plot_filename if one isn't explicitly given. It might be a bit unnecessary, though (I just added an editor snippet to generate the pattern), but on the other hand, why not?
In headers, the identifiers are generated from the header text. The analogue in a figure or table would be to generate them from the caption text -- but this is likely to be too long and cumbersome. Still, it might not hurt to generate them; one always has the ability to specify an identifier manually.
I specifically meant the headers though the same feature for figures and tables would be nice too.
@mangecoeur, what I didn't know is that pandoc-crossref already supports this. Does it?
@ibutra, from https://github.com/lierdakil/pandoc-crossref#section-labels
You can also use the autoSectionLabels variable to automatically prepend all section labels (including those automatically generated by pandoc) with "sec:". Bear in mind that references can't contain periods, commas, etc., so some auto-generated labels will still be unusable.
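A sketch of the behaviour that quote describes, under the assumption that a label is unusable in a reference if it contains a period, comma, or semicolon. The quote only names periods and commas; the semicolon and both function names are illustrative assumptions, not pandoc-crossref's code.

```haskell
-- Prefix a (possibly auto-generated) section identifier with "sec:".
prefixSection :: String -> String
prefixSection ident = "sec:" ++ ident

-- Assumed rule: a label containing any of these characters cannot be
-- used in a reference.
usableInReference :: String -> Bool
usableInReference = not . any (`elem` ".,;")
```

So a heading like "Version 1.2" would get an auto identifier along the lines of version-1.2, and the prefixed label sec:version-1.2 would be flagged as unusable, while sec:introduction would be fine.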
Generating labels for figures/tables/other has another drawback. Right now, the default behavior in pandoc-crossref is to ignore unlabelled elements (since this is least intrusive), so ![Caption](image) will be an unnumbered (or rather, unprocessed) figure.
This kind of behavior is useful for informal writing, when you don't need to number the figures you're not referencing. It is also useful for running pandoc-crossref on documents that don't need cross-referencing at all, e.g. from an automated script.
@jgm, for figures, a better (more concise) source of auto identifiers is probably not a title, but a filename (or rather, basename). Tables and listings are another matter, and I don't think it's feasible for math.
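A minimal sketch of the basename idea, using takeBaseName from the filepath package that ships with GHC. The fig: prefix follows the pandoc-crossref convention used in this thread; autoFigureId is a hypothetical name, and an explicit identifier always wins.

```haskell
import System.FilePath (takeBaseName)

-- Use the explicit identifier if there is one; otherwise derive
-- "fig:<basename>" from the image path (directory and extension dropped).
autoFigureId :: Maybe String -> FilePath -> String
autoFigureId (Just explicit) _ = explicit
autoFigureId Nothing path      = "fig:" ++ takeBaseName path
```

With the earlier example, ![Plot text](../fig/plot_filename) would get fig:plot_filename without the manual {#fig:plot_filename}. Two images with the same basename in different directories would collide, which is one argument for keeping this opt-in.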
For RST the syntax should be much easier. Just use the already-available name field:
.. figure:: image.png
   :name: example
   :alt: an image

   This is the caption
see Example (14) ... see Figure 5
FWIW, the Markdown should not include the caption type text (e.g., "Equation", "Table", "Figure") as that is presentation logic. That is, without changing the source, it should be possible to replace "Figure" with "Illustration" throughout the output document.
Here are a few others, which suggest that the solution should be caption-type agnostic. The complete set of possible caption types is fairly long, and we probably shouldn't restrict the syntax to a particular subset, as some could get missed, such as:
see Listing (14)
see Algorithm 5
Thus, in the text "As seen in Figure @fig:force", the word "Figure" is redundant (the @fig prefix already signifies that the caption is a figure). With that particular syntax, writing "As seen in @fig:force" allows the rendering component (e.g., LaTeX, ConTeXt, etc.) to determine what caption-type text to inject, if any.
The above is also helpful when referencing multiple items, for example
As shown in @fig:a;@fig:b
=> As shown in Figures 1 and 2
and ranges
As shown in @fig:a;@fig:b;@fig:c
=> As shown in Figures 1–3
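A sketch of how a renderer might produce those strings. Following the point above that the caption-type word is presentation logic, the singular/plural nouns are parameters rather than hard-coded. Collapsing only runs of three or more consecutive numbers into a range (so 1 and 2 stay separate, while 1–3 becomes a range) is an assumption drawn from the two examples; renderRefs is a hypothetical name.

```haskell
import Data.List (intercalate, nub, sort)

-- Render a group of referenced numbers with a (singular, plural) noun pair.
renderRefs :: (String, String) -> [Int] -> String
renderRefs (singular, plural) ns =
  case nub (sort ns) of
    []  -> ""
    [n] -> singular ++ " " ++ show n
    xs  -> plural ++ " " ++ joinAnd (concatMap piece (runs xs))
  where
    -- Split a sorted list into maximal runs of consecutive numbers.
    runs :: [Int] -> [[Int]]
    runs []         = []
    runs (x : rest) = go [x] rest
      where
        go acc (y : ys) | y == last acc + 1 = go (acc ++ [y]) ys
        go acc ys                           = acc : runs ys
    -- A run of three or more becomes "a-b" with an en dash (\8211);
    -- shorter runs stay as individual numbers.
    piece :: [Int] -> [String]
    piece run@(first : _) | length run >= 3 = [show first ++ "\8211" ++ show (last run)]
    piece run = map show run
    -- Join the pieces with commas and a final "and".
    joinAnd :: [String] -> String
    joinAnd []    = ""
    joinAnd [p]   = p
    joinAnd parts = intercalate ", " (init parts) ++ " and " ++ last parts
```

So renderRefs ("Figure", "Figures") [1, 2] gives "Figures 1 and 2", [1, 2, 3] gives "Figures 1–3", and swapping in ("Illustration", "Illustrations") changes the wording without touching the source document.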
Hopefully, if this is built into core pandoc, the docx writer could gain the ability to output 'real' reference fields (using the Office XML reference tags). This would allow you to post-process fields in Word, for example to generate tables of figures and tables of tables (Word can generate these when caption fields are used).
Here it goes again... How does one cross-reference figures in Pandoc?
Thanks!
For the moment you should use filters, either pandoc-crossref (installs via Homebrew if you use a Mac: brew install pandoc-crossref) or pandoc-fignos (you need a working Python install). Personally, I do all my writing in Scrivener, which has its own crossref system that outputs to Pandoc, so I don't use these myself.
It would be great if pandoc by default would support adding image/figure IDs and cross references when converting markdown to docbook. This would ensure the software needed is available in Debian.
I am currently typesetting a set of books using a Markdown->Docbook pipeline, and need a way to reference figures in the text.
@jgm is there any progress on this inside the main tree, or would you suggest using the filter pandoc-crossref?
for now, use pandoc-crossref
I wish there were an option to interpret links as numberings instead of hyperlinks, for, but not limited to, non-electronic media. Something like ![Figure fig#. Caption](/path/file.png) and [fig.](Figure fig#. Caption), stripping all automated stuff except the numbering.
I am pleased to announce the 2.0.0 release of the pandoc-xnos filter suite:
The filters emerged from recommendations made by the community in this thread, and in particular this post by @scaramouche1.
For the label/ref problem, labelling itself is pretty simple: they're just opaque identifiers, though some document systems (like some LaTeX packages and the existing reference-providing filters) have prefixed labels like thm:thing. (Incidentally, my preference is for future Markdown syntax not to require any internal structure on labels, beyond, say, what citations already require.)
Numbering things and rendering references to them, on the other hand, strongly resembles the process of generating citations and bibliographies, and the ways that can be done vary almost as widely. Typing of numbered things, choosing how to insert numbers in titles, reference prefixes, configuring numbering with counters, modifying counters in the text, and automatically generating identifiers can all be supported and configured.
So it will be hard to choose exactly how pandoc will number things and render references, and what configuration will be allowed. It could be as complex as LaTeX, but I'm not sure if that complexity is welcome in pandoc itself (maybe it is?). The Markdown syntax for refs will also have to be chosen, though I imagine it will operate somewhat like the citation syntax does currently, judging from the discussion in the thread above.
Ideally, the intermediate representation would be modified so that, in principle, a filter could perform numbering and reference rendering like pandoc-citeproc does for citations, potentially in more complex ways than pandoc itself would. This can be done without settling the other issues. In the simplest design, labels (as identifiers) and numbers (if at all) can be stored in the Attr that we have now, requiring no IR change there. References should get their own element, and based on the current Citation type, the following could work:
-- Support for labelling more things can be added by adding Attr to more types.
data Inline
  = ...
  | Ref [Reference] [Inline]
  ...

-- Might want to record whether or not it's a page reference for
-- paginated formats like TeX.
data Reference = Reference
  { referenceId :: Text
  , referencePrefix :: [Inline]
  , referenceSuffix :: [Inline]
  , referenceMode :: ReferenceMode
  , referenceHash :: Int
  }

-- The main modifier of a reference at the reference site itself
-- is how to render a prefix, if at all.
data ReferenceMode
  = UpperCasePrefix
  | LowerCasePrefix -- may not be needed?
  | SuppressPrefix
  | NormalReference
The intent is to support using Ref like Cite is used right now in the readers, to store a sequence of references from a compound reference and the text of what was parsed.
Slightly off-topic, but I have no idea what the citationNoteNum in Citation does. I'm not sure if it's used at all in the core pandoc packages. What is it for?
If numbers (meaning the full rendered number, like "2.4.1") were stored in the Attr of the numbered thing (to expose them to other filters), it would be wise to agree on a particular key for them. Having it be number is the easiest, I guess.
We already use number in sections (after makeSections), so yes, I agree on that.
With Ref, I guess your idea is that the Ref elements will be postprocessed by a filter or built-in transformation, as Citations are now. The [Inline] part will be replaced by the rendered reference. That makes a lot of sense to me.
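A minimal Haskell sketch of that post-processing step, under several assumptions: numbers are stored under the number key of an element's Attr (as agreed above), the word to print ("Figure", "Table", ...) lives under a hypothetical prefix key, plain Strings stand in for [Inline], and NormalReference is treated like UpperCasePrefix. numberTable and renderRef are illustrative names, not pandoc functions.

```haskell
import Data.Char (toLower)
import Data.Maybe (fromMaybe)
import qualified Data.Map.Strict as Map

-- Attr as in pandoc: (identifier, classes, key-value pairs).
type Attr = (String, [String], [(String, String)])

-- Mirrors the ReferenceMode proposed above.
data ReferenceMode
  = UpperCasePrefix
  | LowerCasePrefix
  | SuppressPrefix
  | NormalReference
  deriving (Eq, Show)

-- identifier -> (prefix word, rendered number), built from labelled
-- elements that carry a "number" key; the "prefix" key is hypothetical.
numberTable :: [Attr] -> Map.Map String (String, String)
numberTable attrs = Map.fromList
  [ (ident, (fromMaybe "" (lookup "prefix" kvs), num))
  | (ident, _, kvs) <- attrs
  , not (null ident)
  , Just num <- [lookup "number" kvs]
  ]

-- Render one reference. Nothing means the label is unknown; a writer
-- could then fall back to the Ref's parsed [Inline] content, as with Cite.
renderRef :: Map.Map String (String, String) -> ReferenceMode -> String -> Maybe String
renderRef table mode ident = fmap render (Map.lookup ident table)
  where
    render (word, num) = case mode of
      SuppressPrefix  -> num
      LowerCasePrefix -> withWord (map toLower word) num
      _               -> withWord word num
    withWord w n = if null w then n else w ++ " " ++ n
```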
citationNoteNum -- I don't think it is used. In pandoc-citeproc the citeNoteNumber is taken from it, but since (as far as I can see) it's always 0 this never makes any difference. This type originates from citeproc-hs and probably needs some adjusting, especially as I go forward with the new citeproc processor. I can see why a field such as this would be needed. Some styles include back references like "Op. cit., n. 13" where you have to know the note number in which a particular citation occurs. In my current citeproc implementation, we get these numbers by assuming one note per citation -- but of course that breaks if you have a document containing both citations and footnotes, and you're using a footnote citation style. In that case we'd need some way for pandoc to tell citeproc, "This citation would be the Nth rendered note." I see no reason why we can't simply use the existing field for this -- it's probably what it was intended for.
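The note-number problem can be sketched as follows, assuming a note-based citation style in which each citation in running text becomes its own note and citations inside an existing footnote share that note. The Item type and noteNumbers are hypothetical simplifications, not citeproc's or pandoc's API.

```haskell
-- Simplified document items in reading order: a citation in running
-- text, or a footnote holding the citation keys it contains.
data Item
  = Cite String
  | Footnote [String]
  deriving Show

-- Assign every citation the number of the note it will be rendered in.
-- Footnotes without citations still consume a number, which is exactly
-- what breaks the "one note per citation" shortcut.
noteNumbers :: [Item] -> [(String, Int)]
noteNumbers = go 1
  where
    go :: Int -> [Item] -> [(String, Int)]
    go _ []                     = []
    go n (Cite key : rest)      = (key, n) : go (n + 1) rest
    go n (Footnote keys : rest) = [(k, n) | k <- keys] ++ go (n + 1) rest
```

For [Cite "a", Footnote [], Cite "b", Footnote ["c", "d"]] this yields a in note 1, b in note 3, and c and d both in note 4, whereas counting citations alone would have put b in note 2.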
Yes, internal references are enough like citations that I thought the same sort of representation and handling would be good, since Cite seems to work well in practice.
I think for the writers that didn't support citations (all of them initially), the fallback would be exactly what the fallback for Cite is now: just attempt to render the [Inline] content if possible.
If citationNoteNum is intended for that purpose, then there probably won't be any need for the analogous referenceNoteNum. I'm not sure I've seen an ibid. used with a reference before.
For some internal manuscripts I've put together a Lua filter that handles most cross references.
It currently assigns IDs to tables and equations based on attribute blocks at the end of the caption (e.g. : Caption for this table {#tab:example}) and surrounding spans for equations ([$$a^2+b^2=c^2$$]{#eq:pythagoras}).
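The caption-attribute convention can be sketched in Haskell as well (this is not the poster's Lua filter, just the same idea): split off a trailing {#id} and keep the rest as the caption. Only a bare {#id} is recognised; classes, key=value pairs and the equation-span form are left out, and splitCaptionId is a hypothetical name.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isSuffixOf)

-- "Caption for this table {#tab:example}" -> ("Caption for this table", Just "tab:example")
-- Anything that doesn't end in a bare {#id} is returned unchanged (trimmed).
splitCaptionId :: String -> (String, Maybe String)
splitCaptionId caption
  | "}" `isSuffixOf` trimmed
  , (revInside, '{' : revBefore) <- break (== '{') (reverse trimmed)
  , '#' : ident <- init (reverse revInside)   -- drop the closing brace
  , not (null ident)
  , not (any isSpace ident)
  = (dropWhileEnd isSpace (reverse revBefore), Just ident)
  | otherwise = (trimmed, Nothing)
  where
    trimmed = dropWhileEnd isSpace caption
```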
In the next step, citations starting with a prefix (fig:, tab:, etc.) are replaced with a link to the element or with natively counted references (LaTeX + docx).
It's not meant as serious competition to the excellent pandoc-xnos, but rather as a testing ground for new features (e.g. table attributes) and a pandoc-xnos-compatible implementation for the most basic needs.
I tried to summarize the current out-of-the-box pandoc LaTeX → docx experience in a question at StackOverflow. With a test document and the pandoc simple.tex --to docx --output simple.docx --table-of-contents --toc-depth 5 --number-sections --citeproc --verbose --csl ieee.csl command, I obtained the following docx rendering:
I see many strange things:
[fig:image] instead of Figure with number;
[tab:table] instead of Table with number;
[eq:eq] instead of equation number;
[exm:code] instead of example number.
I hope you will provide an official out-of-the-box pandoc solution for this, without third-party filters and so on. Do we currently have a solution, which I probably missed?
@N0rbert one thing you're missing is the native_numbering extension. (The reason this isn't enabled by default is that it interferes with the popular filter pandoc-crossref.) If you do -t docx+native_numbering, then the situation improves a little bit: you get
Figure 1: [fig:image] Image
Table 1: [tab:table]
There's some low-hanging fruit here:
[...] [tab:table] when native_numbering is specified. And maybe we should get rid of it in general for tables and figures, since we're getting a number for the \ref{} even without native_numbering.
[...] example environment; after all, we're creating a number for it.
native_numbering could be enabled by default; we need some other way to work around the problem described in #7499. (If pandoc can recognize when an external filter has already added the number, it can avoid doing so in that case. Perhaps a crude way to achieve this would be if pandoc-crossref added something to metadata that we can check. @lierdakil @jjallaire any thoughts?)
Of course, that still leaves us without good references to numbered equations (indeed, without numbered equations).
Yes, we could establish a protocol where filters set a specific metadata value to indicate that they have already handled numbering. Maybe for consistency w/ native_numbering we could set filter_numbering or filter-numbering (or filter_numbered, filter-numbered, etc.)
Thank you for the quick reply. With -t docx+native_numbering the document looks better.
I'll keep an eye on the next releases to check the changes introduced by the last two commits mentioned.
Thanks!
I'm using -t docx+native_numbering, but the rendered docx file still does not contain any reference when using \autoref{whatever-(equations, figures, sections, etc)}
With the latest pandoc 3.1.2-1 on the upcoming Debian 12, only equations are not numbered: the resulting document has [eq:eq]. See the image below:
Thanks!
It's currently possible to include internal links to sections. I'd like to propose a similar feature for links to figures/images and tables.
It may make sense to provide this feature only if the figure/image or table that is being linked to has a caption. In that case Pandoc can today automatically generate a number for the figure or table and include it in the caption, e.g. "Figure 15".
At the most basic, the text of the link would be provided by the user, as is currently the case for links to sections.
Of course it would be very convenient if the automatically generated number for the figure or table would also be used for the text of the link, e.g. "as can be seen in Figure 15, blah", where "Figure 15" would be the internal link whose text is auto-generated from the figure it points to.