jgm / pandoc

Universal markup converter
https://pandoc.org

New Feature: internal links to tables and figures and headers #813

Open GeraldLoeffler opened 11 years ago

GeraldLoeffler commented 11 years ago

It's currently possible to include internal links to sections. I'd like to propose a similar feature for links to figures/images and tables.

It may make sense to provide this feature only if the figure/image or table that is being linked to has a caption. In that case Pandoc can today automatically generate a number for the figure or table and include it in the caption, e.g. "Figure 15".

At the most basic, the text of the link would be provided by the user, as is currently the case for links to sections.

Of course it would be very convenient if the automatically generated number for the figure or table would also be used for the text of the link, e.g. "as can be seen in Figure 15, blah", where "Figure 15" would be the internal link whose text is auto-generated from the figure it points to.

aaren commented 9 years ago

@mb21 that's great! do you have a PR we can try out?

I'm using \autoref in my filter but am tempted to use \ref instead. I've also got a generic output format that will just put in Figure 1 or whatever.

I started out using [](#ref) as the in-text link, for the reasons that you give. Then I used [#ref], because it's less typing and internal references are a bit like implicit reference links anyway. Now I'm using #ref, because it's even less typing and the parallel with citations. Scholdoc uses either of the last two.

In the latter two cases, links are only made if there is something defined with the link on it. You can still write e.g. #1, as long as there isn't something with 1 as a label.

Me and @timtylin had a discussion about [#ref] vs. #ref and \autoref vs \ref on timtylin/scholdoc/issues/3.

Finally, note that the characters in latex labels are limited. I'm not sure if there is an equivalent restriction in html.

mb21 commented 9 years ago

@aaren Yes, this is the pull request. As you can see from the discussion there, it's not quite finalized yet.

I don't have a strong opinion on [#ref] vs #ref vs [](#ref). However, the last one would enable the filter to work on every input format that can have empty links, although I guess you can write #ref in plain text everywhere as well.

Haven't had time yet to look closely at your filter, but sounds great!

About the limited characters, there was already an existing toLabel function that is used by header ids.

jgm commented 9 years ago

+++ mb21 [Dec 14 14 03:34 ]:

I like aaren's idea of overloading the empty link syntax for that ([](#link)), since I cannot imagine ever actually wanting an empty link.

Note that gitit overloads empty links as wikilinks.

bpj commented 9 years ago

I have recently written a filter which overloads empty link texts in yet another way: it alters the AST so that the plain writer produces what looks like perldoc POD markup. The convention is that you can put strings in POD link syntax in the link title to get a perldoc link, and an empty link text is automatically expanded to "the Foo::Bar module" or something like that, depending on whether the title is prefixed with "pod:" (for internal links), "cpan:", "perldoc:", or "man:" (for manpage links). (A companion filter will expand a zero as the URL text into a link to the appropriate web site.) I can well imagine still other ways to overload empty link texts, so I very much think such overloading should be left to filters.

scaramouche1 commented 9 years ago

The lack of internal cross-references is the major stumbling block for Pandoc's "world domination" in academia. This thread has made great progress at discussing the problem and providing solutions for cross-referencing figures.

But the problem of cross references is more general, and thus it makes sense to try to solve it in a more general way. For instance, mathematical documents cross-reference theorems and equations, social science cross reference hypotheses, philosophy cross references examples, and most disciplines cross reference section numbers. For Pandoc to be really helpful to all these disciplines, a general solution must be devised.

I propose a slight modification to the # and @ notations to allow for all types of cross-references. Here's the informal specification:

Anchors (#) and references (@) can optionally include a type descriptor. For instance, #fig:cat, #eq:force, #sec:intro, etc. The descriptor is what's in between '#' and ':'. Example uses:

Referencing a figure

![This is a cat](cat.png) {#fig:cat}

As seen in Figure @fig:cat.

Referencing an equation

$$F = ma$$ {#eq:force}

As seen in Equation @eq:force.

Referencing a hypothesis

Hypothesis {#hyp:temp}: Temperature increases in summer.

As mentioned in Hypothesis @hyp:temp.

This notation also serves to cite theorems, proofs, etc. (@thm:, @proof:).

Referencing a section number

# Introduction {#sec:intro}

As mentioned in Section @sec:intro.

This is in addition to the implicit referencing of headers already in place.

How to deal with possible clash with bibliographic references

The proposed notation has a small clash with the notation used to cite bibliographic references. I propose that references that include a ":" are first searched inside the document, and only if there's no internal match are they searched in the bibliographic system. Alternatively, ':' could be forbidden for bibliographic references.
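
The lookup order proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `resolve_key`, the `anchors` dict, and the `bib_keys` set are hypothetical names and not part of pandoc.

```python
def resolve_key(key, anchors, bib_keys):
    """Classify an @key as a cross-reference, a citation, or unresolved.

    Hypothetical helper: keys containing ':' are looked up in the
    document's own anchors first, and only then in the bibliography.
    """
    if ":" in key and key in anchors:
        return ("crossref", anchors[key])
    if key in bib_keys:
        return ("citation", key)
    return ("unresolved", key)


# Assumed sample data: anchor label -> rendered number, plus some cite keys.
anchors = {"fig:cat": 1, "eq:force": 1}
bib_keys = {"smith2014", "fig:cat", "doe:2010"}

print(resolve_key("fig:cat", anchors, bib_keys))    # internal match wins
print(resolve_key("doe:2010", anchors, bib_keys))   # falls back to the bibliography
print(resolve_key("smith2014", anchors, bib_keys))  # no colon, so a plain citation
```

Note that a cite key such as "fig:cat" becomes unreachable as a citation once an anchor with the same label exists, which is the clash described above.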

jgm commented 9 years ago

I like the idea of separate numbering sequences for different kinds of things. And using a prefix with a colon for that is fairly sensible.

The use of @ is problematic, as @ is already used for citations and for numbered examples in pandoc. (Though perhaps this mechanism could replace the current mechanism for numbered examples: {#ex:foo}, @ex:foo?)

One thing that might be missing is control over the numbering schemes used. For example, you might want figures to number with a prefix by chapter or section: e.g., in chapter 2, figures are numbered 2.1, 2.2, etc. It would be nice to be able to control that somehow.

Another question is how this would be implemented. Would the pandoc AST contain label and reference nodes? Or would the Markdown reader simply convert these to hyperlinked numbers (simpler)?
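
As a rough illustration of the chapter-prefixed numbering mentioned above (figures numbered 2.1, 2.2 in chapter 2), here is a minimal Python sketch. The function name and the `(chapter, kind)` input format are assumptions made for the example, not anything pandoc provides.

```python
from collections import defaultdict


def number_items(items):
    """Assign labels such as '2.1' to (chapter, kind) items, in document order.

    Each kind ('fig', 'tbl', ...) gets its own sequence, restarted per chapter.
    """
    counters = defaultdict(int)
    labels = []
    for chapter, kind in items:
        counters[(chapter, kind)] += 1
        labels.append(f"{chapter}.{counters[(chapter, kind)]}")
    return labels


# One figure in chapter 1; two figures and one table in chapter 2.
items = [(1, "fig"), (2, "fig"), (2, "tbl"), (2, "fig")]
print(number_items(items))
```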

aaren commented 9 years ago

@scaramouche1 why not use # instead of @? i.e.

![This is a cat](cat.png) {#fig:cat}

As seen in Figure #fig:cat.

This way you avoid the conflict with citations. I find it easier to use as well because of the cognitive separation between cross references and citations.

Thinking even further forward (much much further!), you could end up doing something like

As seen in Figure `@somepaper#fig:cat`

to reference a figure in another document.

aaren commented 9 years ago

@jgm I could imagine more advanced cross referencing being implemented by an external filter (as for citations with pandoc-citeproc). Configuration of the numbering etc. would then be done through metadata.

I think the AST implementation depends on how we conceive of an internal reference - is a hyperlink sufficient to describe it? I'm not sure on this.

scaramouche1 commented 9 years ago

@jgm, @aaren: I don't know enough about the internals of Pandoc to be of much help with implementation ideas. Here, I just include how this could be translated into LaTeX.

When exported to LaTeX, 'fig:', 'eq:', and 'sec:' references can be translated in the standard way (\label for the anchor and \ref for the reference), and all other reference types can use plain counters (1, 2, 3, ...). In LaTeX these per-type, plain counters can be implemented as follows (here's an example for 'hyp:'):

%in preamble: create counter and anchor for 'hyp'
\newcounter{ctrhyp}
\newcommand{\anchorhyp}[1]{\refstepcounter{ctrhyp}\arabic{ctrhyp}\label{#1}}

\begin{document}
Hypothesis \anchorhyp{hyp:temp}: Temperature increases in summer.

As seen in hypothesis \ref{hyp:temp}.
\end{document}

Making this work with other output formats probably requires programming a "counter" module in Pandoc that can deal with a few counting schemes. As a starting point, rather than making the schemes user-configurable, it may be better to copy the LaTeX defaults (i.e., sections use nested counters, and everything else uses plain counters). In the future, alternative counting schemes could be specified in YAML.
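
A "counter" module along these lines might look like the following Python sketch. The class and method names are hypothetical; the only behaviour taken from the text is that sections use nested counters and every other type uses a plain counter.

```python
class Counters:
    """Hypothetical counter module: nested counters for sections, plain ones otherwise."""

    def __init__(self):
        self.section = []   # e.g. [2, 1] means section 2.1
        self.plain = {}     # one running count per anchor type

    def next_section(self, level):
        """Step the section counter at the given depth (1 = top level)."""
        # Drop deeper levels, pad missing ones with 0, then step this level.
        self.section = self.section[:level]
        self.section += [0] * (level - len(self.section))
        self.section[level - 1] += 1
        return ".".join(str(n) for n in self.section)

    def next_plain(self, kind):
        """Step the plain counter for an anchor type such as 'hyp' or 'ex'."""
        self.plain[kind] = self.plain.get(kind, 0) + 1
        return str(self.plain[kind])


c = Counters()
print(c.next_section(1))    # first top-level section
print(c.next_section(2))    # its first subsection
print(c.next_plain("hyp"))  # first hypothesis
print(c.next_section(1))    # a new top-level section resets the subsection count
print(c.next_plain("hyp"))  # plain counters are not reset by sections
```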

scaramouche1 commented 9 years ago

@jgm: For what it's worth, I believe it is OK to replace the current example mechanism (@) with the proposed, more general cross-referencing functionality. The few documents that use (@) would need to be updated to use '@ex:', but a huge number of use cases could be allowed without changing Pandoc's syntax substantially. This functionality would make Pandoc the best way to work on academic papers in any discipline.

@aaren: I believe it is better to use '@' than '#' for references: a cross-reference is a citation to a part of a document. Thus, cross-references and citations are conceptually similar---they are pointers to an object---and could use the same '@' symbol. Using '#' would also be OK, but at the expense of a slight increase in cognitive load (because '#' is currently used to denote anchors, not references).

sjackman commented 9 years ago

:+1:

mangecoeur commented 9 years ago

It seems to me that including the references in the AST would be more powerful and future-proof, especially wrt PDF/LaTeX conversion: you would be able to tweak the output to work with the styling system. I think control over the numbering scheme would have to be through the template, though I'm not sure how you could define rules in a flexible but still reasonably simple way.


mb21 commented 9 years ago

Another question is how this would be implemented. Would the pandoc AST contain label and reference nodes? Or would the Markdown reader simply convert these to hyperlinked numbers (simpler)?

A third option would be to have a filter (similar to the citeproc-filter) make the conversion to hyperlinked numbers. The filter would be enabled by default for the markdown reader. That way, the pandoc-types wouldn't need to be changed but the feature wouldn't be restricted to markdown input. @jgm Are there any downsides to this way? Or do you not want the three-stage process (reader -> filters -> writer) to become the norm when not using custom filters?

references in the AST would be more powerful and future proof, especially wrt pdf/latex conversion - you would be able to tweak the output to work with the styling system.

@mangecoeur, could you elaborate on that? I don't see what you're getting at.


how we conceive of an internal reference - is a hyperlink sufficient to describe it?

Can anyone come up with an example use case where a hyperlink isn't enough? I for one, cannot.


For the case where the target is inline text (and not a block like a figure, table etc.), I would prefer the syntax to be the often-discussed span-syntax:

[Hypothesis]{#hyp:temp}: Temperature increases in summer.

instead of:

Hypothesis {#hyp:temp}: Temperature increases in summer.

While a tad more cumbersome to write, it makes conceptually much more sense to me, since the attribute is then on a span element instead of floating around in nowhere. This would generate the HTML:

<span id="hyp:temp">Hypothesis 1</span>: Temperature increases in summer.

I agree that it makes sense to generalize the example list numbering, so you'd have fig:, eq:, hyp: etc. and also ex:. You would reference an example like: As @ex:good illustrates, ... The trickier part is how to write the example list itself. We could of course stick with the current syntax, but an arguably more consistent syntax would need attributes on list items, something like:

- This is a good example. {#ex:good}
- This is a bad example. {#ex:bad}

As @ex:good illustrates, ...

However attributes on list items are really hard because how do you know whether the attributes are on the last list item, the entire list, or even on the last paragraph of the last list item? A possible solution is to simply look out for span tags (that have an id) in example lists:

(@) This is a good [example]{#ex:good}. Do it like that.
(@) This is a bad [example]{#ex:bad}. Do not do it like that.

As @ex:good illustrates, ...

Or similarly:

- This is a good [example]{#ex:good}. Do it like that.
- This is a bad [example]{#ex:bad}. Do not do it like that.
{.example-list}

As @ex:good illustrates, ...

jgm commented 9 years ago

+++ mb21 [Jan 17 15 04:54 ]:

Another question is how this would be implemented. Would the pandoc AST contain label and reference nodes? Or would the Markdown reader simply convert these to hyperlinked numbers (simpler)?

A third option would be to have a filter (similar to the citeproc-filter) make the conversion to hyperlinked numbers. The filter would be enabled by default for the markdown reader. That way, the pandoc-types wouldn't need to be changed but the feature wouldn't be restricted to markdown input. @jgm Are there any downsides to this way? Or do you not want the three-stage process (reader -> filters -> writer) to become the norm when not using custom filters?

There's a performance downside, I imagine. One would have to measure the time it takes to walk the tree and do this kind of transformation vs the time for parsing itself. If it is much smaller, then it may not matter so much.

For the case where the target is inline text (and not a block like a figure, table etc.), I would prefer the syntax to be the often-discussed span-syntax:

[Hypothesis]{#hyp:temp}: Temperature increases in summer.

instead of:

Hypothesis {#hyp:temp}: Temperature increases in summer.

While a tad more cumbersome to write, it makes conceptually much more sense to me, since the attribute is then on a span element instead of floating around in nowhere. This would generate the HTML:

<span id="hyp:temp">Hypothesis 1</span>: Temperature increases in summer.

I believe this is the recommended best practice for inserting targets in HTML -- putting ids on real elements instead of adding anchors.

I agree that it makes sense to generalize the example list numbering, so you'd have fig:, eq:, hyp: etc. and also ex:. You would reference an example like: As @ex:good illustrates, ... The trickier part is how to write the example list itself. We could of course stick with the current syntax, but an arguably more consistent syntax would need attributes on list items, something like:

- This is a good example. {#ex:good}
- This is a bad example. {#ex:bad}

As @ex:good illustrates, ...

This seems a bit confusing to me, since it looks like a bullet list. So, I'd probably prefer

(@ex:good) This is a good example.

That makes it clearer from the text itself that you're referring to a numbered list item.

scaramouche1 commented 9 years ago

I am glad that a consensus is starting to form around the syntax of cross-references.

In this post I try to formalize the syntax a little more by discussing boundary cases and parsing and rendering details. My aim is to help the implementors by making sure beforehand that the syntax is useful in a broad range of situations and that its semantics are unambiguous.

Anchors and references

Anchors label an object in a document. References point to an anchor.

Anchors

Anchors have the following syntax:

{#type:[descriptor]}

The descriptor part is optional. These are examples of valid anchors:

{#ex:good}, {#ex:}, {#hyp:temp}, {#eq:force}, {#eq:}, {#sec:intro}

Anchors without a descriptor (such as {#ex:} or {#eq:} above) are useful for cases in which one will not refer back to an anchor, but wants the anchor to show up in the document. For instance, an author may want to have several equations or examples numbered, even if these won't be cross-referenced. In such cases, the author would prefer not to waste time creating unique labels.

Types and descriptors can only contain alphanumeric characters plus "_" and "-" (i.e., [A-Za-z0-9_-]+).
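
The anchor grammar above translates fairly directly into a regular expression. The following Python sketch assumes the `{#type:[descriptor]}` form exactly as written; `ANCHOR_RE` and `find_anchors` are names made up for the example.

```python
import re

# {#type:descriptor} with an optional descriptor; both parts use [A-Za-z0-9_-].
ANCHOR_RE = re.compile(r"\{#([A-Za-z0-9_-]+):([A-Za-z0-9_-]*)\}")


def find_anchors(text):
    """Return (type, descriptor) pairs; the descriptor is None when omitted."""
    return [(t, d or None) for t, d in ANCHOR_RE.findall(text)]


# The last item has no colon, so it is not an anchor under this grammar.
sample = "{#ex:good}, {#ex:}, {#hyp:temp}, {#eq force}"
print(find_anchors(sample))
```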

It is suggested that authors use mnemonic types (e.g., "sec" for sections, "eq" for equations), but this is not mandatory.

References

The syntax of references is similar to the syntax of anchors, except that: (i) the curly braces are optional, (ii) references include an "@" instead of a "#", and (iii) the descriptor part is not optional. Thus, their syntax is:

@type:descriptor       or       {@type:descriptor}

Examples of valid references are:

@ex:good, @hyp:temp, @eq:force, @sec:intro
{@ex:good}, {@hyp:temp}, {@eq:force}, {@sec:intro}

Examples of invalid references are:

@ex:, {@eq:}

An invalid reference is rendered in a document in an easy to notice way (e.g., "???").

What distinguishes a document cross-reference from a bibliographic reference is that document cross-references must include a ":".
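
For illustration, the reference grammar can be checked with a small Python sketch. It assumes the same character set as for anchors; `REF_RE` and `find_references` are hypothetical names.

```python
import re

# @type:descriptor or {@type:descriptor}; the descriptor is mandatory.
REF_RE = re.compile(
    r"\{@([A-Za-z0-9_-]+):([A-Za-z0-9_-]+)\}"
    r"|@([A-Za-z0-9_-]+):([A-Za-z0-9_-]+)"
)


def find_references(text):
    """Return (type, descriptor) pairs for every valid cross-reference."""
    refs = []
    for m in REF_RE.finditer(text):
        kind = m.group(1) or m.group(3)
        desc = m.group(2) or m.group(4)
        refs.append((kind, desc))
    return refs


# "@ex:" lacks a descriptor and "@smith2014" lacks a colon, so both are skipped.
print(find_references("See @ex:good and H{@hyp:temp}a, but not @ex: or @smith2014."))
```

Keys without a colon are left alone here, on the assumption that they would be handled as bibliographic citations.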

Standard and special anchors

It is customary to render some anchors in special ways. For instance, equation numbers are put in parentheses and right justified, while section numbers appear before the section heading.

Pandoc would detect these special anchors depending on the context where the anchor appears. For instance if an anchor follows an equation, this anchor will be deemed "special" and it will be formatted accordingly.

The special anchors are the following: equations, headings, and figures. Examples of these anchors:

$$ F = ma $$ {#eq:force}
$$ x = 1 $$ {#eq:}
# Introduction {#sec:intro}
![This is a cat](cat.png) {#fig:cat}

Special anchors are rendered according to style-specific rules (initially these rules could match LaTeX defaults, but in the future they could be user-configurable; see initial ideas on how to configure this at the end of the document). The previous examples could be rendered as follows:

       F = ma           (1)
       x = 1            (2)
1 Introduction
       [IMAGE]
Figure 1: This is a cat.

Any anchor that is not special is a "standard" anchor. Examples:

Hypothesis {#hyp:temp}: temperature increases in summer.
Proof {#proof:equal}: 1 = 1
({#ex:good}) This is a good example.
({#ex:}) This is a numbered, but not referenceable example.

Standard anchors are rendered as auto-increasing counters. Each anchor type is associated to its own counter (which starts at 1). For instance, the previous examples would be rendered as follows:

Hypothesis 1: temperature increases in summer.
Proof 1: 1 = 1
(1) This is a good example.
(2) This is a numbered, but not referenceable example.
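
The per-type counters for standard anchors can be sketched as follows. This Python example is only an assumption about how an implementation might work; `number_anchors` is a hypothetical name, and anchors without a descriptor are numbered but left out of the label map.

```python
import re

ANCHOR_RE = re.compile(r"\{#([A-Za-z0-9_-]+):([A-Za-z0-9_-]*)\}")


def number_anchors(text):
    """Replace each anchor with its number; return the new text and a label map."""
    counters = {}
    labels = {}

    def step(match):
        kind, desc = match.group(1), match.group(2)
        counters[kind] = counters.get(kind, 0) + 1
        if desc:  # anchors without a descriptor are numbered but not referenceable
            labels[f"{kind}:{desc}"] = counters[kind]
        return str(counters[kind])

    return ANCHOR_RE.sub(step, text), labels


source = """Hypothesis {#hyp:temp}: temperature increases in summer.
Proof {#proof:equal}: 1 = 1
({#ex:good}) This is a good example.
({#ex:}) This is a numbered, but not referenceable example."""

rendered, labels = number_anchors(source)
print(rendered)
print(labels)
```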

Rendering references

References to defined anchors are rendered as the number of the corresponding anchor.

For instance, the following references

As seen in Equation @eq:force
As described in Section @sec:intro
As observed in Figure @fig:cat
As suggested in Hypothesis @hyp:temp
As demonstrated in proof @proof:equal
As shown in example @ex:good

Would be rendered as:

As seen in Equation 1
As described in Section 1
As observed in Figure 1
As suggested in Hypothesis 1
As demonstrated in proof 1
As shown in example 1

Note that the rendered reference does not include any extra text apart from the number of the reference. That is, "@fig:cat" renders simply as "1", not as "Figure 1" or "Figure (1)". This is because more automation conflicts with some referencing needs. For instance, from time to time authors may need to write things such as:

As shown in figures 1--5.
As suggested in H1.
As seen in Equation (1).

Using the current notation, these can be accomplished in the following ways:

As shown in figures {@fig:cat}--{@fig:dog}.
As suggested in H{@hyp:temp}.
As seen in Equation (@eq:force).
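
A minimal Python sketch of this rendering step, assuming a label map from anchor keys to numbers has already been built. The `labels` values below are made-up sample data, and undefined references are rendered as "???" as suggested earlier.

```python
import re

REF_RE = re.compile(
    r"\{@([A-Za-z0-9_-]+:[A-Za-z0-9_-]+)\}"
    r"|@([A-Za-z0-9_-]+:[A-Za-z0-9_-]+)"
)


def render_references(text, labels):
    """Replace each reference with the number of its anchor, or '???' if undefined."""
    def lookup(match):
        key = match.group(1) or match.group(2)
        return str(labels.get(key, "???"))
    return REF_RE.sub(lookup, text)


# Assumed numbering: fig:dog happens to be the fifth figure.
labels = {"fig:cat": 1, "fig:dog": 5, "hyp:temp": 1, "eq:force": 1}
print(render_references("As shown in figures {@fig:cat}--{@fig:dog}.", labels))
print(render_references("As suggested in H{@hyp:temp}.", labels))
print(render_references("As seen in Equation (@eq:force).", labels))
print(render_references("As seen in Table @tbl:missing.", labels))
```

Because only the number is substituted, the surrounding text ("figures", "H", the parentheses) stays under the author's control.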

References with long hyperlinks

Normally only the number of the reference is hyperlinked to the position of the anchor. For instance, "Equation @eq:force" is rendered as "Equation 1" where the "1" is a hyperlink to the corresponding equation.

If one wants the whole "Equation 1" to be hyperlinked, one can write "[Equation ]@eq:force" or "[Equation ]{@eq:force}".

Thus, text in square brackets immediately followed by a reference shares the same hyperlink as the reference.

Translating to LaTeX

LaTeX contains environments to render the special anchors. Thus:

Standard anchors are rendered by creating a new command in the preamble. For instance, the following implements a counter for hypotheses:

%in preamble: create counter and anchor for 'hyp'
\newcounter{ctrhyp}
\newcommand{\anchorhyp}[1]{\refstepcounter{ctrhyp}\arabic{ctrhyp}\label{#1}}

%in document body: create anchor and refer back to it.
Hypothesis \anchorhyp{hyp:temp}: Temperature increases in summer.

As seen in hypothesis \ref{hyp:temp}.

References with long hyperlinks are linked using the hyperref package. For instance "As seen in [hypothesis ]{@hyp:temp}" is rendered as

As seen in \hyperref[hyp:temp]{hypothesis \ref*{hyp:temp}}.
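
To make the translation concrete, here is a Python sketch that emits the LaTeX shown above for an arbitrary anchor type. The function names are hypothetical, and the sketch assumes the type is purely alphabetic, since it becomes part of a LaTeX command name.

```python
def latex_counter_preamble(kind):
    """Return preamble lines defining a counter and an anchor command for `kind`.

    Assumes `kind` contains only letters (it is used in a command name).
    """
    return (
        f"\\newcounter{{ctr{kind}}}\n"
        f"\\newcommand{{\\anchor{kind}}}[1]"
        f"{{\\refstepcounter{{ctr{kind}}}\\arabic{{ctr{kind}}}\\label{{#1}}}}"
    )


def latex_anchor(kind, desc):
    """Translate {#kind:desc} into a call to the generated anchor command."""
    return f"\\anchor{kind}{{{kind}:{desc}}}"


def latex_reference(kind, desc, text=None):
    """Translate @kind:desc, or [text]{@kind:desc} when link text is given."""
    key = f"{kind}:{desc}"
    if text is None:
        return f"\\ref{{{key}}}"
    # \ref* (from hyperref) avoids nesting a second link inside \hyperref.
    return f"\\hyperref[{key}]{{{text}\\ref*{{{key}}}}}"


print(latex_counter_preamble("hyp"))
print("Hypothesis " + latex_anchor("hyp", "temp") + ": Temperature increases in summer.")
print("As seen in hypothesis " + latex_reference("hyp", "temp") + ".")
print("As seen in " + latex_reference("hyp", "temp", "hypothesis ") + ".")
```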

Translating to HTML

HTML does not have predefined mechanisms to deal with numbering, thus all of the numbering is done by Pandoc in a way that mimics the default LaTeX formats.

Other ideas

Here I include a couple ideas that are related to the cross-reference system.

Non-standard position of the references

In some documents, the References section does not appear at the end. For instance, many papers include appendixes after the references. One could choose a non-standard position for the references by adding the anchor "{#references:}". For instance,

# Introduction
....

# References {#references:}

# Appendix
....

Custom numbering schemes

In the future, it may be nice to be able to customize the numbering schemes used by the different reference types. This could be done from a YAML section. Here's an example of how it could be used:

format: #sec:1.A.i
format: #eq:1.1
format: #fig:1-a
format: #hyp:A
format: #ex:1

This would number things as follows:

1 Section
  1.A Subsection
      Equation 1.1
      Hypothesis A
      Hypothesis B
      Example 1
      Example 2
      1.A.i  Subsubsection
             Equation 1.2
             Figure 1-a
      1.A.ii Subsubsection
             Figure 1-b
  1.B Subsection
      Hypothesis C
2 Section
  Figure 2-c
  Example 3
  ...
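
One way to interpret format strings such as "1.A.i" or "1-a" is sketched below in Python. This is an assumption about how such patterns could be read ("1" for arabic, "A"/"a" for letters, "i" for lowercase roman), not a specification, and it only formats a list of counter values without deciding how those values are stepped.

```python
import re


def to_alpha(n):
    """1 -> 'A', 2 -> 'B', ..., 26 -> 'Z', 27 -> 'AA'."""
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("A") + rem) + out
    return out


def to_roman(n):
    """Lowercase roman numerals: 1 -> 'i', 4 -> 'iv'."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
                (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out


STYLES = {"1": str, "A": to_alpha, "a": lambda n: to_alpha(n).lower(), "i": to_roman}


def format_number(pattern, levels):
    """Format counter values with a pattern such as '1.A.i' or '1-a'."""
    # Split the pattern into style tokens and the separators between them.
    parts = re.split(r"([.\-])", pattern)
    tokens, seps = parts[0::2], parts[1::2]
    out = ""
    for i, n in enumerate(levels[:len(tokens)]):
        if i > 0:
            out += seps[i - 1]
        out += STYLES[tokens[i]](n)
    return out


print(format_number("1.A.i", [1, 1, 2]))  # second subsubsection of 1.A
print(format_number("1-a", [1, 2]))       # second figure in section 1
print(format_number("A", [3]))            # third hypothesis
```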

elcritch commented 9 years ago

This has been a good conversation to follow. I agree with @scaramouche1 that I cannot currently use Markdown for technical articles beyond informal group writing and some papers for coursework. I have been developing my workflow more around Markdown and lightweight text editing, and this has been a major hindrance (the other being the inability to quickly "tag" custom attributes).

This seems a bit confusing to me, since it looks like a bullet list. So, I'd probably prefer (@ex:good) This is a good example. That makes it clearer from the text itself that you're referring to a numbered list item.

@jgm, were you proposing that the syntax for numbered examples and general anchors could be bridged by placing the anchor at the start of the list item? Like:

- (@ex:good) Great idea!
- (@ex:bad) Ok idea. 
- (@ex:ugly) Really bad idea. 

It seems that both features could co-exist for a while with the older syntax being deprecated at some future point.

@scaramouche1, the write-up you did helped me reason about whether this syntax would be useful and generic. Thanks! It covered almost all the questions I had, except how an anchor would be associated with generic blocks of text. Going from what @bpj mentioned, I am not sure how this syntax would be applied outside of the specific "standard" contexts of figures, equations, etc., or manually specifying the block [Some Text]{#ref:} .

For example, where would the anchors be attached in these cases both for internal representation and for the example output HTML?

<span id="custom-checklist">
- [  ] Task 1
- [x] Task 2
- [?] Task 3
</span> {#ch:proposed-project-tasks}

This proposal will specify ... and a lot of other text. {#desc:proposed-project-tasks}

The simplest rule might be just attaching the anchor to the last "span" or block (text, figure, etc) excepting the predefined standard rules for example lists, figures, and others. Currently I have only briefly looked at the Pandoc source and am not sure how the AST is structured to give any technical opinions.

scaramouche1 commented 9 years ago

@elcritch: In the syntax proposed above, the anchor is associated to a rendered number. So, something close to your example could be entered as:

This is checklist {#cklist:xyz}:
 - Task 1
 - Task 2
 - Task 3

As listed in checklist @cklist:xyz ...

Which would be rendered as:

This is checklist 1:
 - Task 1
 - Task 2
 - Task 3

As listed in checklist _1_ ...

(The "_" is denoting what's hyperlinked.)

Extension: Unnumbered anchors and references

The previously proposed syntax did not account for cases in which anchors and references are not numbered. One way to extend the syntax to allow that use-case is by defining that anchors starting with "-" are unnumbered anchors (this notation is akin to the one used for year-only bibliographic citations). Thus, one could enter:

Checklist: {-#cklist:xyz}
 - Task 1
 - Task 2
 - Task 3

As listed in the [checklist]{@cklist:xyz} ...

Which would be rendered as:

Checklist:
 - Task 1
 - Task 2
 - Task 3

As listed in the _checklist_ ...

Translating to HTML and LaTeX

I believe the simplest HTML implementation is to translate unnumbered anchors as <a name="cklist:xyz"></a> and reference to them as <a href="#cklist:xyz">checklist</a>.

A LaTeX translation could use \hypertarget{cklist:xyz}{} for the anchor and \hyperlink{cklist:xyz}{checklist} for the reference.

Unnumbered references

The unnumbered notation should also apply to unnumbered references. For instance, [checklist]{-@cklist:xyz} would produce an unnumbered reference (i.e., a hyperlink) irrespective of whether the anchor was defined as {#cklist:xyz} or as {-#cklist:xyz}.

Dealing with an erroneous reference to an unnumbered anchor

Referencing an unnumbered anchor (e.g., {-#cklist:xyz}) with a reference that is not preceded by [text] should be rendered as an error (???). For instance,

Checklist: {-#cklist:xyz}
 - Task 1
 - Task 2
 - Task 3

As listed in the @cklist:xyz ...

Would be rendered as:

Checklist:
 - Task 1
 - Task 2
 - Task 3

As listed in the ??? ...
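
The rules for numbered and unnumbered references can be put together in a short Python sketch. It assumes a label map in which an unnumbered anchor is stored as `None`, and it marks hyperlinked text with underscores as in the examples above; all names are hypothetical.

```python
import re

# Optional "[text]" prefix, then {@key}, {-@key}, or a bare @key.
REF_RE = re.compile(
    r"(?:\[([^\]]*)\])?"
    r"(?:\{(-?)@([A-Za-z0-9_-]+:[A-Za-z0-9_-]+)\}"
    r"|@([A-Za-z0-9_-]+:[A-Za-z0-9_-]+))"
)


def render(text, labels):
    """labels maps key -> number, or None for an unnumbered anchor."""
    def repl(m):
        link_text, dash = m.group(1), m.group(2)
        key = m.group(3) or m.group(4)
        if key not in labels:
            return (link_text or "") + "???"
        # A leading "-" forces an unnumbered reference even to a numbered anchor.
        number = None if dash else labels[key]
        if number is None:
            # Nothing but the bracketed text can be shown; without it, flag an error.
            return f"_{link_text}_" if link_text else "???"
        return f"_{link_text or ''}{number}_"
    return REF_RE.sub(repl, text)


labels = {"cklist:abc": 1, "cklist:xyz": None}
print(render("As listed in checklist @cklist:abc ...", labels))
print(render("As listed in the [checklist]{@cklist:xyz} ...", labels))
print(render("As listed in the [checklist]{-@cklist:abc} ...", labels))
print(render("As listed in the @cklist:xyz ...", labels))
```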

mb21 commented 9 years ago

Please note that anchors shouldn’t be seen as stand-alone items. The reason why anchors (or in pandoc/html parlance: ids) need to be attached to an element is that they are part of the Attr (attribute block) in pandoc’s internal data model, see e.g. the pandoc type for a header. This is analogous to HTML where the anchor/id also needs to be attached to an element, e.g. <h1 id="myAnchor">…</h1>.


I agree that @fig:force should generally only result in the number (e.g. “1”), not the string “Figure 1”. So how to avoid that only the number would be part of the hyperlink then? I see three (not mutually exclusive) ways:

  1. A manual syntax, similar to the current bibliography-references: As seen in [Figure @fig:force].
  2. We could also make the preceding word part of the hyperlink (if the word is in the same paragraph and there is nothing but exactly one space separating the word and the number). This should make the most common use case easy to type: As seen in Figure @fig:force -> As seen in <a href="#fig:force">Figure 1</a>.
  3. Manually make a link: As seen in [Figure](#fig:force). This has the disadvantage that people would sometimes have to use @ to make a reference and sometimes a link with #, which would be very confusing. Note that I don’t like [Equation]{@eq:force} because that would clash with the proposed span-syntax and doesn’t look like a link or reference.

I tend towards (1) in combination with (2).
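
For illustration, option (2) could be approximated with a regular expression, as in this Python sketch. The names are hypothetical and the rule is deliberately naive: one word, exactly one space, then a bare reference.

```python
import re

# One word, exactly one space, then a bare @type:descriptor reference.
WORD_REF_RE = re.compile(r"\b([A-Za-z]+) @([A-Za-z0-9_-]+:[A-Za-z0-9_-]+)")


def link_with_preceding_word(paragraph, labels):
    """Hyperlink the preceding word together with the reference number."""
    def repl(m):
        word, key = m.group(1), m.group(2)
        if key not in labels:
            return f"{word} ???"
        return f'<a href="#{key}">{word} {labels[key]}</a>'
    return WORD_REF_RE.sub(repl, paragraph)


print(link_with_preceding_word("As seen in Figure @fig:force.", {"fig:force": 1}))
print(link_with_preceding_word("see figures @fig:a and @fig:b", {"fig:a": 1, "fig:b": 2}))
```

The second call shows a limitation of such a simple rule: the word "and" is pulled into the second link, so an implementation would need an allowed list of words or some other restriction.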

mb21 commented 9 years ago

@jgm regarding the numbered example lists, I find the @ instead of the usual # at the anchor place (instead of the reference place) rather confusing. Maybe one of the following two?

(#ex:good) This is a good example.
(#ex:bad) This is a bad example.

{#ex:good} This is a good example.
{#ex:bad} This is a bad example.

scaramouche1 commented 9 years ago

@mb21: I believe that proposal 1 is better (manual syntax similar to bibliography-references). As seen in Figure {@fig:a} would just link the number; and as seen in [Figure {@fig:a}] would link "Figure 1".

Proposal 2 is interesting, but I am not sure it would render the right thing in all cases. Would it lead to consistent and useful behavior in cases such as as seen in H{@hyp:a}, as seen in examples {@ex:a}--{@ex:z} and as seen in examples {@ex:a} through {@ex:z}?

Proposal 3 is similar to something I had proposed earlier. But now I think this behavior is too complex. Proposal 1 dominates Proposal 3.

aaren commented 9 years ago

@scaramouche1 - excellent contribution, thank you :).

I am leaning more towards using @ as the prefix now (rather than #).

@mb21 I think option 1 is good (see [Figure @fig:a])

Option 2 is tempting but what about plurals? (see figures @fig:a and @fig:b)

You might have to have typing for reference objects and some grammar of referencing to get this to work fully. I can see it getting complicated unless you just went with an allowed list of words ('figure', 'section', 'table' etc.) and only linked with these. This would be nice, but maybe it is too magical? Could be a configuration option on a filter.

Option 3 I don't like.

mb21 commented 9 years ago

Okay, so here comes my attempt at a summary of where I feel this is headed. Sorry if this thread is turning into a list of summaries.

aaren commented 9 years ago

An id attribute that contains a colon is a special anchor.

@mb21 I'm not sure about

  1. forcing namespacing using the anchor
  2. only being able to refer to objects with a special (colon containing) anchor

whilst the type:tag convention is quite common I'm not sure that we should force it on people.

I suggest that any object that can have an anchor can be referred to. If you want to link to it with a regular hyperlink, use [the object](#anchor); if you want internal-referencing, use @anchor.

The Markdown reader would scan the text for all @ references and for all anchor definitions and then associate them together, using the object that the anchor is defined on to determine the numbering scheme.

scaramouche1 commented 9 years ago

@aaren I have a question: if the type: part is not required, how would Pandoc guess that these three items do not share the same counter (but use three different counters)?

- Hypothesis #hyp ...
- Proposition #prop ...
- (#ex) ...

@mb21 Many thanks for the summary. One addition to it: It is important that references can optionally be put inside curly brackets, as this allows referencing things like

- As predicted by H{@hyp:temp}a...
- As shown in figures {@fig:a}--{@fig:z}...

and having them rendered as:

- As predicted by H1a...
- As shown in figures 1-5...
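A rough sketch of how braced and bare references could be resolved with one counter per `type:` prefix. The function names and the rule that unknown labels are left untouched (so that citations survive) are assumptions for illustration; converting `--` to a dash is left to pandoc's normal typography.

```python
import re
from collections import defaultdict

LABEL = r"\w+(?:[:-]\w+)*"
# A reference is either braced, {@hyp:temp}, or bare, @hyp:temp.
REFERENCE = re.compile(r"\{@(" + LABEL + r")\}|(?<!\w)@(" + LABEL + r")")


def number_labels(labels):
    """Number labels in definition order, keeping a separate counter per 'type:' prefix."""
    counters = defaultdict(int)
    numbers = {}
    for label in labels:
        prefix = label.split(":", 1)[0] if ":" in label else ""
        counters[prefix] += 1
        numbers[label] = counters[prefix]
    return numbers


def resolve(text, numbers):
    """Replace known references with their numbers; leave unknown ones as written."""
    def replace(match):
        label = match.group(1) or match.group(2)
        return str(numbers[label]) if label in numbers else match.group(0)
    return REFERENCE.sub(replace, text)
```

The braces only delimit the label, so `H{@hyp:temp}a` can sit flush against surrounding text, which a bare `@hyp:tempa` could not express.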

I believe it is very important that Pandoc translates references to "proper" LaTeX. So, I don't think it is a good idea for Pandoc to include hard coded versions of the reference numbers in the LaTeX code. In HTML or docx doing so is OK, but not in LaTeX. I have two main reasons for this:

1) LaTeX has facilities for creating hyperlinks, table of contents, tables of figures, etc... All these would be rendered useless if Pandoc hard codes the cross-references.

2) One of the main use cases of Pandoc in academic writing is to write a first draft in markdown, exporting it to LaTeX, and adding final touches in LaTeX. If the numbers are hard coded, editing in LaTeX will be very fragile and limited. This would also make it unfeasible to send a Pandoc-created LaTeX file to a journal or book publisher (which uses LaTeX's cross references to create, e.g., the front matter of the book).

aaren commented 9 years ago

@scaramouche1 yes good point. I'd say enclose in a span or div and use .hypothesis as a class. How would you do this in latex? Is there a hypothesis environment? Is this rendered by mathjax?

Regarding curly brackets.... I'm convinced by the use case, but not by the syntax. I think square brackets would be more consistent, but maybe there is a way with no brackets.

Regarding translation to latex.... absolutely, I think the labels should be passed through as-is so that latex can do its own thing.

timtylin commented 9 years ago

Now that I've been using Scholdoc for close to a year, I can comment on some of the issues here from experience:

I am leaning more towards using @ as the prefix now (rather than #).

I'm still not entirely convinced that numbered references should clobber the same @ syntax as citations. It is possible to run into some edge-cases where it's impossible to tell if something should be a reference or a citation. Both of these types of identifiers (ref keys and cite keys) have the same set of allowed characters in TeX. Using the : rule isn't going to unambiguously solve the issue; I'm certainly not the only one who ends up with a bunch of : in my reference database cite keys, even through no action of my own (mostly through importing colleagues' entries). In LaTeX we relied on separating these two with \ref and \cite to avoid issues with namespace pollution.

I really wouldn't have brought this up if it wasn't already possible to use # instead for references in text in an unambiguous fashion. I use this for Scholdoc and it's been working out pretty well for the past year or so. I did remember considering @ for Scholdoc but I abandoned it for reasons that I forgot, although I suspect it's similar to the above.

Regarding curly brackets.... I'm convinced by the use case, but not by the syntax. I think square brackets would be more consistent, but maybe there is a way with no brackets.

I agree that square brackets are "more markdownish". In my experience there are not really any ambiguities caused by this (unless @ is used for the syntax, in which case it again clobbers citation). My stance from the last time we talked about this hasn't changed.

LaTeX has facilities for creating hyperlinks, table of contents, tables of figures, etc... All these would be rendered useless if Pandoc hard codes the cross-references.

The most sustainable way is probably to use a new inline product type (or the much-hyped Link with Attr attached, if that is eventually real) that holds both the reference id and a candidate numbering (possibly in the Attr), and let the writer decide what to do. In Scholdoc I had the markdown reader generate the candidate numbering, mainly out of laziness on my part, but arguably this should be done in a filter so it can have the potential to do some IO (i.e., reaching into another document and grab references there).
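To make the "reference id plus candidate numbering, writer decides" idea concrete, here is a small sketch. It is written in Python rather than as a pandoc AST type, and the `Reference` class and both writer functions are hypothetical names, not anything in pandoc or Scholdoc.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    target: str     # anchor id, e.g. "fig:a"
    candidate: str  # number proposed by the reader or a filter, e.g. "15"


def write_latex(ref):
    # LaTeX resolves the number itself, so only the label is passed through.
    return "\\ref{" + ref.target + "}"


def write_html(ref):
    # HTML has no native counter for this, so the candidate number is written out.
    return '<a href="#{0}">{1}</a>'.format(
        html.escape(ref.target, quote=True), html.escape(ref.candidate)
    )
```

The same node carries enough information for both strategies: the LaTeX writer ignores the candidate number, while writers without native cross-references use it directly.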

jgm commented 9 years ago

I'm still not entirely convinced that numbered references should clobber the same @ syntax as citations. It is possible to run into some edge-cases where it's impossible to tell if something should be a reference or a citation. Both of these types of identifiers (ref keys and cite keys) have the same set of allowed characters in TeX. Using the : rule isn't going to unambiguously solve the issue; I'm certainly not the only one who ends up with a bunch of : in my reference database cite keys, even through no action of my own (mostly through importing colleagues' entries). In LaTeX we relied on separating these two with \ref and \cite to avoid issues with namespace pollution.

Yes. I have long regretted the fact that @ is used for two different things in pandoc: example list labels and citations. (Not to mention the clash with the increasingly popular twitterish use for usernames.)

However, if we switched to #, we'd break backwards compatibility for example lists. That's a pretty weighty consideration. One option would be to have an extension that enables the legacy behavior.

timtylin commented 9 years ago

However, if we switched to #, we'd break backwards compatibility for example lists. That's a pretty weighty consideration. One option would be to have an extension that enables the legacy behavior.

@jgm I think it's possible to keep the old syntax for citation lists though. Currently example lists (using symmetric @ syntax) and x-references work side-by-side in Scholdoc, since they use completely separate mechanisms.

We will have an additional ambiguity, if we use @ for in-text reference, with the definition of example lists. Of course this is already the case with current syntax. This isn't much of an issue now since definition of EL labels happens at the block level which takes precedence over referencing at the inline level, but if @ usage becomes more common (and using @ for inline anchor definition is somehow permitted) then obviously the rate of edge cases will increase.

Of course, if we switch to uniformly using # for defining anchors anyways, then this would affect example lists regardless of the choice of reference syntax.

(Not to mention the clash with the increasingly popular twitterish use for usernames.)

Example lists aside, I actually really like how it matches the notion of "referring to someone" for citations. My dream is to somehow be able to resolve DOIs/ISSN/PMID/ArXivID as cite keys (it's on my todo list for Scholdoc-citeproc). How cool would it be to do, e.g., @10.1190/1.234567 if it were somehow possible to unambiguously resolve the information.

I believe this also influenced my choice to use # for cross-references as well… it's like referring to a concept or a context, similar to how hashtags are currently used.

aaren commented 9 years ago

@timtylin: yes, I think this is best done in a filter, similar to pandoc-citeproc now.

I'm not completely sold on @ yet, my position is more undecided (vs #). Importing other people's keys is a reason for having both, but you risk your cite keys getting clobbered in this case anyway. LaTeX does have distinct \cite and \ref, but was this for a compelling reason or is it historical? (I'm not sure)

@timtylin: going a bit off topic: I'm not sure that explicitly typing a doi is the most user friendly thing to do when referencing something, but yes it would be great to be able to refer to another doi's figures like that! Regardless, did you know that you could do this:

curl -LH "Accept: text/bibliography; style=bibtex" "http://dx.doi.org/10.1017/S0022112061000019"

mb21 commented 9 years ago

Okay, I see the need for having native LaTeX references. This leaves us with three possibilities:

  1. Add a native reference type to the pandoc data type (cleanest approach but very involved: all writers need to be adjusted accordingly and call a shared numbering module).
  2. Implementing the counter and number placement in a filter instead of the Markdown Reader (document conversion might take up to twice as long).
  3. Do all the stuff in the Markdown Reader as discussed, then have the LaTeX Writer extract that again to write native LaTeX references (this approach is somewhat of a hack and potentially error prone).

Personally, I’m tending toward (2) as a reasonable trade-off between maintainable code and implementation effort (unless someone has time to do (1)). Conversions will be slower, but in my experience LaTeX always dominates conversion times over pandoc itself anyhow.
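As an illustration of option (2), the counter and number placement could live in a small function that walks pandoc's JSON AST. This is only a sketch: it assumes the `{"t": ..., "c": ...}` node layout, with a Span carrying `[[id, classes, keyvals], inlines]`, and the function name `number_spans` is made up for this example.

```python
from collections import defaultdict


def number_spans(node, counters=None):
    """Walk a pandoc-style JSON AST in place and append a running number
    to every Span whose id looks like 'type:tag'."""
    if counters is None:
        counters = defaultdict(int)
    if isinstance(node, list):
        for child in node:
            number_spans(child, counters)
    elif isinstance(node, dict):
        if node.get("t") == "Span":
            (identifier, _classes, _keyvals), inlines = node["c"]
            if ":" in identifier:
                prefix = identifier.split(":", 1)[0]
                counters[prefix] += 1  # one counter per prefix
                inlines.extend(
                    [{"t": "Space"}, {"t": "Str", "c": str(counters[prefix])}]
                )
        for value in node.values():
            number_spans(value, counters)
    return node
```

Used as a filter, the function would sit between `json.load(sys.stdin)` and `json.dump(..., sys.stdout)`, which is the extra serialization round trip that makes this option slower than doing the work in the reader.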


Once again, I don’t like standalone curly brackets, because markdown has generally been following the HTML tradition of having attributes and anchors only on elements that span some text. I think the inline spans with auto-generating numbers will suffice. As posted above:

Inline spans:
  [Hypothesis]{#hyp:temp} is that...
  <p><span id="hyp:temp">Hypothesis 1</span> is that...</p>

bgamari commented 9 years ago

@mb21 I would be happy to give (1) a try after I defend in June. It seems like we have already taken on enough technical debt in the name of "it's hard to add new AST nodes".

tomduck commented 9 years ago

I wrote a filter to number figures and references: pandoc-fignos. The syntax follows the recommendations by @scaramouche1 on Jan. 18.

Demonstration: input demo.md and output pdf, tex, html, epub and md.

Details: The filter should work with any output format. For LaTeX the \label and \ref macros are used. For everything else the numbers are hard-coded. Caption formatting is retained. There is no linking. A filter option allows image attributes to be left in place for further processing.

scaramouche1 commented 9 years ago

Thank you @tomduck! This seems like an excellent addition to Pandoc. I also saw you are working on pandoc-eqnos. Superb.

tomduck commented 9 years ago

Appreciated, @scaramouche1. And thanks for your and others' efforts in putting together a well thought out spec.

I have been using pandoc-fignos for figure numbering and pandoc-eqnos for equation numbering in my academic writing over the past week or so. Both have been working well. People are welcome to file issues against them if problems are uncovered. Cheers.

lierdakil commented 9 years ago

Sorry for shameless self-promotion, but for anyone interested[^1], there's also a Haskell implementation of similar idea[^2], called pandoc-crossref. A couple additional features I personally find handy, like references to tables and list of figures/tables generation are included. Also some output configurability, like delimiters, etc, through metadata.

[^1]: e.g., anyone bad with Python, like me

[^2]: although, syntax is slightly different to allow for automatic sequence collapsing, e.g. reference to 1,2,3 will collapse into 1-3, like LaTeX cleveref package (which is an option for latex output, by the way)
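The sequence collapsing mentioned in the footnote can be sketched as follows. This is not pandoc-crossref's code (which is Haskell); the function name, the separator and the rule that only runs of three or more collapse are assumptions for illustration.

```python
def collapse(numbers):
    """Render reference numbers, collapsing runs of three or more into 'first-last'."""
    runs = []
    for n in sorted(set(numbers)):
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n      # extend the current run
        else:
            runs.append([n, n])  # start a new run
    parts = []
    for first, last in runs:
        if last - first >= 2:
            parts.append(f"{first}-{last}")
        else:
            parts.extend(str(n) for n in range(first, last + 1))
    return ", ".join(parts)
```

For example, `collapse([1, 2, 3])` gives `1-3`, while a pair such as 7 and 8 stays written out.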

mangecoeur commented 9 years ago

@lierdakil - works great thanks! Though it took me a few goes to realise I had to (re)install pandoc via Cabal (only used Python filters so far) ;) Seems to me this would make a good basis for including as a standard pandoc feature...

abudden commented 9 years ago

@lierdakil This is excellent, thank you. It would be great to see this integrated into pandoc as standard: how hard would it be to merge it in?

lierdakil commented 9 years ago

@abudden at the moment, this is not possible. For this to be included in pandoc, we'd need to revise document model to add attributes to all block elements. While that's possible, and there has been some work on it (see new-image-attributes branch), it's still too early to include in Pandoc, and it would break backwards compatibility in a major way. At the moment, due to document model limitations, pandoc-crossref relies on a hacky post-parsing solution and has a very limited syntax, so it's not something I'm confident enough to include into pandoc.

If/when block attributes are supported, we could talk about merging pandoc-crossref into mainline pandoc, but since pandoc-citeproc is a filter, I see no obvious reason to include pandoc-crossref into mainline pandoc. I plan to publish pandoc-crossref on hackage soon, so installation should be somewhat simplified in the future.

Including some degree of support (e.g. similar to --bibliography implying pandoc-citeproc) would be fine though, I suppose. But that's a little far off, not until after 1.14 release at least.

dluciv commented 8 years ago

Moreover, it would be useful to link to anything. LaTeX allows doing so: http://tex.stackexchange.com/a/4024/70953

crsh commented 8 years ago

For what it's worth, I just wanted to say that I would greatly appreciate the addition of a syntax like the one proposed by @scaramouche1.

hadim commented 8 years ago

That feature would be really cool.

For now I use \label{} and \ref{} from latex and it works when I convert to pdf but it doesn't when it comes to generating a Word file or HTML :-(

mb21 commented 8 years ago

What do you think of adding a Figure element to the Pandoc AST instead of adding Attr to Table? There was some discussion in that direction in #673, plus a good discussion on pandoc-discuss which I just revived.

Something along the lines of Figure Attr [Block] [Block]—a figure with a caption (which can contain markdown) and containing block elements (like one or more tables, images, blockquotes, codeblocks, etc).

Would it be good enough if you can just reference Figures, so we wouldn't have to add Attr to Table?

ghost commented 8 years ago

That would cover all the uses I can think of off the top of my tired head.

jgm commented 8 years ago

Yes, this is sensible. Figure could then be used for images as figures, instead of the current hack of treating a Para with just an image as a figure.

mb21 commented 8 years ago

@lierdakil do you think we could make this work with:

Or do we really need attributes on more block elements (table, blockquote, etc.) and/or a dedicated reference element?

timtylin commented 8 years ago

@mb21 @lierdakil The former was what Scholdoc did and I personally think this would be the way to go. It's a better way to go if we want to add future attributes that only matter in a figure context (such as placement info, pre-rendered fallback image, etc), which you can already see some of in Scholdoc's current block type.

mb21 commented 8 years ago

@timtylin thanks, good to hear you're still in favour of the Figure element. I took a closer look at Scholdoc's Figure, anything we should learn from this for pandoc? The PreparedContent is for "pre-rendered fallback image, etc", right? I guess that's kind of out of scope for pandoc... And what about the FigureType? I was thinking of handling that as part of the attribute, for output formats that need to know this: like {#fig:my-figure} (or even {type=figure}). So we don't have to change the AST whenever there's a new figure type. So we have just Figure Attr [Block] [Block] (in case there are formats where captions can be blocks as well). What do you think?

lierdakil commented 8 years ago

We already have a block container with attributes (it's called div). Better syntax is basically all we lack with it. But I still think that attributes on all or at least most blocks makes more sense, at least when thinking in terms of xml- and html-based formats.

As for reference elements, it really doesn't matter that much. I see no immediate need for attrs on reference elements (although those might come in handy), and we're basically free to choose whatever, if syntax would make sense.

aaren commented 8 years ago

Having Figure Attr [Block] [Block] does feel a bit redundant when we already have Div Attr [Block]. Why not just treat the first Para (or multiple) as the caption? I suppose the Figure caption can have completely arbitrary content (Figures all the way down!), rather than just Para.
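A sketch of the Div-based alternative, assuming pandoc's JSON layout for a Div (`[[id, classes, keyvals], blocks]`). Taking exactly one leading Para as the caption is an assumption of this example, and `split_caption` is a hypothetical helper name.

```python
def split_caption(div):
    """Split a Div node into (identifier, caption_inlines, content_blocks),
    treating a leading Para as the caption."""
    attr, blocks = div["c"]
    identifier = attr[0]
    if blocks and blocks[0]["t"] == "Para":
        return identifier, blocks[0]["c"], blocks[1:]
    return identifier, [], blocks
```

One thing this shows is the limitation raised above: a Para caption can only hold inlines, whereas a dedicated caption field of type [Block] could hold arbitrary content.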

I know there isn't agreed Div syntax yet, but I would also favour not using English words to specify the container.

If the Figure type can also contain tables then it looks more like a new Referable type than a dedicated figure container (could contain e.g. code blocks, block quotes as well). If this is the case then I'm not yet sold on the advantage cf. Div - is it worth it just to have the distinct caption field? Are there other things we haven't considered?

Another thing is that Figure logic (e.g. fancy placements) might be best handled by a filter.

I'm not sure what the best solution is here. I can see merit in both ways.

ghost commented 8 years ago

@lierdakil Just weighing in quickly on the matter of references. Reference attributes would make it nice and easy to reference, e.g., the section a table is in rather than the table itself, or the controversial pageref. Best to make the deep changes now so that it is just the matter of making changes in the readers/writers later on.