jgm / pandoc

Universal markup converter
https://pandoc.org

Markdown footnotes are duplicated #1603

Open lkiesow opened 9 years ago

lkiesow commented 9 years ago

If you add multiple references to one footnote in Markdown like this:

This is a sentence [^fn] with footnote.

This is another sentence [^fn] with footnote.

[^fn]: This is the footnote.

pandoc will create multiple footnotes instead of referencing only one. For example, if converted into Markdown again, one would end up with:

This is a sentence[^1] with footnote.

This is another sentence [^2] with footnote.

[^1]: This is the footnote.

[^2]: This is the footnote.

The same happens when converted to HTML (I didn't try other formats). Obviously, this should not happen but instead it should be one footnote which is just linked to multiple points in the text.

I've tested this locally using version 1.12.3.3 as well as with 1.13.1 at Try pandoc!

van-de-bugger commented 9 years ago

I confirm the bug.

Just to compare: Another markdown processor, multimarkdown, when processing from md to html, generates many references to one footnote.

mpickering commented 9 years ago

It would probably be better to present a uniform solution which works across all output formats.

lkiesow commented 9 years ago

I guess having a uniform solution is never a bad idea, but I'm wondering which format supports footnotes but not multiple references to one footnote?

mpickering commented 9 years ago

I have no idea to be honest. That might be a good argument to just apply this change to the markdown writer.

van-de-bugger commented 8 years ago

Why is the issue marked as "enhancement"? I would say it is a bug.

jgm commented 8 years ago

+++ Václav Haisman [Jan 04 16 07:16 ]:

What about using the solution described in http://tex.stackexchange.com/a/262956/28495?

If we did make a change allowing multiple references to a single footnote, I'd probably implement it in LaTeX without relying on an external package like fixfoot, to keep down dependencies.

But one thing to keep in mind is that pandoc supports many output formats, not just LaTeX. Any changes in how footnotes are handled require changes in all of them.

van-de-bugger commented 8 years ago

It seems this issue has a long history and discussion thread I am not aware of. Moreover, I am not experienced in this area, but…

mpickering wrote:

It would probably be better to present a uniform solution which works across all output formats.

I would not agree with this. Output formats differ, and a uniform solution may not (or even will not) work for all of them.

For example, the HTML output format assumes one (probably long) page, with all the footnotes printed at the very bottom, so having a separate footnote for every link looks ugly. (Backlinks are not a problem, at least conceptually; look at the MediaWiki/Wikipedia solution, which has multiple backlinks in a single footnote.)

Another output format (TeX?) may assume that the output document consists of multiple pages, with footnotes printed at the bottom of each page. In such a case, it is natural to print the footnote on each page containing a reference.

So the internal representation (I am not sure what you call it, probably the AST?) should be able to represent multiple links to the same footnote. Each writer (HTML, TeX, etc.) should be able to handle it, generating either a single footnote or multiple footnotes depending on the output format; no uniform solution is needed.
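To make the idea concrete, here is a minimal Python sketch of a representation in which several references point at one note, plus two writer strategies that consume it. Everything in it (`NoteRef`, `write_shared`, `write_repeated`, the bracketed `[n]` output) is hypothetical and only illustrates the argument; none of it is pandoc's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteRef:
    """Hypothetical inline element: a pointer to a note, by identifier."""
    note_id: str


# A paragraph is a list of plain strings and NoteRef markers (assumed model).
paragraphs = [
    ["This is a sentence", NoteRef("fn"), " with footnote."],
    ["This is another sentence", NoteRef("fn"), " with footnote."],
]
notes = {"fn": "This is the footnote."}


def write_shared(paragraphs, notes):
    """Strategy 1: one footnote per identifier, numbered by first use."""
    numbers = {}  # note_id -> footnote number
    body = []
    for para in paragraphs:
        out = ""
        for item in para:
            if isinstance(item, NoteRef):
                # Reuse the number if this note was already referenced.
                n = numbers.setdefault(item.note_id, len(numbers) + 1)
                out += f"[{n}]"
            else:
                out += item
        body.append(out)
    footer = [f"[{n}] {notes[note_id]}" for note_id, n in numbers.items()]
    return body + footer


def write_repeated(paragraphs, notes):
    """Strategy 2: a fresh footnote for every reference."""
    body, footer = [], []
    for para in paragraphs:
        out = ""
        for item in para:
            if isinstance(item, NoteRef):
                footer.append(f"[{len(footer) + 1}] {notes[item.note_id]}")
                out += f"[{len(footer)}]"
            else:
                out += item
        body.append(out)
    return body + footer


print("\n".join(write_shared(paragraphs, notes)))
print("\n".join(write_repeated(paragraphs, notes)))
```

The same input yields one footnote with the first writer and two with the second, which is the per-format choice described above.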

claudiosousa commented 7 years ago

Any progress on this whatsoever?

jgm commented 7 years ago

No progress. It would be a fairly involved change, as noted above, affecting pandoc-types, all readers, and all writers. There are higher-priority things to work on at the moment.

nsheff commented 7 years ago

Just ran into this issue. Would love to see pandoc handle footnotes like this correctly at some point.

defanor commented 6 years ago

FWIW, the same happens with reStructuredText footnotes.

chapterjason commented 5 years ago

Is there any plan when that should be fixed?

masnick commented 5 years ago

Just ran into this as well. Hoping for a fix, even though it sounds non-trivial.

ivanhercaz commented 5 years ago

I am really interested in this issue. I would like to use the same footnote in different places without having it duplicated.

@tarleb, I read your advice in #5196, but in my case I am not interested in using footnotes with references inside. I just want to use them as notes, and having to repeat the same text twice is very annoying... Is there any plan to fix it, @jgm?

Of course, thanks in advance for your time! Pandoc is an awesome and very useful tool :heart:

mb21 commented 5 years ago

The problem is how Note is defined in the pandoc document AST. A note is simply an inline element, just like Strong or Strikeout. Even though the note seems to have two distinct parts (the [^1] and the [^1]: text in the Markdown), that's not how it's represented in the AST: there is only one part there (you can see this with -t native).
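As an illustration, the sketch below builds by hand an approximation of the JSON AST for the two-sentence example from the top of this issue and counts the Note elements. The element shapes (`{"t": ..., "c": ...}`) follow pandoc's JSON serialization; the helper names (`words`, `note`, `collect_notes`) are made up for this example, and the document is hand-written rather than produced by running pandoc.

```python
def words(text):
    """Turn plain text into a list of Str/Space inlines."""
    inlines = []
    for i, word in enumerate(text.split()):
        if i:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    return inlines


def note(text):
    """A Note inline: its content is a list of blocks, with no identifier."""
    return {"t": "Note", "c": [{"t": "Para", "c": words(text)}]}


SPACE = {"t": "Space"}

# Hand-written approximation of the AST for the example document.
doc = {
    "meta": {},
    "blocks": [
        {"t": "Para", "c": words("This is a sentence")
            + [SPACE, note("This is the footnote."), SPACE]
            + words("with footnote.")},
        {"t": "Para", "c": words("This is another sentence")
            + [SPACE, note("This is the footnote."), SPACE]
            + words("with footnote.")},
    ],
}


def collect_notes(node):
    """Recursively collect every Note element in a JSON AST fragment."""
    found = []
    if isinstance(node, dict):
        if node.get("t") == "Note":
            found.append(node)
        for value in node.values():
            found.extend(collect_notes(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_notes(item))
    return found


notes_found = collect_notes(doc)
print(len(notes_found))                  # number of Note elements
print(notes_found[0] == notes_found[1])  # same content, stored twice
```

Each Note carries its full text and nothing links the two copies, so a writer has no way to tell that they came from the same [^fn] definition.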

Thus the cleanest solution would be to change the AST, but that's something we try to do rarely, as it has the potential to break all pandoc filters etc. Btw. there is a proposal that would also change the Note element in the AST.

The alternative would be to somehow de-duplicate the Notes. But that seems somewhat hacky, as we only have the note-text to go by (the unique identifier is lost by then), but it's certainly something you could do in a pandoc filter.

link2xt commented 5 years ago

@mb21 I am strongly in favour of changing AST and separating notes from note references.

Currently the reader has to match note references to notes, so it has to postpone generation of the AST until all notes are collected. It is also the reason for ugly F/Future in Markdown, Muse and other readers, and two-pass parsing in RST reader.

Separating the notes from references will make it possible to shift the function of note reference lookup to writers, where all the notes are already collected.

jgm commented 5 years ago

It is also the reason for ugly F/Future in Markdown, Muse and other readers, and two-pass parsing in RST reader.

Not the only reason. There are also e.g. reference links. Changing this wouldn't remove the need for Future or two-pass.

Note also that the proposed AST change would require two passes (or something equivalent) in many of the writers. So I don't think there is any net decrease in complexity.

In addition to breaking filters etc., the change would be quite a lot of work, since every reader and writer would have to be modified. That's not to say we shouldn't do it. But I would prioritize better table and figure support over this.

jgm commented 5 years ago

Note that separating note references and notes in the AST would also allow a fix for #2053...though there may be some complications handling that in some output formats.

De-duplicating the notes is not a good solution, since people might want to have multiple notes with the same content (e.g., "Ibid., p. 33").

p3732 commented 5 years ago

Might be an obvious quick fix, but for LaTeX a simple $^{<number>}$ can be used if the footnote number is known and will not change anymore.

hoijui commented 4 years ago

Reading this discussion made me think that there is general potential for a structural improvement to the AST that would allow for smoother changes. More specifically: would it be possible to make the required change in the AST, but also have "a function" that converts the AST into how it would have looked before, then let the readers declare which version of the AST they can supply, and let the writers declare which version(s) of the AST they support or prefer? Or, instead of linear versions only, it could even be version+extensions. Such a scenario would allow changing the AST and then gradually implementing support for new features in the readers and writers: a sort of parallelization of development efforts, if you will.
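A small sketch of what such a conversion function could look like for the footnote case, in Python over a JSON-like AST. The "new" shape here is purely an assumption made for illustration: a `NoteRef` inline carrying an identifier, and a top-level `notes` table mapping identifiers to block lists. The function rewrites that into the current shape, where every reference becomes a full `Note` inline.

```python
import copy


def downgrade(doc):
    """Convert the hypothetical separated-notes AST to the current shape.

    Every NoteRef is replaced by a Note holding its own copy of the note's
    blocks, and the top-level "notes" table is dropped.
    """
    notes = doc.get("notes", {})

    def walk(node):
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, dict):
            if node.get("t") == "NoteRef":
                # Deep copy so each resulting Note is independent.
                return {"t": "Note", "c": copy.deepcopy(notes[node["c"]])}
            return {k: walk(v) for k, v in node.items()}
        return node

    return {k: walk(v) for k, v in doc.items() if k != "notes"}


# Hypothetical new-style document: two references, one stored note.
new_doc = {
    "meta": {},
    "notes": {"fn": [{"t": "Para", "c": [{"t": "Str", "c": "Footnote."}]}]},
    "blocks": [
        {"t": "Para", "c": [{"t": "Str", "c": "One"},
                            {"t": "NoteRef", "c": "fn"}]},
        {"t": "Para", "c": [{"t": "Str", "c": "Two"},
                            {"t": "NoteRef", "c": "fn"}]},
    ],
}

old_doc = downgrade(new_doc)
print(old_doc["blocks"][0]["c"][1]["t"])
```

The downgrade direction is lossy (the identifier is discarded), so a writer or filter that only understands the old shape would still see duplicated notes; the reverse conversion cannot be done reliably.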

mb21 commented 4 years ago

The problem is that the AST is not only exposed to the readers and writers, but also potentially to Haskell programs using pandoc as a library (although a lot of them hopefully use Text.Pandoc.Builder), and worst of all, to pandoc filters. The latter could be fixed long-term to some degree by changing the JSON serialization format to be more human-readable and less Haskell-ADT-oriented (see this comment), although some breakage is probably unavoidable unless you go for full-blown versioning and 100% backwards compatibility, which has a lot of mental and development overhead.

hoijui commented 4 years ago

What if everything were versioned (including the things you mentioned, like filters), and we provide backwards compatibility where feasible and, where it is not implemented, fail gracefully with a meaningful message like: "Error: Filter abc.py (supporting versions [0.3 - 0.4]) does not support any of the AST versions available ([0.6 - 0.9]) for the current input and output formats."?
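A minimal sketch of such a check in Python, assuming each component declares an inclusive range of supported AST versions as tuples. The function names and the idea of picking the highest common version are assumptions made for this example; pandoc has no such negotiation mechanism as far as this thread describes.

```python
def fmt(version):
    """Render a version tuple such as (0, 3) as "0.3"."""
    return ".".join(str(part) for part in version)


def negotiate(filter_name, filter_range, available_range):
    """Return the highest AST version both sides support, or fail clearly.

    Both ranges are (lowest, highest) pairs of version tuples, inclusive.
    """
    lowest = max(filter_range[0], available_range[0])
    highest = min(filter_range[1], available_range[1])
    if lowest > highest:
        raise ValueError(
            f"Filter {filter_name} (supporting versions "
            f"[{fmt(filter_range[0])} - {fmt(filter_range[1])}]) does not support "
            f"any of the AST versions available "
            f"([{fmt(available_range[0])} - {fmt(available_range[1])}]) "
            "for the current input and output formats."
        )
    return highest


try:
    negotiate("abc.py", ((0, 3), (0, 4)), ((0, 6), (0, 9)))
except ValueError as err:
    print(f"Error: {err}")
```

Comparing tuples rather than strings keeps the ordering numeric, so (0, 10) sorts after (0, 9).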

tarleb commented 4 years ago

The JSON output does include version information like "pandoc-api-version":[1,17,6]. The idea of providing compatibility layers is good, it could help to avoid a schism of the python 2/3 kind. We may have to take that step at some point, but updating everything will still be a huge amount of work for the reasons mentioned by @mb21.

hoijui commented 4 years ago

Yeah, versioning cannot reduce the total amount of work, only the amount that has to be done at once, so to speak... right? If we assume that everything unversioned supports only the current version (current as of when versioning is introduced, not as of any later time), it should all be gradual, no? But yeah... lots of work.

brainchild0 commented 4 years ago

Does this issue fall under the topic of cross references?

cforne commented 1 month ago

Any progress on this?