Not sure about your use case, but potentially you could also just use a markdown link or span with custom attributes and use a filter to go from there...
Btw, #813 might be related?
> potentially you could also just use a markdown link or span with custom attributes and use a filter to go from there
The pandoc-url2cite filter by @phiresky has a nice syntax for defining citekey aliases that can contain forbidden characters:
Citekey with closing slash @google
[@google]: https://www.google.com/
I think this is a good workaround, but I still see a benefit in being able to skip the link reference / alias step altogether and do something like:
Citekey with closing slash @{https://www.google.com/}
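To show why the closing slash matters, here is a rough Python approximation (my own reading of the rule described in the pandoc manual, not pandoc's actual parser): a bare key starts with a letter, digit, or _, and punctuation from :.#$%&-+?<>~/ is allowed only internally, i.e. when more word characters follow. A URL ending in / therefore cannot be a bare key, which is exactly what the @{...} form would fix. The helper names is_bare_citekey and format_citekey are hypothetical.

```python
import re

# Approximation (an assumption, not pandoc's parser) of the bare citekey rule:
# word characters, with punctuation from :.#$%&-+?<>~/ allowed only between
# runs of word characters, never at the end of the key.
_PUNCT = re.escape(":.#$%&-+?<>~/")
BARE_CITEKEY = re.compile(rf"\w+(?:[{_PUNCT}]+\w+)*")


def is_bare_citekey(key: str) -> bool:
    """Return True if `key` fits the approximated bare @key syntax."""
    return BARE_CITEKEY.fullmatch(key) is not None


def format_citekey(key: str) -> str:
    """Hypothetical helper: emit @key when possible, else the @{key} form."""
    if is_bare_citekey(key):
        return "@" + key
    if "}" in key:
        # Conservatively refuse keys that would close the braces early.
        raise ValueError(f"cannot brace-quote key containing '}}': {key!r}")
    return "@{" + key + "}"
```

With this approximation, "https://www.google.com" is a valid bare key, but "https://www.google.com/" (trailing slash) and "https://openreview.net/forum?id=HkwoSDPgg" (contains =) need braces.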
This seems like a reasonable change to me.
Coming back to this in response to the PR, I'm a bit torn.
In many ways I really like the syntax
Citekey with closing slash [@google]
[@google]: https://www.google.com/
It seems to me this would be nicer to write with than long inline DOI links. For one thing, it's not obvious what a DOI citation refers to unless you follow the link, so mnemonic names would make the source more readable, in line with Markdown's goals.
But a drawback is that things like
Citekey with closing slash [@google]
[@google]: https://www.google.com/
already have a well-defined meaning in Markdown (as regular links), which we'd be changing. We'd also be removing the equivalence between this and
Citekey with closing slash [@google](https://www.google.com/)
There's also the question of whether to support a bare (author-in-text) @google with this syntax.
I've wanted to open a separate issue about this before, but I guess now it's part of this discussion anyway:
In my opinion, pandoc should change the parsing of both [@foo](bar) and [@foo]\n\n[@foo]: bar to be the same as the parsing of [foo](bar) and [foo]\n[foo]: bar, respectively. That is, as far as parsing goes, it should behave the same as pandoc -f markdown-citations.
This might require an AST change: citations could become a special form of link, or the output could stay the same as now, with some "citekey alias" logic added (the "link label" would be the short citekey and the "link target" would be the full citekey, e.g. a DOI or URL).
We also now have (at least) three competing implementations of exactly the above (converting the AST produced by [@x]\n\n[@x]: https://.... so that it means a citekey alias instead of literal text):
That's because we all wanted to use globally unique identifiers such as URLs, ISBNs, and DOIs as citekeys, in combination with a fully automated bibliography manager.
All three implementations are incompatible with each other (as far as I know), since we handle newlines and paragraphs differently, so it would be great if this were just supported in pandoc core.
Changing the parsing would make it easier to write "polyglot" Markdown documents that use citations but are also parsed by a different parser, such as GitHub's or CommonMark.
It would also mean that the new syntax proposed above is unnecessary, since it then just becomes [@](https://openreview.net/forum?id=HkwoSDPgg) (or [@x]\n[@x]: https...) with the same escaping rules as URLs.
I also don't think it would affect existing documents much, since who puts round brackets right after citations or [@x]: y in its own paragraph/line?
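As a rough illustration of what the reference-style alias convention amounts to, here is a standalone Python text preprocessor. It is hypothetical: it is not how pandoc-url2cite or the other implementations work, it assumes aliases are plain word characters, and it ignores code blocks and escaping.

```python
import re

# Hypothetical preprocessor: "[@alias]: target" definition lines become a
# lookup table and are blanked out; each "@alias" elsewhere becomes
# "@{target}". Aliases are assumed to be plain word characters (\w+).
DEFINITION = re.compile(r"^\[@(\w+)\]:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
CITATION = re.compile(r"(?<!\w)@(\w+)")  # the lookbehind skips e-mail addresses


def expand_citekey_aliases(text: str) -> str:
    aliases = {m.group(1): m.group(2) for m in DEFINITION.finditer(text)}
    body = DEFINITION.sub("", text)  # definition lines are left empty

    def replace(match: re.Match) -> str:
        target = aliases.get(match.group(1))
        # Undefined aliases are kept as ordinary citekeys.
        return match.group(0) if target is None else "@{" + target + "}"

    return CITATION.sub(replace, body)
```

Doing this as a pre-pass over raw text is the simplest option, but it is also why separate implementations drift apart: each one has to decide on its own how definitions interact with newlines and paragraphs.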
I'm somewhat confused about what exactly [@x](y) is supposed to mean. How is it different from [@](y)?
Also somewhat related: right now, link definitions are only recognized at the start of a paragraph. I find that rather odd; I'd expect either that they just have to be on their own line or that they have to form their own paragraph. I suppose the latter was intended but wasn't implemented, to avoid having to backtrack that far?
> It would also mean that adding the new syntax proposed above is unnecessary, since that then just becomes [@](https://openreview.net/forum?id=HkwoSDPgg) (or [@x]\n[@x]: https...) with all the same escaping rules like URLs.
With the proposed brace syntax you could distinguish between a regular citation [@{foo}] and an author-in-text citation @{foo}. I'm not sure how this would work with your proposal. [@](foo) would presumably only correspond to one of these.
Instead of overloading reference link syntax as proposed above, if the point is just to provide short, readable aliases for unreadable citation keys that might be used in a bibliography, we could make pandoc-citeproc's citation lookup sensitive to a table of aliases that could be provided in the metadata:
citation-aliases:
  foo: '@{big-long-citation-with-weird-symbols}'
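To make the metadata-table idea concrete, here is a minimal Python sketch that assumes the metadata has already been parsed into a dict. The citation-aliases key comes from the proposal above; the function names and the handling of the @{...} value form are my assumptions, not pandoc-citeproc behaviour.

```python
from typing import Dict, List


def parse_alias_value(value: str) -> str:
    """Strip the citation markup from an alias target: '@{key}' or '@key'."""
    if value.startswith("@{") and value.endswith("}"):
        return value[2:-1]
    if value.startswith("@"):
        return value[1:]
    return value


def resolve_citekeys(keys: List[str], metadata: Dict) -> List[str]:
    """Replace aliased citekeys with their targets; leave other keys alone."""
    table = metadata.get("citation-aliases", {})
    aliases = {alias: parse_alias_value(t) for alias, t in table.items()}
    # One level of lookup only: an alias never resolves to another alias,
    # so cycles cannot occur.
    return [aliases.get(key, key) for key in keys]
```

Because unaliased keys pass through unchanged, documents that don't use the table would behave exactly as before.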
Thanks @Aver1y for the implementation in https://github.com/jgm/pandoc/pull/6373 to support @{citekey}. This will provide enough flexibility to include all the types of citation keys we're interested in.
> we could make pandoc-citeproc's citation lookup sensitive to a table of aliases that could be provided in the metadata
For the pandoc-manubot-cite filter, we support metadata.citekey-aliases. Credit to @nichtich for initially suggesting this approach. We also support the reference link syntax.
I think @phiresky likes the reference link syntax because the hyperlinks on citekeys will render in basic markdown engines, like when viewing the .md file on GitHub.
I haven't used the other-ids field for references (https://github.com/jgm/pandoc-citeproc/issues/356 / https://github.com/jgm/pandoc-citeproc/commit/8326a10f93c1109105ddce01502b2da22f8e6445), but this feature is also worth keeping in mind when thinking about citekey aliases.
With url2cite I'm really thinking of these alias definitions as definitions of bibliography entries, and I like that I can define them close to where I use them. Also the parallel to link aliases and footnotes is nice. I think we should also allow mixing definitions of footnotes, link aliases and citekey aliases inside one paragraph like this:
You can mix footnotes[^1], [link] alias and citekey alias [@Hirshfeld2016]
definitions in one paragraph.
[^1]: This is a footnote
[link]: https://example.com
[@Hirshfeld2016]: https://www.ncbi.nlm.nih.gov/pubmed/27672412
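If pandoc accepted such mixed blocks, a parser would have to tell the three kinds of definition apart by what follows the opening bracket. A hypothetical Python sketch of that classification step (the regex and function name are mine, and multi-line footnotes are ignored):

```python
import re

# "[^label]: ..." is a footnote, "[@label]: ..." a citekey alias, and
# "[label]: ..." an ordinary link reference definition.
DEFINITION_LINE = re.compile(r"^\[(\^|@)?([^\]]+)\]:\s*(.+)$")
KINDS = {"^": "footnotes", "@": "citekeys", None: "links"}


def classify_definitions(lines):
    """Split definition lines into footnotes, link aliases and citekey aliases."""
    result = {"footnotes": {}, "links": {}, "citekeys": {}}
    for line in lines:
        match = DEFINITION_LINE.match(line)
        if match is None:
            continue  # ordinary paragraph text, not a definition
        marker, label, value = match.groups()
        result[KINDS[marker]][label] = value
    return result
```

Run on the five example lines above, it puts "1" under footnotes, "link" under links, and "Hirshfeld2016" under citekeys, and skips the two lines of prose.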
> Also the parallel to link aliases and footnotes is nice.
Superficially, the idea may seem to complete a symmetry with the existing handling of links and footnotes, but cite keys are different because they are keys to data outside the document body. The effect of the proposal would therefore be to produce a key to a key: the first key is resolved within the document body (which includes the link and footnote definitions), and the second is resolved by the separate citation machinery. For that reason, @jgm's idea of treating this first key as an alias, which could also be given in the metadata, seems more appropriate than forcing citations into the same category as links and footnotes.
> but cite keys are different because they are keys to data outside the document body
I'm not sure why that is a relevant difference.
> Thus, the effect of the proposition would be to produce a key to a key
This doesn't seem as clear-cut to me either; the link syntax is already overloaded to mean many things. In [a](b), a is a usually short visible name or identifier, and b is an often longer, less readable, sometimes globally unique identifier that specifies some external resource, including:
- relative URLs, resolved against the <base> element.
- /foo for domain-relative and //foo.com for protocol-relative URLs.
- javascript:alert("foo") or data:text/plain,hello or steam://open/bigpicture or whatever.
All of these are resolved by a "separate machinery", and some but not all of them are "keys to data outside the document body". IMO the content of a URL is that website, not the character sequence that makes up the link.
As a comparison, footnotes like [^a] are keys to content, somewhat similar to [a](data:text/plain,hello).
Citekeys with the above syntax would be similar to links: a is a short, locally defined identifier, while b is an often longer, less readable, sometimes globally unique identifier, like:
This long identifier can be resolved either by the user agent or by a preprocessor that turns it into something more readable or useful.
Mainly, in my opinion, it makes sense to keep pandoc's syntax close to CommonMark, especially where there is no real reason to deviate. Adding back the [@foo]\n[@foo]: xyz syntax is something you intuitively expect to work if you know Markdown, and it makes pandoc more compatible with other Markdown parsers (where the special handling could then be implemented as a post-processor).
I don't care that much about the [@foo](bar) syntax: while it makes sense for polyglot documents shared between CommonMark and pandoc-url2cite, I can see why it doesn't have well-defined semantics in general.
> > but cite keys are different because they are keys to data outside the document body
> I'm not sure why that is a relevant difference.
Yes, well, I was trying to be brief, but now I will offer the details.
In the case of footnotes and links, all of the information for the document is in the document, which includes both the main text and the definitions list for footnotes and links. Placing footnote content or link addresses in a physical position outside the main text is useful because it makes the appearance of the Markdown representation closer to a published target, more fully meeting natural expectations of how a document appears visually.
Although it may be useful to put short-form keys in the text, using the footnote and link definitions list for this purpose has the effect of adding an intermediary location into the process of resolving the ultimate target of the citation. This design does not advance the original purpose of the list, as a place for items that are part of the text but visually separate from the main text.
Since cite keys ultimately are resolved by citation processors evaluating the metadata, resolving the short-form keys from an alias table that also appears in the metadata preserves the constraint that all the keys are found in the metadata, adding only one further table to it. This approach separates concerns, simplifies design, and clarifies operation. As far as I understand so far, it also appears to have minimal or no adverse effect on the user in most cases.
While the distinction is subtle from some perspectives, the details within it are relevant, I would politely argue, for choosing among the design choices that have been offered.
> All of these are resolved by a "separate machinery" and some but not all of them are "keys to data outside the document body". IMO the content of a URL is that website, not the char-sequence that comprises the link.
Yes, formally, but see above. The distinction is that while a human is free to open a website from a hyperlink while reading a document, the document includes only the address itself, not the content of the website. From the standpoint of document processing, the link address is a final destination. The cite key is not.
> I think @phiresky likes the reference link syntax because the hyperlinks on citekeys will render in basic markdown engines, like when viewing the .md file on GitHub.
I can see how this effect is useful, but the method described, as I understand it, would force an @ prefix into the visible text. Perhaps a cleaner method is to post-process (i.e. filter) links matching some criteria (e.g. a pattern match, or appearance in a table) into full references. (This behavior already appears to be supported in url2cite.)
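The filtering approach could look roughly like this hypothetical Python sketch: ordinary links stay readable in any Markdown renderer, and a post-processing step turns links whose URL is in a lookup table, or matches a DOI URL pattern, into citations. The table, the DOI regex, and the doi: prefix convention are all assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Example pattern only: https://doi.org/<doi> or https://dx.doi.org/<doi>.
DOI_URL = re.compile(r"https?://(?:dx\.)?doi\.org/(10\.\S+)")


def link_to_citation(url: str, table: Dict[str, str]) -> Optional[str]:
    """Return a citation for the link target, or None to keep it as a link."""
    if url in table:  # explicit table entries take precedence over patterns
        return "@{" + table[url] + "}"
    match = DOI_URL.fullmatch(url)
    if match:
        return "@{doi:" + match.group(1) + "}"
    return None
```

Returning None for everything else means that links which are not meant as references pass through untouched.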
Also, at the risk of adding unwanted clutter, I would observe that mapping rules in metadata open the possibility of much more sophisticated rules, if needed: for example, a pattern rule such that @{OR-HkwoSDPgg} is shorthand for @{https://openreview.net/forum?id=HkwoSDPgg}, without the HkwoSDPgg key appearing in any static table. Conventions of this kind might be defined per document or in a general pipeline applied to multiple input documents.
citation-patterns:
- [ 'OR-(\w+)', 'https://openreview.net/forum?id=$1' ]
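A minimal Python sketch of how such a rule list might be applied, assuming each entry is a (regex, template) pair as in the YAML above and that $1 means "first capture group". The function name is hypothetical, and the first rule whose pattern matches the whole key wins.

```python
import re


def expand_citekey(key, patterns):
    """Apply the first matching (regex, template) rule; otherwise return key."""
    for pattern, template in patterns:
        match = re.fullmatch(pattern, key)
        if match:
            # Translate "$1", "$2", ... into Python's "\g<1>", "\g<2>", ...
            python_template = re.sub(r"\$(\d+)", r"\\g<\1>", template)
            return match.expand(python_template)
    return key  # no rule applied; keep the key as written
```

Anchoring each pattern to the whole key (fullmatch) keeps a rule such as OR-(\w+) from firing on keys that merely contain "OR-" somewhere inside.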
I'm not sure how strong the case is for this functionality, but based on the original request, it seems that keeping open possibilities such as this one is compelling.
Your example (and any other from practice I can think of) does not require a pattern but could be done via a namespace prefix rule, e.g. citation-prefix.
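For comparison, the prefix rule needs no regular expressions. A hypothetical Python sketch, assuming citation-prefix maps a short prefix to the string it stands for and that the prefix is separated from the rest of the key by "-":

```python
def expand_prefixed_key(key, prefixes, sep="-"):
    """Replace a known namespace prefix; keys without one pass through."""
    prefix, found, rest = key.partition(sep)  # splits on the first separator
    if found and prefix in prefixes:
        return prefixes[prefix] + rest
    return key
```

With prefixes = {"OR": "https://openreview.net/forum?id="}, the key OR-HkwoSDPgg expands to the same OpenReview URL as the pattern rule, while keys with an unknown prefix or no separator are left alone.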
Now that we have the @{...} syntax, can we close this?
> Now that we have the @{...} syntax, can we close this?
Fantastic news! Is there a commit or pull request that added this? Couldn't find anything in the recent history.
Closing since the @{...} syntax provides the required flexibility.
Sorry, false alarm! I could have sworn that we'd added this feature, but I guess not; it was only discussed!
From the Pandoc manual:
The citation key syntax is limited (see as a regex), preventing the use of many kinds of citekeys that various users would like:
One use case that I'm interested in for Manubot is citation by persistent identifier, where the citekey is an actual identifier. Often, however, identifiers contain characters forbidden by Pandoc's citation key syntax. For example, we'd like to be able to include citekeys like:
My intended use case would just require more flexible citekeys for markdown input. We would likely use a custom filter to generate new citekeys for the output. However, it also seems from the issues above that some users would like more flexibility for citekeys across the board.
In https://github.com/jgm/pandoc-citeproc/issues/308#issuecomment-451230632, @jgm proposed a syntax like: