Open jgm opened 13 years ago
Yes, I'd like to give this one a +1, as I'm looking at LaTeX output and wishing it'd at least be output as \textsc{ma}. This might be a little overkill:
http://staff.science.uva.nl/~polko/HOWTO/LATEX/acronym.html
HTML is of course straightforward, as it already supports acronyms.
John - as I'm reminded about this again, just checking: do you support this enhancement request in principle?
I'm wary of this, as it would significantly slow down parsing. (Each string parsed would have to be looked up in the acronym table.)
OK. At least it's possible to post-process this.
Does it make any difference that the only strings that need to be looked up are those that look like an ACRONYM? Maybe a little, but not enough?
I don't know; the performance effect would have to be measured. Perhaps if you test for all-caps, it would be minimal. But the syntax doesn't seem to be limited to all-caps acronyms.
John
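The all-caps pre-test discussed above could look roughly like the following Python sketch. It is not pandoc's parser and says nothing about the real cost, which would still have to be measured; the table contents and the names `ACRONYMS`, `lookup`, and `count_lookups` are made up for illustration. It also shows the caveat John raises: keys that are not all-caps would never reach the table.

```python
import re

# Hypothetical acronym table; in practice it would be built from the
# abbreviation definitions found in the document.
ACRONYMS = {
    "HTML": "Hypertext Markup Language",
    "FTC": "Federal Trade Commission",
}

# Cheap pre-test: an uppercase letter followed by one or more
# uppercase letters or digits (e.g. HTML, W3C), and nothing else.
LOOKS_LIKE_ACRONYM = re.compile(r"[A-Z][A-Z0-9]+\Z")


def lookup(word, table=ACRONYMS, stats=None):
    """Return the expansion of `word`, or None.

    Only words that pass the cheap all-caps test reach the table.
    """
    if not LOOKS_LIKE_ACRONYM.match(word):
        return None
    if stats is not None:
        stats["lookups"] = stats.get("lookups", 0) + 1
    return table.get(word)


def count_lookups(text):
    """Return the known acronyms in `text` and how many table lookups ran."""
    stats = {}
    hits = [w for w in text.split() if lookup(w, stats=stats) is not None]
    return hits, stats.get("lookups", 0)
```

For a sentence such as "The FTC publishes HTML and PDF files", only the three all-caps words trigger a table lookup; the other four are rejected by the regular expression first.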
Would it not be possible to simply turn on acronym parsing with a command-line flag? I think people who would benefit from it would not be all that upset about the slowdown involved, and those that don't need it would never know!
+1, also as in kramdown. I don't know if there is special syntax there that might help the performance issue, but personally I like cflewis' idea of making it opt-in via a command-line switch.
also found this https://code.google.com/p/pandoc/issues/detail?id=196 and http://www.blaenkdenum.com/posts/the-switch-to-hakyll/
This would really help. If it really slows down parsing, maybe an add-on program can be made which parses for abbreviations. When the operator does not enable the option, parsing is fast; when it is enabled, parsing is slower but abbreviations get processed. There are three common things which are present in the larger documents I write: abbreviations, glossaries, and indexes. These would really help in shifting to markdown for larger documents which require a more professional look and feel. Right now this is limiting my markdown usage to smaller documents.
Maybe you're right that the slowdown isn't an issue, as long as it's an extension.
There's already an abbreviations extension; however, it's not really useful:
Note that the pandoc document model does not support abbreviations, so if this extension is enabled, abbreviation keys are simply skipped (as opposed to being parsed as paragraphs).
(from pandoc manual)
It would be really great to have this extension fully functional.
While trying to do this as a filter, I noticed that it would be especially helpful for acronym recognition to be flexible about plurals, possessives and parentheticals, e.g. W3C's staff; several RFCs; the Federal Trade Commission (FTC). Just looking for a Str with a direct match won't capture everything.
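A minimal Python sketch of that kind of flexible matching, under the assumption that each candidate arrives as a single whitespace-delimited token (as the content of a Str would). The helper name `match_abbreviation` and the table are hypothetical, and the plural rule is deliberately naive: it only strips a trailing "s".

```python
import re

# Hypothetical abbreviation table for illustration.
TABLE = {
    "W3C": "World Wide Web Consortium",
    "RFC": "Request for Comments",
    "FTC": "Federal Trade Commission",
}


def match_abbreviation(token, table=TABLE):
    """Return (key, expansion) if `token` is a form of a known key, else None."""
    # Strip surrounding punctuation such as parentheses, commas, periods.
    core = token.strip("()[]{}.,;:!?\"'")
    candidates = [core]

    # Possessive: W3C's -> W3C (straight or curly apostrophe).
    possessive = re.match(r"(.+)['\u2019]s\Z", core)
    if possessive:
        candidates.append(possessive.group(1))

    # Naive plural: RFCs -> RFC.
    if core.endswith("s"):
        candidates.append(core[:-1])

    for candidate in candidates:
        if candidate in table:
            return candidate, table[candidate]
    return None
```

The exact form is tried first, so a key that itself ends in "s" still matches before the plural rule is applied.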
FWIW, my 2 cents, as I stumble on this 7-year-old thread while looking for abbreviation support.
As an alternative to the *[...] syntax, which I don't find syntactically consistent, I would suggest using links with the abbr protocol (and with an ALT string):
[HTML]: abbr: "Hypertext Markup Language"
No special parsing is required as this is already delivered by the AST. Furthermore, full support for attributes is available, which makes it possible, for instance, to handle plurals:
[RFC]: abbr: "Request for Comments" {plural=RFCs}
A limitation of this approach is that the ALT string, as currently parsed, cannot contain markup.
Note: the colon (:) after abbr is there only to allow RFC-compliant URL parsing. From a Pandoc perspective, we don't need it.
AST sample output:
{
  "blocks": [
    {
      "t": "Para",
      "c": [
        {
          "t": "Link",
          "c": [
            [
              "",
              [],
              []
            ],
            [
              {
                "t": "Str",
                "c": "HTML"
              }
            ],
            [
              "abbr:",
              "Hypertext Markup Language"
            ]
          ]
        },
        {
          "t": "Space"
        },
        {
          "t": "Str",
          "c": "and"
        },
        {
          "t": "Space"
        },
        {
          "t": "Link",
          "c": [
            [
              "",
              [],
              [
                [
                  "plural",
                  "RFCs"
                ]
              ]
            ],
            [
              {
                "t": "Str",
                "c": "RFC"
              }
            ],
            [
              "abbr:",
              "Request for Comments"
            ]
          ]
        }
      ]
    }
  ],
  "pandoc-api-version": [
    1,
    17,
    4,
    2
  ],
  "meta": {}
}
No special parsing is required as this is already delivered by the AST.
And this means that you could implement this feature fairly easily with a lua filter.
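To illustrate how little is needed, here is the same idea sketched in Python rather than Lua, operating directly on the JSON AST shown above. It assumes the Link layout of that sample (attributes, inlines, then a target/title pair) and HTML output only; `plain_text` and `abbr_to_html` are hypothetical helper names, and the plural attribute is ignored.

```python
import html


def plain_text(inlines):
    """Flatten Str and Space inlines to text; other inline types are
    ignored in this sketch."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def abbr_to_html(node):
    """Recursively replace Link nodes whose target is "abbr:" with a
    RawInline holding an HTML <abbr> element."""
    if isinstance(node, list):
        return [abbr_to_html(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Link":
            _attr, inlines, (url, title) = node["c"]
            if url == "abbr:":
                markup = '<abbr title="{}">{}</abbr>'.format(
                    html.escape(title, quote=True),
                    html.escape(plain_text(inlines)),
                )
                return {"t": "RawInline", "c": ["html", markup]}
        return {key: abbr_to_html(value) for key, value in node.items()}
    # Strings and numbers are returned unchanged.
    return node
```

Wrapped in a script that reads the JSON document from standard input and writes the transformed document to standard output, this could be run through pandoc's `--filter` option. Other output formats, such as the LaTeX case that started this thread, would need their own raw markup.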