jgm / pandoc

Universal markup converter
https://pandoc.org

Abbreviation definitions like PHP Markdown Extra #167

Open jgm opened 13 years ago

jgm commented 13 years ago
I'd like to see Pandoc markdown supporting the 
<abbr> syntax of PHP Markdown Extra:

> PHP Markdown Extra adds supports for abbreviations
> (HTML tag <abbr>). How it works is pretty simple:
> create an abbreviation definition like this:
> 
> *[HTML]: Hyper Text Markup Language
> *[W3C]:  World Wide Web Consortium

Of course I'd like support for class tags as per
above in such a definition too.  One may want to
differentiate abbreviations and acronyms, and
since HTML 5 is to drop the <acronym> tag an
acronym *class* would be useful!
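For illustration only, the definition syntax quoted above could be recognised line by line with a regular expression. This is a minimal sketch, not pandoc's or PHP Markdown Extra's parser: the names `ABBR_DEF` and `parse_abbreviation_definitions` are hypothetical, and the requested class support (for example, marking an entry as an acronym) is not covered.

```python
import re

# Matches definitions such as "*[HTML]: Hyper Text Markup Language".
# Hypothetical pattern for illustration; it ignores indentation rules.
ABBR_DEF = re.compile(r"^\*\[(?P<key>[^\]]+)\]:\s*(?P<expansion>.*\S)\s*$")


def parse_abbreviation_definitions(text):
    """Return a dict mapping each abbreviation to its expansion."""
    table = {}
    for line in text.splitlines():
        match = ABBR_DEF.match(line)
        if match:
            table[match.group("key")] = match.group("expansion")
    return table
```

Lines that do not match the pattern are simply ignored, so ordinary paragraphs pass through untouched.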

Google Code Info:
Issue #: 196
Author: bpjonsson
Created On: 2010-01-07T13:57:02.000Z
Closed On: 

bdarcus commented 13 years ago

Yes, I'd like to give this one a +1, as I'm looking at latex output and wishing it'd at least be output as \textsc{ma}. This might be a little overkill:

http://staff.science.uva.nl/~polko/HOWTO/LATEX/acronym.html

HTML is of course straightforward, as it already supports acronyms.

bdarcus commented 13 years ago

John - as I'm reminded about this again, just checking: do you support this enhancement request in principle?

jgm commented 13 years ago

I'm wary of this, as it would significantly slow down parsing. (Each string parsed would have to be looked up in the acronym table.)

bdarcus commented 13 years ago

OK. At least it's possible to post-process this.

Does it make any difference that the only strings that need to be looked up are those that look like an ACRONYM? Maybe a little, but not enough?

jgm commented 13 years ago

> OK. At least it's possible to post-process this.
>
> Does it make any difference that the only strings that need to be looked up are those that look like an ACRONYM? Maybe a little, but not enough?

I don't know; the performance effect would have to be measured. Perhaps if you test for all-caps, it would be minimal. But the syntax doesn't seem to be limited to all-caps acronyms.

John
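The all-caps pre-check discussed here could look roughly like the sketch below. It is written in Python purely for illustration (pandoc's reader is Haskell), the names `looks_like_acronym` and `expand` are hypothetical, and it makes no claim about the actual performance effect, which would still have to be measured.

```python
def looks_like_acronym(token):
    """Cheap pre-check: two or more letters/digits, no lowercase, at least one letter."""
    return (
        len(token) >= 2
        and token.isalnum()
        and token == token.upper()
        and any(ch.isalpha() for ch in token)
    )


def expand(token, table):
    """Return the expansion for token, or None.

    The abbreviation table is consulted only for acronym-like tokens.
    """
    if not looks_like_acronym(token):
        return None
    return table.get(token)
```

As the comment above points out, the definition syntax is not limited to all-caps keys, so a key such as `e.g.` would never be found by this pre-check.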

cflewis commented 12 years ago

Would it not be possible to simply pass turning on acronym parsing as a command-line flag? I think people who would benefit from it would not be all that upset in the slowdown involved, and those that don't need it would never know!

hansbkk commented 10 years ago

+1, also as in kramdown. I don't know if there is special syntax there that might help the performance issue, but personally I like cflewis' idea of letting it be transparent via a command-line switch.

I also found this: https://code.google.com/p/pandoc/issues/detail?id=196 and http://www.blaenkdenum.com/posts/the-switch-to-hakyll/

shubhoroy commented 8 years ago

This would really help. If it really slows down parsing, maybe an add-on program can be made which parses for abbreviations. When the operator does not enable the option the parsing is fast, when it is enabled, the parsing is slower but abbreviations get processed. There are three common things which are present in larger documents (which I use), abbreviations, glossaries and index. These would really help in shifting to markdown for larger documents which require more professional look and feel. Right now this is limiting my markdown usage to smaller documents.

jgm commented 8 years ago

Maybe you're right that the slowdown isn't an issue, as long as it's an extension.

lukasz-r commented 6 years ago

There's already an abbreviations extension; however, it's not really useful. From the pandoc manual:

> Note that the pandoc document model does not support abbreviations, so if this extension is enabled, abbreviation keys are simply skipped (as opposed to being parsed as paragraphs).

It would be really great to have this extension fully functional.

npdoty commented 6 years ago

In my own progress trying to do this as a filter, I noticed that it would be especially helpful for acronyms to also have flexible recognition in the case of plurals, possessives and parentheticals: e.g. W3C's staff; several RFCs; the Federal Trade Commission (FTC). Just looking for Str with a direct match won't capture everything.
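A minimal sketch of that kind of flexible matching, assuming the abbreviation table is a plain dict and each candidate is a single whitespace-delimited token (roughly what a `Str` holds). The function name `match_abbreviation` is hypothetical and the rules are deliberately simplistic: they only strip surrounding punctuation and try a possessive or plural suffix.

```python
def match_abbreviation(token, table):
    """Return (key, expansion) if token is a known abbreviation, else None.

    Tolerates surrounding punctuation such as "(FTC)", a possessive
    "'s" (straight or curly apostrophe), and a plural "s".
    """
    # Strip brackets, quotes, and sentence punctuation from both ends.
    core = token.strip("()[]\"'.,;:!?")
    candidates = [token, core]
    for suffix in ("'s", "\u2019s", "s"):
        if core.endswith(suffix) and len(core) > len(suffix):
            candidates.append(core[: -len(suffix)])
    for candidate in candidates:
        if candidate in table:
            return candidate, table[candidate]
    return None
```

The exact token is tried first so that keys which themselves contain punctuation still match. Naively stripping a trailing "s" can produce false positives for keys that are ordinary words, so a real filter would probably want stricter rules.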

marcastel commented 5 years ago

FWIW, my 2 cents as I stumble on this 7-year-old thread while looking for abbreviation support.

As an alternative to the *[...] syntax, which I don't find syntactically consistent, I would suggest using links with the abbr protocol (and with an ALT string):

  [HTML]: abbr: "Hypertext Markup Language"

No special parsing is required as this is already delivered by the AST. Further full support for attributes is available, allowing, for instance, to handle plurals:

  [RFC]: abbr: "Request for Comments" {plural=RFCs}

A limitation of this approach is that the ALT string, as currently parsed, cannot contain markup.

Note: the colon (:) is here only to allow RFC-compliant URL parsing. From a Pandoc perspective, we don't need it.

AST sample output:

{
  "blocks": [
    {
      "t": "Para",
      "c": [
        {
          "t": "Link",
          "c": [
            [
              "",
              [],
              []
            ],
            [
              {
                "t": "Str",
                "c": "HTML"
              }
            ],
            [
              "abbr:",
              "Hypertext Markup Language"
            ]
          ]
        },
        {
          "t": "Space"
        },
        {
          "t": "Str",
          "c": "and"
        },
        {
          "t": "Space"
        },
        {
          "t": "Link",
          "c": [
            [
              "",
              [],
              [
                [
                  "plural",
                  "RFCs"
                ]
              ]
            ],
            [
              {
                "t": "Str",
                "c": "RFC"
              }
            ],
            [
              "abbr:",
              "Request for Comments"
            ]
          ]
        }
      ]
    }
  ],
  "pandoc-api-version": [
    1,
    17,
    4,
    2
  ],
  "meta": {}
}

jgm commented 5 years ago

> No special parsing is required as this is already delivered by the AST.

And this means that you could implement this feature fairly easily with a Lua filter.
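As a rough illustration of that idea, here is a sketch written as a Python JSON filter rather than a Lua filter, using only the standard library. It assumes the AST shape shown in the sample output above (a `Link` whose target is `abbr:` and whose title holds the expansion) and rewrites each such link into raw HTML. The function names are hypothetical, only `Str` and `Space` children are handled, and the `plural` attribute is ignored.

```python
import html
import json
import sys


def abbr_link_to_html(node):
    """Turn a Link with target "abbr:" into a RawInline <abbr>; return other nodes unchanged."""
    if isinstance(node, dict) and node.get("t") == "Link":
        _attr, inlines, (url, title) = node["c"]
        if url == "abbr:":
            # Sketch limitation: anything that is not a Str becomes a space.
            text = "".join(i["c"] if i["t"] == "Str" else " " for i in inlines)
            markup = '<abbr title="{}">{}</abbr>'.format(
                html.escape(title, quote=True), html.escape(text)
            )
            return {"t": "RawInline", "c": ["html", markup]}
    return node


def walk(node):
    """Recursively rebuild the JSON tree, rewriting abbr links bottom-up."""
    if isinstance(node, list):
        return [walk(child) for child in node]
    if isinstance(node, dict):
        rebuilt = {key: walk(value) for key, value in node.items()}
        return abbr_link_to_html(rebuilt)
    return node


def main():
    """Read a pandoc JSON document on stdin and write the rewritten one to stdout."""
    json.dump(walk(json.load(sys.stdin)), sys.stdout)
```

To run it as a filter, call `main()` at the end of the script and pipe pandoc's JSON through it, for example `pandoc -t json input.md | python abbr_filter.py | pandoc -f json -t html` (the script name is made up). Because the output is raw HTML, the abbreviations only appear in HTML-based output formats; other writers would need their own handling, such as the `\textsc` rendering mentioned earlier in the thread for LaTeX.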