sirthias / pegdown

A pure-Java Markdown processor based on a parboiled PEG parser supporting a number of extensions
http://pegdown.org
Apache License 2.0

Add support for Wiki Images #243

Open vmassol opened 8 years ago

vmassol commented 8 years ago

Similar to the support for WIKILINKS, Pegdown should also support the same for images (i.e. WIKIIMAGES).

And related to #92, the support for WIKIIMAGES should support specifying some alt text too.

I'd propose a syntax like:

![[wiki reference to image|alt text]]

Thanks! FTR this is causing the following bug on the xwiki project: http://jira.xwiki.org/browse/MARKDOWN-12
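
To make the proposed syntax concrete, here is a minimal standalone sketch in Java of how `![[reference|alt text]]` could be recognized and rendered. It is not pegdown code and does not use pegdown's grammar: the class name `WikiImageSketch`, the regular expression, and the choice to use the wiki reference directly as the `src` attribute are all assumptions made for illustration. A real implementation would resolve the reference through the wiki's own link resolver.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of pegdown: illustrates the proposed syntax only.
final class WikiImageSketch {
    // Matches ![[reference]] or ![[reference|alt text]].
    // The reference may not contain ']' or '|'; the alt text may not contain ']'.
    private static final Pattern WIKI_IMAGE =
            Pattern.compile("!\\[\\[([^\\]|]+)(?:\\|([^\\]]*))?\\]\\]");

    // Escapes the characters that are unsafe inside an HTML attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Replaces every wiki image in the input with an <img> element.
    // Assumption: the reference is used verbatim as the image source.
    static String render(String input) {
        Matcher m = WIKI_IMAGE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String reference = m.group(1).trim();
            String alt = m.group(2) == null ? "" : m.group(2).trim();
            String img = "<img src=\"" + escape(reference)
                    + "\" alt=\"" + escape(alt) + "\"/>";
            m.appendReplacement(out, Matcher.quoteReplacement(img));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("See ![[diagram.png|Architecture diagram]] here."));
    }
}
```

A plain wiki link such as `[[Some Page]]` is left untouched by this sketch, because the pattern requires the leading `!`.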

vsch commented 7 years ago

@vmassol, from past experience it can be done but does require a lot of playing with the grammar. Everything is interrelated because it sits in one giant PEG.

I rewrote commonmark-java to replace pegdown in my Markdown Navigator plugin for IntelliJ IDEs: https://github.com/vsch/idea-multimarkdown. The parser project, https://github.com/vsch/flexmark-java, has a very detailed source-based AST with source offsets for every part of each element. I need that for syntax highlighting and other plugin features that rely on the source.

It is CommonMark 0.27 (GitHub Comments) compliant but has parser configuration options to emulate the list indentation rules used by markdown.pl, MultiMarkdown (4-space indents, like pegdown) and kramdown (GitHub Docs). The only pegdown extensions that I have not yet implemented are typographic quotes, smarts and definition lists. The rest of the extensions are available, along with some extra ones that pegdown does not have.

As an added bonus, and what motivated me to write my own parser: parsing is 30-50x faster than pegdown on average documents and several thousand times faster on pegdown's pathological input like [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[.

The AST is very regular and is checked in all the tests, not just the ones geared for AST testing. Source offsets are free of idiosyncrasies and easily adjusted. Unlike pegdown's, the AST is fully modifiable, with next, prev and parent links. There are many ways to extend the parser. I geared it for extensibility and made sure handling of AST nodes is uniform whether a node is part of the core or part of an extension, so extensions can and do add their own nodes to the AST.
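
As a rough illustration of what "fully modifiable with next, prev and parent links" means, the sketch below shows a tiny linked AST node in Java. It is not flexmark-java's actual node class: the name `AstNode`, its fields, and its methods are hypothetical and only demonstrate the idea that a node holding sibling and parent links can be inserted or removed in place without rebuilding the tree.

```java
// Hypothetical node type for illustration; not flexmark-java's implementation.
final class AstNode {
    final String type;
    AstNode parent, firstChild, lastChild, prev, next;

    AstNode(String type) {
        this.type = type;
    }

    // Detaches this node from its parent and siblings, repairing their links.
    void unlink() {
        if (prev != null) prev.next = next;
        else if (parent != null) parent.firstChild = next;
        if (next != null) next.prev = prev;
        else if (parent != null) parent.lastChild = prev;
        parent = prev = next = null;
    }

    // Adds the child as the last child of this node.
    void appendChild(AstNode child) {
        child.unlink();
        child.parent = this;
        if (lastChild == null) {
            firstChild = lastChild = child;
        } else {
            lastChild.next = child;
            child.prev = lastChild;
            lastChild = child;
        }
    }

    // Inserts the given node as the sibling immediately following this node.
    void insertAfter(AstNode sibling) {
        sibling.unlink();
        sibling.parent = parent;
        sibling.prev = this;
        sibling.next = next;
        if (next != null) next.prev = sibling;
        else if (parent != null) parent.lastChild = sibling;
        next = sibling;
    }

    public static void main(String[] args) {
        AstNode paragraph = new AstNode("Paragraph");
        AstNode text = new AstNode("Text");
        paragraph.appendChild(text);
        // An extension-defined node can be spliced in like any core node.
        text.insertAfter(new AstNode("WikiImage"));
        for (AstNode n = paragraph.firstChild; n != null; n = n.next) {
            System.out.println(n.type);
        }
    }
}
```

Because every node type shares the same link handling, a node contributed by an extension is manipulated exactly like a core node in this sketch.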

Maintaining it is a walk in the park compared to what I had to do with pegdown. The tests are in a modified markdown format: the CommonMark test spec format, extended to include the AST. AST generation is checked in every test, not just a chosen few as it was in pegdown. I literally lost weeks debugging AST source offset errors in pegdown for cases that were not part of its AST tests.

It targets Java 1.8 because I wanted to use lambda syntax, but it can easily be refactored in IntelliJ IDEA to a lower language level if there is sufficient need.

Take a look at it, because I can add parsing of the wiki image extension quite easily. Right now the parser has a special case that rejects a ! in front of [[]]. I am also actively maintaining it for my plugin and adding to its capabilities. If you try it, let me know if you run into difficulties so I can help you resolve them.