sirthias / pegdown

A pure-Java Markdown processor based on a parboiled PEG parser supporting a number of extensions
http://pegdown.org
Apache License 2.0
1.29k stars 218 forks source link

Strong and emphasis tags in round brackets #220

Closed md384 closed 8 years ago

md384 commented 8 years ago

The strong and emphasis tags are not applied with they are used inside round brackets and there is no space between the bracket and markdown tag.

For example (_test_) is converted to <p>(_test_)</p> and similarly for **/strong tag.

vsch commented 8 years ago

@md384 this is as per Markdown "spec". GitHub does it as part of its GFM extensions.

md384 commented 8 years ago

Does one of the pegdown extensions support this behavior?

vsch commented 8 years ago

Not at this time. I don't relish the idea of going into that part of the parser code. Markdown parsing of inlines is messy, more so when it comes to bold/italic or strong/emphasis.

markwoon commented 8 years ago

this is as per Markdown "spec". GitHub does it as part of its GFM extensions.

@vsch I don't think this is GFM specific.

There's nothing that says this is illegal in Gruber's "spec" at http://daringfireball.net/projects/markdown/syntax

Furthermore, CommonMark supports it just fine.

To me this looks like a bug in the parser rather than something that's out-of-spec (which I'd love to see fixed, because right now I've got to post-process to deal with this).

vsch commented 8 years ago

@markwoon, you are right. Silly me. I assumed that pegdown was keeping true to the original spec on this one without checking the original spec in detail. It is only if the emphasis mark is surrounded by spaces that it is treated as regular characters.

markwoon commented 8 years ago

@vsch Is that enough to persuade you to fix this? :)

vsch commented 8 years ago

@markwoon, I'll take a look at it so see if a quick fix will do the job. If it produces a lot of side-effects I will abandon it.

I have decided to migrate my plugin to a common-mark based parser to eliminate pegdown weaknesses like exponential parsing time when parsing consecutive [[[[[[ and other maintenance hurdles. All this to say that if it is an easy fix then I will do it, otherwise it is not worth the effort.

Additionally, I have made quite a few fixes and changes to my fork of pegdown and some of these may create backward compatibility issues. The regression tests pass but that does not mean much if critical tests are missing. I added the documentation to the top of the CHANGELOG

I also did not write new tests for all the changes that I added since these are tested in my plugin and I doubt they will be used by anyone else. If anyone wants to grab the source from my fork develop branch https://github.com/vsch/pegdown/tree/develop and make it into a clean PR for pegdown then it will have a chance of being incorporated into the next pegdown release. Otherwise, the changes will stay in my fork until I have slack in my schedule.

markwoon commented 8 years ago

@vsch If this issue gets fixed I'm perfectly happy switching to your fork for now.

Which common-mark parser are you using?

vsch commented 8 years ago

@markwoon, I am not using one yet but it is between Kotlin based one used by Markdown Support, I think it is https://github.com/valich/intellij-markdown or if not Kotlin then most likely https://github.com/atlassian/commonmark-java

Regular expression or peg style parsing of markdown is less than ideal. It really has to be parsed using first block identification then each block parsed for inlines.

vsch commented 8 years ago

@markwoon. @md384, if you are interested the changes are in https://github.com/vsch/pegdown/tree/develop, CHANGELOG and README updated for RELAXED_STRONG_EMPHASIS_RULES

Pretty close to GFM but cannot bold/italic a span of underscores in the middle of a word: abc**_**def is treated as all text. GFM treats it as an underscore between abc and def: abc_def

On the other hand abc**xyz**def is recognized as bold xyz between abc and def: abcxyzdef.

markwoon commented 8 years ago

@vsch Thanks for this.

FYI, we're switching to commonmark-java. We've found it to be much easier to extend.

vsch commented 8 years ago

@markwoon, I too plan on switching for the same reason. It is difficult because a lot of the plugin is coded around pegdown idiosyncrasies.