Closed md384 closed 8 years ago
@md384 this is as per Markdown "spec". GitHub does it as part of its GFM extensions.
Does one of the pegdown extensions support this behavior?
Not at this time. I don't relish the idea of going into that part of the parser code. Markdown parsing of inlines is messy, more so when it comes to bold/italic or strong/emphasis.
this is as per Markdown "spec". GitHub does it as part of its GFM extensions.
@vsch I don't think this is GFM specific.
There's nothing that says this is illegal in Gruber's "spec" at http://daringfireball.net/projects/markdown/syntax
Furthermore, CommonMark supports it just fine.
To me this looks like a bug in the parser rather than something that's out-of-spec (which I'd love to see fixed, because right now I've got to post-process to deal with this).
@markwoon, you are right. Silly me. I assumed that pegdown was keeping true to the original spec on this one without checking the original spec in detail. It is only if the emphasis mark is surrounded by spaces that it is treated as regular characters.
@vsch Is that enough to persuade you to fix this? :)
@markwoon, I'll take a look at it so see if a quick fix will do the job. If it produces a lot of side-effects I will abandon it.
I have decided to migrate my plugin to a common-mark based parser to eliminate pegdown weaknesses like exponential parsing time when parsing consecutive [[[[[[ and other maintenance hurdles. All this to say that if it is an easy fix then I will do it, otherwise it is not worth the effort.
Additionally, I have made quite a few fixes and changes to my fork of pegdown and some of these may create backward compatibility issues. The regression tests pass but that does not mean much if critical tests are missing. I added the documentation to the top of the CHANGELOG
I also did not write new tests for all the changes that I added since these are tested in my plugin and I doubt they will be used by anyone else. If anyone wants to grab the source from my fork develop branch https://github.com/vsch/pegdown/tree/develop and make it into a clean PR for pegdown then it will have a chance of being incorporated into the next pegdown release. Otherwise, the changes will stay in my fork until I have slack in my schedule.
@vsch If this issue gets fixed I'm perfectly happy switching to your fork for now.
Which common-mark parser are you using?
@markwoon, I am not using one yet but it is between Kotlin based one used by Markdown Support, I think it is https://github.com/valich/intellij-markdown or if not Kotlin then most likely https://github.com/atlassian/commonmark-java
Regular expression or peg style parsing of markdown is less than ideal. It really has to be parsed using first block identification then each block parsed for inlines.
@markwoon. @md384, if you are interested the changes are in https://github.com/vsch/pegdown/tree/develop, CHANGELOG and README updated for RELAXED_STRONG_EMPHASIS_RULES
Pretty close to GFM but cannot bold/italic a span of underscores in the middle of a word: abc**_**def
is treated as all text. GFM treats it as an underscore between abc and def: abc_def
On the other hand abc**xyz**def
is recognized as bold xyz between abc and def: abcxyzdef.
@vsch Thanks for this.
FYI, we're switching to commonmark-java. We've found it to be much easier to extend.
@markwoon, I too plan on switching for the same reason. It is difficult because a lot of the plugin is coded around pegdown idiosyncrasies.
The strong and emphasis tags are not applied with they are used inside round brackets and there is no space between the bracket and markdown tag.
For example
(_test_)
is converted to<p>(_test_)</p>
and similarly for**/strong
tag.