textmate / GitHub-Markdown.tmbundle

:octocat: GitHub Flavoured Markdown for TextMate
MIT License
40 stars 12 forks source link

The Italic Conundrum #24

Closed infininight closed 7 years ago

infininight commented 7 years ago

The previous fix for this issue uses the regex:

    (_)(?=\w)(?<=\w)

This puts the look-behind assertion after the _ looking back, this shouldn't ever match but does due to some quirk of the engine which I don't really understand. I've moved this before the _ character to make it function as intended.

Edit: Brain fart moment there, of corse it worked because `is a word character, silly me. Regardless, the intent would be to match\wbefore the` so it should be moved.

There are other issues though: The first is due to how the italics rule in the main grammar is done. It uses a look-ahead to ensure that the ending will exist before starting, it can easily be tripped up with the string:

this _is_invalid

Because the italics rule is entered due it detecting a valid italics context but the injected grammar yanking the closing _ leaving the context persisting indefinitely.

Started looking into this due to some discussion in textmate/markdown.tmbundle#46 with the MathJax string:

$$y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i$$

The proper way to handle this specific case I believe is a MathJax grammar, but it was a real world example of the issue.

To improve the issue I've again disabled the rule inside italics contexts, which allows a rule already entered to exit. The rule now only serves to prevent a italics context to be entered in a common invalid context for GitHub's markdown variant.

I think to truly solve this issue a rule would need to be written to wholly match 'word' like constructs that are not desired to be passed through the main italics rule.

An example of where the behavior in this PR fails is the first example again:

this _is_invalid

Where GitHub does not consider this to be italics as the closing _ is inside the rule but we can't exclude the case as we are already too late with a simplistic exception that hasn't matched a larger section of the text.

A second issue easily fixed is that it was matching the third _ in the string:

__double underscores__

Which prevents the bold context to exit, this has been corrected.

Also corrected is properly excluding embedded and raw cases as it should never match characters in those.

cc @noniq

MikeMcQuaid commented 7 years ago

Thanks for the PR @infininight. I'll let @noniq chip in here as he's dug into this far more than I have.

noniq commented 7 years ago

Looks perfect, thanks so much @infininight!

I wonder if it would make sense to add a small test document containing an example of each of the cases your PR is addressing? Just to make it easier to avoid regressions in the future!

infininight commented 7 years ago

Added a few I was using during testing.

noniq commented 7 years ago

Merged, thanks again!

MikeMcQuaid commented 7 years ago

Thanks both!

fonnesbeck commented 7 years ago

Underscores still appear to be getting interpreted as italics inside equations, but at least it does not break subsequent code:

Thanks!