michelf / php-markdown

Parser for Markdown and Markdown Extra derived from the original Markdown.pl by John Gruber.
http://michelf.ca/projects/php-markdown/

One too many conversions of html special characters & and < when inside em text #353

Open Sameh-R-Labib opened 3 years ago

Sameh-R-Labib commented 3 years ago

Your code has a bug: if an ampersand or a less-than sign appears in text marked (in Markdown) as em via single asterisks, the resulting HTML encodes these HTML special characters into their entities twice. I've attached three screenshots: 1) the Markdown, 2) the resulting HTML, 3) the PHP code that calls your function.

[Attached screenshots: "markdown", "static", "The code"]

lorddoumer commented 2 years ago

I experience the same issue when enabling Markdown Extra in Grav CMS with inline code. Any idea how to solve this?

michelf commented 2 years ago

This is related to the no_entities mode. I suppose using the hashing system to make the generated &amp; invisible to subsequent passes would fix the issue, for instance by adding two hashPart calls in the encodeAmpsAndAngles function:

    protected function encodeAmpsAndAngles($text) {
        if ($this->no_entities) {
            $text = str_replace('&', $this->hashPart('&amp;', ':'), $text);
        } else {
            // Ampersand-encoding based entirely on Nat Irons's Amputator
            // MT plugin: <http://bumppo.net/projects/amputator/>
            $text = preg_replace('/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/',
                                '&amp;', $text);
        }
        // Encode remaining <'s
        $text = str_replace('<', $this->hashPart('&lt;', ':'), $text);

        return $text;
    }

This is not a terribly efficient way of doing it (calling hashPart every time encodeAmpsAndAngles is called), but it should work.
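
For anyone who wants to see the mechanism in isolation, here is a minimal standalone sketch of why a second encoding pass doubles the entities and why swapping in opaque tokens avoids it. It is a toy model, not the library's code: the ToyEncoder class and its encodeNaive, encodeHashed and unhash methods are hypothetical names made up for illustration, and only the token shape (a boundary character around "\x1A" plus a counter) is meant to loosely resemble what hashPart produces. It assumes PHP 7.4 or later.

```php
<?php
// Toy model of the double-encoding problem. NOT php-markdown's actual code;
// every name in this class is hypothetical and exists only for illustration.
final class ToyEncoder
{
    /** @var array<string, string> Maps each opaque token to the HTML it stands for. */
    private array $hashes = [];
    private int $counter = 0;

    // Naive pass: encodes every '&', including the one inside an '&amp;'
    // that an earlier pass already produced.
    public function encodeNaive(string $text): string
    {
        $text = str_replace('&', '&amp;', $text);
        return str_replace('<', '&lt;', $text);
    }

    // Hashing pass: replaces each special character with a token that
    // contains neither '&' nor '<', so a later pass has nothing to re-encode.
    public function encodeHashed(string $text): string
    {
        $text = str_replace('&', $this->hashPart('&amp;'), $text);
        return str_replace('<', $this->hashPart('&lt;'), $text);
    }

    // Stores the final HTML under a unique token and returns the token.
    private function hashPart(string $html): string
    {
        $key = ":\x1A" . ++$this->counter . ":";
        $this->hashes[$key] = $html;
        return $key;
    }

    // Final step: swaps every token back for the HTML it stands for.
    public function unhash(string $text): string
    {
        return strtr($text, $this->hashes);
    }
}

$input = 'a & b < c';

// Two naive passes: the second pass re-encodes the '&' of each entity.
$naive = new ToyEncoder();
$once  = $naive->encodeNaive($input);
$twice = $naive->encodeNaive($once);

// Two hashed passes followed by a single unhash at the end.
$hashed = new ToyEncoder();
$safe   = $hashed->unhash($hashed->encodeHashed($hashed->encodeHashed($input)));

echo $twice, "\n"; // a &amp;amp; b &amp;lt; c
echo $safe, "\n";  // a &amp; b &lt; c
```

Like the suggestion above, the toy creates a fresh token on every call even when nothing gets replaced, which mirrors the inefficiency already mentioned.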

It's a bit sad there's nothing in the test suite for the no_entities mode.