michelf / php-markdown

Parser for Markdown and Markdown Extra derived from the original Markdown.pl by John Gruber.
http://michelf.ca/projects/php-markdown/

Should $clean_tags_re contain img? #402

Open edent opened 2 weeks ago

edent commented 2 weeks ago

If I have this HTML:

<img src="example.png" alt="Alt text

With

Multiple newlines." >

It is transformed into:

<p>&lt;img src="example.png" alt="Alt text</p>

<p>With</p>

<p>Multiple newlines." ></p>

Changing this line:

https://github.com/michelf/php-markdown/blob/51613168d71787b0fe8472166ccbfa8d285c02cd/Michelf/MarkdownExtra.php#L342

to

protected string $clean_tags_re = 'script|style|math|svg|img';

fixes the issue.

I can't think of anything within an <img> element that Markdown should alter. Alt text can't contain HTML elements, src shouldn't be altered, and since img is a void (self-closing) element, it has no contents.
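To illustrate why listing img helps, here is a minimal sketch of a simplified block stage. It assumes, hypothetically, that tags named in the list are swapped for placeholder keys before the text is split on blank lines. The functions protectTags and splitParagraphs and the \x1A placeholder format are made up for this example and are not how MarkdownExtra.php is structured. The pattern also matches only an opening tag. That is enough for a void element like img, but not for script or style, whose contents would need protecting too.

```php
<?php
// Hypothetical, simplified model of a Markdown block stage (NOT the library's code).

// Split text into paragraphs on runs of blank lines, as happens
// before span-level parsing.
function splitParagraphs(string $text): array
{
    return preg_split('/\n{2,}/', trim($text));
}

// Replace each opening tag whose name is in $cleanTagsRe with a placeholder
// key, so blank lines inside its attribute values can no longer split it.
function protectTags(string $text, string $cleanTagsRe, array &$hashes): string
{
    $pattern = '{<(?:' . $cleanTagsRe . ')\b[^>]*>}i';
    return preg_replace_callback($pattern, function (array $m) use (&$hashes): string {
        $key = "\x1A" . count($hashes) . "\x1A";
        $hashes[$key] = $m[0];
        return $key;
    }, $text);
}

$html = "<img src=\"example.png\" alt=\"Alt text\n\nWith\n\nMultiple newlines.\" >";

// Without img in the list, the tag is cut into three paragraphs.
$hashes = [];
$without = splitParagraphs(protectTags($html, 'script|style|math|svg', $hashes));

// With img in the list, the whole tag survives as a single block.
$hashes = [];
$with = splitParagraphs(protectTags($html, 'script|style|math|svg|img', $hashes));

// Swap the placeholder back for the original tag text.
$restored = strtr($with[0], $hashes);

echo count($without), ' vs ', count($with), PHP_EOL; // 3 vs 1
```

The placeholder round trip (strtr back to the stored text) is what lets the tag reach the output byte for byte.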

Are there any downsides to adding img to this regex?

michelf commented 2 weeks ago

Note that you can add no-break spaces to those empty lines if you want to fix things without fussing with the code.
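A minimal sketch of why the workaround works, assuming the blank-line test behaves like Markdown.pl's, which empties lines holding only spaces and tabs before splitting on runs of blank lines. A U+00A0 no-break space is neither a space nor a tab, so a line containing one no longer counts as blank. The splitBlocks function is hypothetical, not the library's code.

```php
<?php
// Hypothetical splitter modelled on Markdown.pl: lines holding only
// spaces/tabs are emptied first, then runs of blank lines split blocks.
function splitBlocks(string $text): array
{
    $text = preg_replace('/^[ \t]+$/m', '', $text);
    return preg_split('/\n{2,}/', trim($text));
}

$nbsp = "\u{00A0}"; // UTF-8 no-break space (PHP 7+ escape syntax)

$plain = "<img src=\"example.png\" alt=\"Alt text\n\nWith\n\nMultiple newlines.\" >";

// Same tag, but each "empty" line now holds a single no-break space.
$patched = "<img src=\"example.png\" alt=\"Alt text\n{$nbsp}\nWith\n{$nbsp}\nMultiple newlines.\" >";

echo count(splitBlocks($plain)), PHP_EOL;   // 3
echo count(splitBlocks($patched)), PHP_EOL; // 1
```

The trade-off is that the no-break spaces end up inside the alt text itself.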

We have the same problem with other tags, too:

<span title="A

multiline

title">text with title</span>

I think the basic issue is that the HTML block parser ignores span-level tags. Those are parsed at a later stage in parseSpan, but that stage runs after the text has been split into paragraphs.
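A small sketch of the difference between a block-tag-only pattern and one that accepts any tag name. Both patterns are hypothetical simplifications written for this example, not the regexes in MarkdownExtra.php. In each, quoted attribute values are matched as a unit, so they may contain blank lines.

```php
<?php
// Hypothetical block-stage pattern: only a fixed list of block-level tag names.
$blockTagRe = '{<(?:p|div|h[1-6]|blockquote|pre|table|ul|ol)\b(?:[^>"\']|"[^"]*"|\'[^\']*\')*>}i';

// Hypothetical alternative: accept any tag name.
$anyTagRe = '{<[a-z][a-z0-9:-]*(?:[^>"\']|"[^"]*"|\'[^\']*\')*>}i';

$text = "<span title=\"A\n\nmultiline\n\ntitle\">text with title</span>";

// The block-only pattern ignores <span>, so the tag is left for the span
// stage, which only sees the text after it has been split into paragraphs.
var_dump(preg_match($blockTagRe, $text));       // int(0)
var_dump(count(preg_split('/\n{2,}/', $text))); // int(3)

// The any-name pattern captures the whole opening tag, blank lines included.
preg_match($anyTagRe, $text, $m);
echo $m[0], PHP_EOL;
```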

I suppose changing the regex here to accept all tag names would work:

https://github.com/michelf/php-markdown/blob/51613168d71787b0fe8472166ccbfa8d285c02cd/Michelf/MarkdownExtra.php#L428-L433

Things to watch for:

Honestly, I'm not sure it's worth solving.