toptensoftware / markdowndeep

Open-source implementation of Markdown for C# and Javascript
Incorrect processing of HTML attributes containing '/' character #83

Open agr opened 7 years ago

agr commented 7 years ago

When the input markdown contains HTML tags whose attributes contain a '/' character (URLs being the most obvious cause), the library fails to parse them properly.

Example input:

<iframe width='400' height='300' src='https://github.com'></iframe>

The output:

<p>&lt;iframe width='400' height='300' src='https://github.com'&gt;</iframe></p>

Expected output: HTML should pass through more or less untouched:

<iframe width='400' height='300' src='https://github.com'></iframe>

The issue here is that HtmlTag.ParseHelper does not correctly handle the '/' character in attribute values. My guess is that it treats the '/' as the end of the tag, then decides that the HTML is malformed and handles it like any other text.

The fix that worked for me is to replace the line at https://github.com/toptensoftware/markdowndeep/blob/76577d2de1402a7a3d3e311846395607ee9b0d3a/Backup/MarkdownDeep/HtmlTag.cs#L328 with:

while (!p.eof && !char.IsWhiteSpace(p.current) && p.current != '>' && !p.DoesMatch("/>"))

But I am not sure it won't break something else.
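
To make the difference between the two loop conditions concrete, here is a minimal standalone sketch. It is not MarkdownDeep's code: the class AttributeScanSketch and the method ScanValue are hypothetical names, and the "old" behaviour (stopping at any '/') is an assumption based on the guess above, since the original line is not quoted in this report.

```csharp
using System;

// Hypothetical stand-in for the value-scanning loop; NOT MarkdownDeep's actual code.
public static class AttributeScanSketch
{
    // Returns the index at which scanning of an attribute value stops.
    // stopAtAnySlash = true  : assumed old behaviour, any '/' ends the value.
    // stopAtAnySlash = false : behaviour of the proposed fix, only "/>" ends the value.
    public static int ScanValue(string input, int start, bool stopAtAnySlash)
    {
        int pos = start;
        while (pos < input.Length
               && !char.IsWhiteSpace(input[pos])
               && input[pos] != '>')
        {
            if (stopAtAnySlash)
            {
                // Assumed original condition: a bare '/' terminates the value.
                if (input[pos] == '/') break;
            }
            else if (string.CompareOrdinal(input, pos, "/>", 0, 2) == 0)
            {
                // Proposed condition: only the self-closing marker terminates it.
                break;
            }
            pos++;
        }
        return pos;
    }
}
```

For the input <iframe src='https://github.com'></iframe>, scanning from the character after '=' stops after 'https: under the assumed old condition, and after 'https://github.com' under the proposed one. One case that may be worth checking before adopting the fix: an unquoted value that ends in '/' directly before '>' (for example href=http://x/>) would lose its trailing slash, and the tag would be read as self-closing.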