mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License

Incorrectly decoding encoded HTML tags #106

Open apjones6 opened 9 years ago

apjones6 commented 9 years ago

With a standard Markdown parser, encoding HTML tags lets you include them as readable text. For example (plain HTML included for comparison):

&lt;iframe src="http://www.w3schools.com"&gt;&lt;/iframe&gt;
<iframe src="http://www.w3schools.com"></iframe>

results in the output HTML:

<p>&lt;iframe src="http://www.w3schools.com"&gt;&lt;/iframe&gt;</p>
<iframe src="http://www.w3schools.com"></iframe>

However, if I put this output HTML through to-markdown, the encoded < and > characters are erroneously decoded. This results in the following Markdown:

<iframe src="http://www.w3schools.com"></iframe>
<iframe src="http://www.w3schools.com"></iframe>

(whitespace lines removed from examples for brevity)

oliverguenther commented 6 years ago

This is a real problem when using turndown on input that contains escaped HTML elements, because they are incorrectly output as live HTML tags. A quick hack to fix this is to ensure that < and > are always re-encoded as entities.
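
A minimal sketch of that hack, for illustration only. The helper name reencodeAngleBrackets is hypothetical and not part of turndown. The commented-out wiring assumes a turndown version that exposes the overridable escape method described in its README, and I have not verified it against every release:

```javascript
// Hypothetical helper (not part of turndown): turn literal angle brackets
// back into entities so that text which was escaped in the HTML input
// stays escaped in the Markdown output.
function reencodeAngleBrackets(text) {
  return text.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// One possible way to wire it in (assumption: the instance's `escape`
// method is called for the text of each non-code node):
//
//   const TurndownService = require('turndown');
//   const service = new TurndownService();
//   const defaultEscape = service.escape.bind(service);
//   // Re-encode first, then let the default Markdown escaping run.
//   service.escape = (text) => defaultEscape(reencodeAngleBrackets(text));

console.log(reencodeAngleBrackets('<iframe src="http://www.w3schools.com"></iframe>'));
// prints: &lt;iframe src="http://www.w3schools.com"&gt;&lt;/iframe&gt;
```

Note that this only covers < and >. Text containing a literal ampersand sequence such as &lt; would likely need & re-encoded as well.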

bjones1 commented 1 year ago

Duplicate issue: #261.