mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License

Less-than sign (i.e., open angle bracket, '<') not escaped correctly #395

Open telotortium opened 3 years ago

telotortium commented 3 years ago

I recently discovered https://github.com/notlmn/copy-as-markdown, which copies the selected HTML as Markdown. It fails to properly escape angle brackets. For example, in the current README.md for this repo, this text appears: <h1>1. Hello world</h1>. I would expect it to be converted to \<h1\>1. Hello world\</h1\> when I copy this plain text as Markdown. However, the output contains the angle brackets without any escaping backslashes.

In the code, I notice that in the escapes array, there is no entry at all for the open angle bracket <, and it appears that the close angle bracket '>' may only be escaped at the beginning of the string, although I haven't read the code enough to know for sure.
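To make the observation above concrete, here is a minimal sketch of how a pattern/replacement escape table behaves when it has no rule for '<' and only a start-anchored rule for '>'. The table `illustrativeEscapes` and the helper `escapeWithTable` are hypothetical names and are not copied from Turndown's source; they only mirror the two properties described in this comment.

```javascript
// Illustrative subset of a pattern/replacement escape table.
// ASSUMPTION: this is NOT Turndown's actual list, only a stand-in with
// the same two properties: no rule for '<', and '>' anchored to the start.
const illustrativeEscapes = [
  [/\\/g, '\\\\'], // backslash
  [/\*/g, '\\*'],  // asterisk
  [/^>/g, '\\>'],  // '>' only when it is the first character
];

// Hypothetical helper: apply every [pattern, replacement] pair in order.
function escapeWithTable(text, table) {
  return table.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    text
  );
}

const sample = '<h1>1. Hello world</h1>';
const escapedSample = escapeWithTable(sample, illustrativeEscapes);
// escapedSample is identical to sample: no rule touches '<', and the
// '>' characters are not at the start of the string.

const quoted = escapeWithTable('> quote', illustrativeEscapes);
// quoted is the string \> quote (the leading '>' is escaped).
```

With a table shaped like this, text that looks like an HTML tag passes through unchanged, which matches the behaviour reported here.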

pavelhoral commented 3 years ago

Not sure I understand the issue. HTML blocks are not escaped. Also, the <h1>1. Hello world</h1> in the README is not escaped in the source either (mainly because it is part of a code block there, which makes your example somewhat invalid).

martincizek commented 3 years ago

Please always provide a sample input and output that reproduces the issue.

If I understand it correctly, the issue is that e.g. turndownService.turndown('...&lt;h1&gt;...') becomes ...<h1>..., is that correct?

By the way, this issue will be addressed by the planned advanced escaping subsystem. It will be adopted from gfm-escape, which also handles this case. But I guess we should address it in the current escape subsystem as well.

Please confirm the example above is the case.

holm commented 2 years ago

Just wanted to mention that we have now run into this issue as well. We would expect turndownService.turndown('...&lt;h1&gt;...') to become ...\<h1\>....
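Until this is handled in the library, one possible workaround is to post-process already-escaped text so that every bare angle bracket gets a backslash. The sketch below is an assumption-laden illustration, not the maintainers' fix: `escapeAngleBrackets` and `withAngleBracketEscaping` are hypothetical helpers, and `baseEscape` stands for whatever escape function is already in use.

```javascript
// Hypothetical helper: add a backslash before every '<' or '>' that is not
// already part of a backslash escape. The first alternative consumes
// existing escape pairs (such as \> or \\) so they are left untouched.
function escapeAngleBrackets(alreadyEscaped) {
  return alreadyEscaped.replace(/\\[\s\S]|[<>]/g, (match) =>
    match.length === 2 ? match : '\\' + match
  );
}

// Hypothetical wrapper: run an existing escape function first, then
// escape the remaining angle brackets.
function withAngleBracketEscaping(baseEscape) {
  return (text) => escapeAngleBrackets(baseEscape(text));
}

// Stand-in for an existing escape step that only handles a leading '>'.
const baseEscape = (text) => text.replace(/^>/g, '\\>');
const escapeText = withAngleBracketEscaping(baseEscape);

const result = escapeText('...<h1>...');
// result is the string ...\<h1\>...

const noDoubleEscape = escapeText('> a <b>');
// noDoubleEscape is the string \> a \<b\> (the leading \> is not escaped twice).
```

Turndown's README describes TurndownService.prototype.escape as the customization point for escaping, so a wrapper like this could presumably be assigned there, but that wiring is untested here. Escaping every bracket is more aggressive than strictly needed, since '\<' and '\>' render as plain '<' and '>' in CommonMark either way.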