quantizor / markdown-to-jsx

🏭 The most lightweight, customizable React markdown component.
https://markdown-to-jsx.quantizor.dev/
MIT License

Issue parsing nested HTML elements #255

Open iprignano opened 5 years ago

iprignano commented 5 years ago

First of all, thanks for your work on the library. We've recently adopted it and encountered an issue when trying to parse and render some markdown mixed with HTML (the HTML is there to provide backwards compatibility with another system, so it can't be removed).

Specifically, when trying to parse three nested <div>s:

<div>
  <div>
    <div></div>
  </div>
</div>

markdown-to-jsx will output this:

<div>
  <div></div>
  <div></div>
</div>
<p>&lt;/div&gt;</p>

It can be reproduced on the demo site, and it seems to happen when triple-nesting any HTML element. The issue looks similar to https://github.com/probablyup/markdown-to-jsx/issues/168 but I'm not sure if it's exactly the same problem.

iprignano commented 5 years ago

@jthistle Thanks for your reply!

I just wanted to add that for us this was a dealbreaker, so we unfortunately had to move to a different library. Regardless, thank you for looking into it.

quantizor commented 5 years ago

True, but none of those are small enough to fit the size requirement of this library. It will never be perfect, but it’s good enough for most use cases.

On Sat, Oct 12, 2019 at 9:36 AM James Thistlewood notifications@github.com wrote:

Ok, so I've taken a long look at this. The problem basically stems from the fact that regex is being used to parse HTML. HTML is not a regular language, and you cannot use regex to parse it (see https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). You can make it seem like it works, but it will never work properly, hence bugs like this.

I'd recommend using a real XML/HTML parser instead.
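To make the point above concrete, here is a minimal sketch of why a single regex struggles with nesting. The pattern `NAIVE_BLOCK` and the helpers `naiveMatch` and `balancedMatch` are hypothetical names made up for illustration; this is not the pattern markdown-to-jsx actually uses, only a simplified stand-in for the same class of problem. A lazy match with a backreference pairs the opening tag with the first closing tag it sees, whereas a scanner that counts depth finds the real end of the element.

```typescript
// Hypothetical illustration; NOT markdown-to-jsx's actual regex.
// The lazy quantifier stops at the FIRST matching close tag,
// so it has no notion of nesting depth.
const NAIVE_BLOCK = /<(\w+)>([\s\S]*?)<\/\1>/;

function naiveMatch(html: string): string | null {
  const m = NAIVE_BLOCK.exec(html);
  return m ? m[0] : null;
}

// Depth-counting scanner: increments on each open tag, decrements on each
// close tag, and returns the span once the depth is back to zero.
// Assumes `tag` is a plain tag name and that tags carry no attributes.
function balancedMatch(html: string, tag: string): string | null {
  const token = new RegExp(`<(/?)${tag}>`, "g");
  let depth = 0;
  let start = -1;
  let m: RegExpExecArray | null;
  while ((m = token.exec(html)) !== null) {
    if (m[1] === "") {
      // Opening tag: remember where the outermost element begins.
      if (depth === 0) start = m.index;
      depth++;
    } else {
      // Closing tag: ignore strays that appear before any open tag.
      if (depth === 0) continue;
      depth--;
      if (depth === 0) return html.slice(start, m.index + m[0].length);
    }
  }
  return null; // unbalanced input
}

const nested = "<div><div><div></div></div></div>";

console.log(naiveMatch(nested));           // "<div><div><div></div>"
console.log(balancedMatch(nested, "div")); // "<div><div><div></div></div></div>"
```

The naive match ends after the innermost `</div>`, leaving two closing tags dangling, which is the same flavour of failure as the stray `</div>` in the reported output. The counter is still far from a real HTML parser (no attributes, comments, or void elements), but it shows that the missing ingredient is state that a plain regular expression cannot carry.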


callmemb commented 4 years ago

Why not skip HTML tags and let React handle those?