remarkjs / remark-rehype

plugin that turns markdown into HTML to support rehype
https://remark.js.org
MIT License

Anchor tag breaks markdown #15

Closed craftzdog closed 3 years ago

craftzdog commented 3 years ago

I'm not sure where the actual problem lies, but I'm reporting it here.

Subject of the issue

If the input HTML contains an anchor tag, the conversion yields broken Markdown.

Your environment

Steps to reproduce

Given HTML like this:

<div>
  <a name="foo" />
  <span>
    <div>test1</div>
  </span>
</div>

Convert it into Markdown:

export default function HTML2Markdown(html: string): string {
  var unified = require('unified')
  var parse = require('rehype-parse')
  var rehype2remark = require('rehype-remark')
  var stringify = require('remark-stringify')

  // Pipeline: parse HTML, transform the HTML tree into a Markdown tree, serialize it
  const c = unified()
    .use(parse)
    .use(rehype2remark)
    .use(stringify, {
      listItemIndent: '1',
      commonmark: true,
      fences: true
    })
  // Run the pipeline synchronously and return the resulting Markdown
  return c.processSync(html).toString()
}

Expected behavior

You should get:

test1

Actual behavior

It outputs:

[test1](<>)

[](<>)

wooorm commented 3 years ago

rehype is for HTML, not for XML. Your link isn’t closed, you have divs in spans, it’s all messed up! If I open your “HTML” in a browser, I get something very similar to what you’re getting with remark/rehype:

[Screenshot: Screen Shot 2020-10-30 at 08 17 13]

Why do you have such weird HTML? Can you fix that?

craftzdog commented 3 years ago

Thanks for the reply. Yes, I thought the same. I use remark-rehype to support importing HTML files into my app as Markdown. I got this broken HTML from one of my app's users, who exported a note from Evernote. So it looks like an Evernote bug that generates such weird HTML.

wooorm commented 3 years ago

oh gosh, that’s some really weird stuff that Evernote is doing 😅

It’s indeed a bug there. You could report it to them, but I’m not sure they’d fix it. An alternative would be to preprocess the “Evernote HTML” with a plugin. Or maybe support their .enex XML files (I see some projects on GitHub that do that)? 🤷‍♂️

Anyway, not something for this project!
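
For anyone landing here with the same problem, the preprocessing idea mentioned above could look roughly like the sketch below. It is not a rehype plugin and not part of remark or rehype. It is a plain string pass, with a hypothetical helper name `expandSelfClosingTags`, that rewrites XML-style self-closing tags on non-void elements before the HTML reaches the parser. It assumes attribute values never contain a `>` character, and it does nothing about the other oddity in the sample (a div nested inside a span).

```typescript
// Void elements may be written as <br /> in HTML, so they are left untouched.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'link', 'meta', 'source', 'track', 'wbr'
])

// Hypothetical helper (an assumption for illustration, not a library API):
// turns <a name="foo" /> into <a name="foo"></a> so an HTML parser
// does not treat the element as still open.
// Assumption: attribute values contain no ">" character.
function expandSelfClosingTags(html: string): string {
  return html.replace(
    /<([a-zA-Z][a-zA-Z0-9-]*)((?:\s[^<>]*?)?)\s*\/>/g,
    (match: string, name: string, attrs: string): string =>
      VOID_ELEMENTS.has(name.toLowerCase())
        ? match
        : `<${name}${attrs}></${name}>`
  )
}
```

With this in place, `expandSelfClosingTags('<a name="foo" />')` returns `<a name="foo"></a>`, and the result can be passed to `processSync` instead of the raw export. A regex pass like this is fragile by nature; a tree-level plugin or importing the .enex files directly would likely be more robust.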