tim-gromeyer / html2md

Transform your HTML into clean, easy-to-read markdown with html2md.
https://tim-gromeyer.github.io/html2md/
MIT License

html2md inserts unnecessary new line breaks #65

Closed nuttyartist closed 1 year ago

nuttyartist commented 1 year ago

Hey Tim!

Thanks for this library. I'm planning to use it in my block editor (https://github.com/nuttyartist/notes/tree/block-editor): when a user pastes HTML content into the editor, I want to convert it to Markdown.

But I'm encountering a problem, the same one I encountered with QTextDocument::toMarkdown (after calling setHtml). For some reason, both insert unnecessary line breaks (\n). For example, I took the following random text from the internet (https://news.ycombinator.com/item?id=38108048). m_clipboard->mimeData(QClipboard::Clipboard)->html() returns:

<meta charset='utf-8'>
<span style=\"color: rgb(0, 0, 0); font-family: Verdana, Geneva, sans-serif; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;\">My partner is an Astrophysicist who relies on Gnu Emacs as her daily driver. Her work involves managing a treasure trove of legacy code written in a variety of languages like Fortran, Matlab, IDL, and IRAF. This code is essential for her data reduction pipelines, supporting instruments across observatories such as Keck 1 &amp; 2, the AAT, Gemini, and more.</span>
<p style=\"margin-top: 8px; margin-bottom: 0px; color: rgb(0, 0, 0); font-family: Verdana, Geneva, sans-serif; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;\">Each time she acquires a new Mac, she embarks on a week-long odyssey to set up her computing environment from scratch. It's not because she enjoys it; rather, it's a necessity because the built-in migration assistant just doesn't cut it for her specialised needs.</p>
<p style=\"margin-top: 8px; margin-bottom: 0px; color: rgb(0, 0, 0); font-family: Verdana, Geneva, sans-serif; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;\">While she currently wields the power of an M1 Max MacBook Pro and runs on the Monterey operating system, she tends to stick with the pre-installed OS for the lifespan of her hardware, which often spans several years. In her case, this could be another 2-3 years or even more before she retires the machine or hands it over to a postdoc or student.</p>
<p style=\"margin-top: 8px; margin-bottom: 0px; color: rgb(0, 0, 0); font-family: Verdana, Geneva, sans-serif; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;\">But why does she avoid the annual OS upgrades? It's simple. About a decade ago, every OS update would wreak havoc on her meticulously set-up environment. Paths would break, software would malfunction, and libraries that used to reside in one place mysteriously migrated to another. The headache and disruptions were just not worth it.</p>
<p style=\"margin-top: 8px; margin-bottom: 0px; color: rgb(0, 0, 0); font-family: Verdana, Geneva, sans-serif; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;\">She decided to call it quits on annual OS upgrades roughly 7-8 years ago. While I've suggested Docker as a potential solution, it still requires her to take on the role of administrator and caretaker, which, in her busy world of astrophysical research, can be quite the distraction.</p>"

Using html2md:

My partner is an Astrophysicist who relies on Gnu Emacs as her daily driver. Her\nwork involves managing a treasure trove of legacy code written in a variety of languages\nlike Fortran, Matlab, IDL, and IRAF. This code is essential for her data reduction\npipelines, supporting instruments across observatories such as Keck 1 &amp; 2, the\nAAT, Gemini, and more.\nEach time she acquires a new Mac, she embarks on a week-long odyssey to set up her\ncomputing environment from scratch. It's not because she enjoys it; rather, it's\na necessity because the built-in migration assistant just doesn't cut it for her\nspecialised needs.\n\nWhile she currently wields the power of an M1 Max MacBook Pro and runs on the Monterey\noperating system, she tends to stick with the pre-installed OS for the lifespan of\nher hardware, which often spans several years. In her case, this could be another\n2-3 years or even more before she retires the machine or hands it over to a postdoc\nor student.\n\nBut why does she avoid the annual OS upgrades? It's simple. About a decade ago, every\nOS update would wreak havoc on her meticulously set-up environment. Paths would break,\nsoftware would malfunction, and libraries that used to reside in one place mysteriously\nmigrated to another. The headache and disruptions were just not worth it.\n\nShe decided to call it quits on annual OS upgrades roughly 7-8 years ago. While I've\nsuggested Docker as a potential solution, it still requires her to take on the role\nof administrator and caretaker, which, in her busy world of astrophysical research,\ncan be quite the distraction.\n

While it should return:

My partner is an Astrophysicist who relies on Gnu Emacs as her daily driver. Her work involves managing a treasure trove of legacy code written in a variety of languages like Fortran, Matlab, IDL, and IRAF. This code is essential for her data reduction pipelines, supporting instruments across observatories such as Keck 1 & 2, the AAT, Gemini, and more.\nEach time she acquires a new Mac, she embarks on a week-long odyssey to set up her computing environment from scratch. It's not because she enjoys it; rather, it's a necessity because the built-in migration assistant just doesn't cut it for her specialised needs.\n\nWhile she currently wields the power of an M1 Max MacBook Pro and runs on the Monterey operating system, she tends to stick with the pre-installed OS for the lifespan of her hardware, which often spans several years. In her case, this could be another 2-3 years or even more before she retires the machine or hands it over to a postdoc or student.\n\nBut why does she avoid the annual OS upgrades? It's simple. About a decade ago, every OS update would wreak havoc on her meticulously set-up environment. Paths would break, software would malfunction, and libraries that used to reside in one place mysteriously migrated to another. The headache and disruptions were just not worth it.\n\nShe decided to call it quits on annual OS upgrades roughly 7-8 years ago. While I've suggested Docker as a potential solution, it still requires her to take on the role of administrator and caretaker, which, in her busy world of astrophysical research, can be quite the distraction.

What can be done about this? (QTextMarkdown shares the same problem).

tim-gromeyer commented 1 year ago

Hello, thank you for planning to use this library. It really makes me happy when someone uses the things I programmed in my free time!

And without really looking into it (because I'm not at home right now), I think it is because it inserts a line break every 80-100 characters. This can be disabled by setting html2md::Options.splitLines to false (it's true by default).

tim-gromeyer commented 1 year ago

https://tim-gromeyer.github.io/html2md/structhtml2md_1_1Options.html#a9c7ff3534b019736494d465b94411035
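
To illustrate the behaviour described above, here is a minimal, self-contained sketch of soft line wrapping controlled by a flag. The function `wrapText` and its `softBreak` parameter are hypothetical names chosen for this example and are not html2md's implementation; the only details taken from this thread are that splitting is on by default, that it happens roughly every 80-100 characters, and that it is turned off via `splitLines` on `html2md::Options`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a converter's line-splitting step (not html2md code).
// When splitLines is true, the first space found once a line has reached
// softBreak characters is replaced by a newline.
std::string wrapText(const std::string& text, bool splitLines,
                     std::size_t softBreak = 80) {
    if (!splitLines) {
        return text;  // wrapping disabled: the text passes through unchanged
    }

    std::string out;
    out.reserve(text.size());
    std::size_t lineLength = 0;  // characters emitted since the last newline

    for (char c : text) {
        if (c == '\n') {
            out.push_back(c);  // keep existing breaks (e.g. between paragraphs)
            lineLength = 0;
        } else if (c == ' ' && lineLength >= softBreak) {
            out.push_back('\n');  // soft break: swap the space for a newline
            lineLength = 0;
        } else {
            out.push_back(c);
            ++lineLength;
        }
    }
    return out;
}
```

With `splitLines` set to false, the sketch returns long paragraphs untouched, which is the shape of output asked for in the original report. For the real library, the exact way to pass the options to the converter is described in the documentation linked above.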

nuttyartist commented 1 year ago

Awesome! It works now. And yes, your library will be very useful.

I do need to convert my QString to std::string and back again to QString. Is there a way to avoid doing that?

tim-gromeyer commented 1 year ago

> Awesome! It works now. And yes, your library will be very useful.

Thanks, I'm glad to hear that it works :+1:

> I do need to convert my QString to std::string and back again to QString. Is there a way to avoid doing that?

No, unfortunately not. At least not now. I'll see if we can avoid it by using templates. That's something that annoyed me too.
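
As a rough idea of what a template-based approach could look like, the sketch below wraps any `std::string`-based converter behind a small adapter so that calling code can stay in its own string type. Everything here is hypothetical and not part of html2md: `StringAdapter`, `convertWith`, and the toy `MyString` type are made up for illustration. For Qt, the adapter would presumably call `QString::toStdString()` and `QString::fromStdString()`.

```cpp
#include <string>
#include <utility>

// Toy string type standing in for something like QString (hypothetical).
struct MyString {
    std::string bytes;
};

// Hypothetical adapter: describes how a string type S maps to and from
// std::string. Specialize it once per string type.
template <typename S>
struct StringAdapter;

template <>
struct StringAdapter<std::string> {
    static std::string toStd(const std::string& s) { return s; }
    static std::string fromStd(const std::string& s) { return s; }
};

template <>
struct StringAdapter<MyString> {
    static std::string toStd(const MyString& s) { return s.bytes; }
    static MyString fromStd(const std::string& s) { return MyString{s}; }
};

// Runs a std::string -> std::string converter on a value of type S and
// returns the result as S again. The conversions still happen; they are
// just written once here instead of at every call site.
template <typename S, typename Fn>
S convertWith(const S& input, Fn&& convert) {
    std::string plain = StringAdapter<S>::toStd(input);
    return StringAdapter<S>::fromStd(std::forward<Fn>(convert)(plain));
}
```

A call might look like `convertWith(myText, [](std::string& html) { return someConverter(html); })`, where `someConverter` is a placeholder for whichever conversion function is used. This does not remove the cost of the round trip; it only keeps it out of the calling code.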

nuttyartist commented 1 year ago

Alrighty, thanks a lot! I'm closing this.