vsch / idea-multimarkdown

Markdown language support for IntelliJ IDEA.
https://plugins.jetbrains.com/plugin/7896-markdown-navigator
Apache License 2.0

Intellij crashes when pasting word content into markdown document #408

Closed holgerbrandl closed 7 years ago

holgerbrandl commented 7 years ago

Not sure why, but the plugin crashes IJ (it becomes unresponsive and requires a forced quit) when I paste the contents of the attached Word file into the Markdown editor.

MacOS 10.12.3
MS Word 15.30

IntelliJ IDEA 2016.3.5
Build #IU-163.13906.18, built on March 6, 2017
Licensed to Max-Planck-Institut fuer Molekulare Zellbiologie und Genetik / holger brandl
You have a perpetual fallback license for this version
Subscription is active until January 25, 2018
JRE: 1.8.0_112-release-408-b6 x86_64
JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o

MN version: 2.3.4.8

I can paste the same content into a plain text editor within IJ without any problems.

ij_md_crash.docx

vsch commented 7 years ago

@holgerbrandl, pasting into a Markdown document checks whether the clipboard has a "text/html" content representation. Something in the HTML for this file triggers a bug in the HTML to Markdown parser.

I don't have MS Word 15 for OS X, so I will add text/html trace functionality to the next release. It will dump the text to the log before the HTML is parsed, so I can see what is being passed in and debug the HTML to Markdown converter.

I'll post an update here.

holgerbrandl commented 7 years ago

Just let me know when I should retest it. In general, pasting from MS Office products under MacOS seems a bit off. For example, when pasting from Excel (image), my test table was copied and ended up totally wrong in MD, even though the IJ clipboard detected a correct plain text version of my current clipboard.

But for sure broken formatting is still way better than a hard IDE crash. :-)

vsch commented 7 years ago

@holgerbrandl, the hang is due to a bug that causes an infinite loop in the parser: it forgets to skip an element that it does not recognize. I am building a version that can paste the HTML as is instead of converting it to Markdown, so that your paste will give me the HTML I need to debug.
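
To illustrate the kind of bug described above, here is a minimal sketch in Python. It is not the plugin's or flexmark-java's actual code: the token model and the names `convert_tokens` and `HANDLERS` are hypothetical. The point is only that the loop index must advance even when an element is not recognized.

```python
# Hypothetical token model: each token is a (kind, text) pair.
HANDLERS = {
    "p": lambda text: text,
    "li": lambda text: "* " + text,
}


def convert_tokens(tokens):
    """Convert recognized tokens to Markdown lines and skip the rest."""
    out = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        handler = HANDLERS.get(kind)
        if handler is not None:
            out.append(handler(text))
        # Advance unconditionally. If this step only ran for recognized
        # kinds, the first unknown element would keep the loop spinning.
        i += 1
    return out
```

With this shape, an unknown element such as a hypothetical `("o:p", "x")` token is dropped instead of hanging the conversion.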

It will be useful when you don't want to convert to Markdown right away. The HTML to Markdown intention can be used to make the conversion later, giving the user a chance to fix up some HTML quirks.

If you could provide the "HTML" office generates for your use cases, using the new version, I will be able to fix (I hope) the HTML parser to recognize it for proper Markdown conversion.

My last version of MS Office for Mac was from 2009; I did not upgrade it because I don't use MS products anymore.

Your crash file, when opened in LibreOffice, resulted in the following HTML. Note that the list item text is outside the <li></li> tags, so the HTML converter used to produce just the bullet markers. I fixed the "bug", and the converter now treats p tags that are inside a list but not inside a list item as list items.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <title></title>
    <meta name="generator" content="LibreOffice 5.2.3.3 (MacOSX)"/>
    <style type="text/css">
        @page { margin: 2cm }
        p { margin-bottom: 0.25cm; direction: ltr; line-height: 120%; text-align: left; orphans: 2; widows: 2 }
    </style>
</head>
<body lang="de-DE" dir="ltr">
<ul>
    <li/>
<p style="margin-bottom: 0cm; line-height: 100%"><span lang="en-US">Equipment</span></p>
    <li/>
<p style="margin-bottom: 0cm; line-height: 100%"><span lang="en-US">Chemicals</span></p>
    <li/>
<p style="margin-bottom: 0cm; line-height: 100%"><span lang="en-US">Consumables</span></p>
    <li/>
<p style="margin-bottom: 0cm; line-height: 100%"><span lang="en-US">Enzymes</span></p>
    <li/>
<p style="margin-bottom: 0cm; line-height: 100%"><span lang="en-US">GMO</span></p>
    <li/>
<p style="margin-bottom: 0cm; line-height: 100%"><span lang="en-US">Antibodies</span></p>
    <li/>
<p style="margin-bottom: 0cm; line-height: 100%"><span lang="en-US">DNA
    Constructs</span></p>
    <li/>
<p style="margin-bottom: 0cm; line-height: 100%"><span lang="en-US">RNA
    Constructs</span></p>
    <li/>
<p style="margin-bottom: 0cm; line-height: 100%"><span lang="en-US">Vectors</span></p>
    <li/>
<p style="margin-bottom: 0cm; line-height: 100%"><span lang="en-US">Oligos</span></p>
</ul>
</body>
</html>
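
The handling of this HTML can be sketched roughly as follows. This is a Python illustration using the standard library `html.parser`, not the converter used by the plugin; the class `ListTextCollector` and the function `to_markdown_items` are hypothetical names. It assumes the rule stated above: a p element that sits inside a list but outside any list item is emitted as a list item.

```python
from html.parser import HTMLParser


class ListTextCollector(HTMLParser):
    """Collect list item text, including <p> text that sits outside <li>."""

    def __init__(self):
        super().__init__()
        self.list_depth = 0
        self.li_depth = 0
        self.p_depth = 0
        self.buf = []
        self.lines = []

    def _flush(self, prefix):
        # Collapse runs of whitespace (including line breaks) to one space.
        text = " ".join("".join(self.buf).split())
        self.buf = []
        if text:
            self.lines.append(prefix + text)

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.list_depth += 1
        elif tag == "li":
            self.li_depth += 1
        elif tag == "p":
            self.p_depth += 1

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            self.list_depth = max(0, self.list_depth - 1)
        elif tag == "li":
            self.li_depth = max(0, self.li_depth - 1)
            self._flush("* ")  # an empty <li/> has no text and emits nothing
        elif tag == "p":
            self.p_depth = max(0, self.p_depth - 1)
            if self.li_depth == 0:
                # <p> directly in a list is treated as a list item.
                self._flush("* " if self.list_depth > 0 else "")

    def handle_data(self, data):
        if self.li_depth > 0 or self.p_depth > 0:
            self.buf.append(data)


def to_markdown_items(html):
    parser = ListTextCollector()
    parser.feed(html)
    parser.close()
    return parser.lines
```

The sketch relies on `HTMLParser` reporting a self-closing `<li/>` as a start tag immediately followed by an end tag, which is why the empty items produce no output and the following p elements carry the text.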
holgerbrandl commented 7 years ago

I'm not sure if you should even try to overcome issues in text/html representations. No parser in the world will cover all imaginable versions of broken HTML. The example above looks like one of those broken cases to me. Falling back to plain-text pasting when the text/html representation is not well-formed (with respect to e.g. W3C markup validation) seems totally legit to me.
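
A rough sketch of such a fallback, in Python. The tag-balance check is only a crude stand-in for real markup validation, and `choose_paste_text` and `_BalanceChecker` are hypothetical names, not plugin code. It also rejects valid HTML that omits optional end tags, which shows how hard it is to draw the line.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class _BalanceChecker(HTMLParser):
    """Track whether every start tag is closed in the right order."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tag, nothing to balance

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def choose_paste_text(html, plain):
    """Return the HTML if its tags balance, otherwise the plain text."""
    checker = _BalanceChecker()
    checker.feed(html)
    checker.close()
    well_formed = checker.balanced and not checker.stack
    return html if well_formed else plain
```

Note that a list built from self-closing `<li/>` tags followed by p elements is balanced under this check, so it would still be passed on as HTML.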

vsch commented 7 years ago

@holgerbrandl, I agree. Next version will have this disabled. However, it is very convenient to have it working, especially from the browser. So now it is a configurable option that can be enabled when needed. Detecting when the HTML is too malformed to use is also difficult.

An EAP is available with two new settings: one for text/html detection on the clipboard, which is now disabled by default, and one that allows disabling the HTML to Markdown conversion. If you could please post the HTML content for the hang from Word and also for the table copy/paste from Excel, I will address these.

New Settings:

vsch commented 7 years ago

@holgerbrandl, I was able to reproduce this by installing my older version of MS Office. The EAP is updated; please let me know if your issues have been resolved.

flexmark-java library used for parsing updated with fixes for the hang and the incorrect table parsing from Excel:

0.18.2

vsch commented 7 years ago

@holgerbrandl, now this is the result from Excel:

image

The HTML needs to be converted to Markdown automatically, since it contains a lot of blank lines which break up the HTML blocks if it is inserted as HTML into Markdown. Select Convert HTML content to Markdown in settings/preferences to get this on paste:

image

The table header needs to be moved manually using the Move Line Up action in the IDE. Excel does not use <thead></thead>, so all rows are body rows.

vsch commented 7 years ago

@holgerbrandl, here is the result of paste with conversion to Markdown of your crash file:

image

image

Numbered lists:

image

image

holgerbrandl commented 7 years ago

The Word paste crash issue is solved for me as well. The inserted bits use a tab instead of a space, which breaks the list rendering: image However, for me it's perfect already and easy to correct with column edit.

The Excel table paste now brings up the "paste image" dialog instead: image I'm not sure if Excel has changed in the meantime to also provide an image representation, or if the plugin handles the clipboard differently now.

vsch commented 7 years ago

@holgerbrandl, I will address both the tab and the image paste pop-up. I had the image dialog pop up once but was not able to reproduce it, so I assumed it was a glitch. I will ignore the image on the clipboard if the HTML option is enabled and HTML is available on the clipboard.

The tabs I will convert to spaces during HTML to Markdown conversion.
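
A minimal sketch of that normalization in Python, assuming it only needs to touch tabs that directly follow a bullet or numbered list marker. The function name `normalize_list_marker_tabs` is hypothetical and this is not the converter's actual code.

```python
import re

# A list marker at the start of a line (optionally indented with spaces),
# followed by one or more tabs.
_LIST_MARKER_TAB = re.compile(r"^( *(?:[*+-]|\d+[.)]))\t+", re.MULTILINE)


def normalize_list_marker_tabs(markdown):
    """Replace the tab run after a list marker with a single space."""
    return _LIST_MARKER_TAB.sub(r"\1 ", markdown)
```

Tabs elsewhere in the text are left alone, so only the part that breaks list rendering is changed.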

vsch commented 7 years ago

@holgerbrandl, the list conversion you are seeing is the standard IDE text paste, not the plugin's HTML to Markdown conversion. If you enable the two options:

image

for HTML clipboard handling then you will get:

image

You can convert it to a tight list using the toolbar button: image

holgerbrandl commented 7 years ago

oh, now it works. Not sure why I disabled the option in the first place. Potentially to overcome the now-fixed crash bug I guess. The list compression button is very handy.

vsch commented 7 years ago

You also have an intention to clean up empty list items, sometimes HTML to Markdown can create these:

image

To get:

image

vsch commented 7 years ago

@holgerbrandl, an EAP is released with a fix that gives text/html clipboard content priority over images if Use clipboard text/html content when available is enabled. This takes care of an Excel copied table being pasted as an image instead of as a Markdown table.
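
The priority rule can be sketched like this in Python. It is an illustration only: `pick_clipboard_flavor` is a hypothetical name, the clipboard is modeled as a set of MIME type strings, and the order used when the option is off (image before plain text) is an assumption, not something confirmed here.

```python
def pick_clipboard_flavor(available, use_html_when_available):
    """Pick which clipboard representation to paste.

    available: set of MIME type strings offered by the clipboard.
    use_html_when_available: models the "Use clipboard text/html content
    when available" setting.
    """
    # With the option on, text/html wins even if an image is also offered.
    if use_html_when_available and "text/html" in available:
        return "text/html"
    # Assumed fallback order: any image flavor, then plain text.
    for mime in sorted(available):
        if mime.startswith("image/"):
            return mime
    if "text/plain" in available:
        return "text/plain"
    return None
```

For a clipboard offering `image/png`, `text/html` and `text/plain`, the option decides between the HTML and the image.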

I think that this version should work with Excel tables for your Excel version. Tables with a lot of cell formatting may need some editing.

If your Excel table paste is not correct, please turn off the Convert HTML content to Markdown temporarily so you can paste the actual HTML content and post it here so I can fix it. I only have Office 2011 for Mac so my HTML may differ from yours.

holgerbrandl commented 7 years ago

Thanks Vladimir for your amazing support. Both features work as described now.

It was quite a ride, but I think the issue is resolved now. :-)

vsch commented 7 years ago

Complexity of creating full featured JetBrains plugins is greatly underestimated. 😄