Open avi-jain opened 7 years ago
Hm, interesting -- Do you think you can find the exact HTML that results from pasting lists in? Because then we could potentially intercept and recognize such HTML and convert it more properly.
Thanks!
Example format for a single bullet point -
<p class="MsoListParagraphCxSpFirst" style="text-indent:-18.0pt;mso-list:l0 level1 lfo1">
<span style="font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family:
Symbol">
<span style="mso-list:Ignore">·
<span style='font:7.0pt "Times New Roman"'>
</span>
</span>
</span>
Hello
</p>
I looked for an existing solution, and CKEditor seems to have a "Paste from Word" feature that cleans up the markup (see the linked ticket and plugin).
There's not much documentation about the `mso` properties though 😔 (here's an extensive list of the options: https://gist.github.com/webtobesocial/ac9d052595b406d5a5c1)
OK, well, I think there are two ways to do this.
First, we could try to add a matcher to `domador`, which is what `woofmark` uses to turn HTML into Markdown (we use `woofmark` for our rich editor). `domador` parses `<p>` tags, and we could in theory identify the ones with `class="MsoListParagraphCxSpFirst"` and treat them differently. We'd want to check how standard these `mso`-type pastes from MS Word are, though the issues you linked to probably have more info on this.
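To make the "identify them" step concrete, here is a minimal sketch of a predicate such a matcher could call. It is not part of `domador`'s API: the function name `parseWordListInfo` and its return shape are made up for illustration, and it assumes Word keeps emitting class names that start with `MsoListParagraph` and an `mso-list:l0 level1 lfo1` style declaration like the one in the example above.

```javascript
// Hypothetical helper (not a domador or woofmark function).
// Given a <p> element's class and style attribute values, returns
// list metadata if the paragraph looks like a Word list item, else null.
function parseWordListInfo(className, style) {
  // Assumption: Word list paragraphs carry a class starting with "MsoListParagraph".
  var isListClass = /^MsoListParagraph/.test(className || '');
  // Matches declarations such as "mso-list:l0 level1 lfo1".
  var match = /mso-list:\s*l(\d+)\s+level(\d+)\s+lfo(\d+)/i.exec(style || '');

  if (!isListClass && !match) {
    return null; // an ordinary paragraph
  }
  return {
    listId: match ? Number(match[1]) : null, // which list the item belongs to
    level: match ? Number(match[2]) : 1      // nesting depth, defaulting to 1
  };
}

console.log(parseWordListInfo(
  'MsoListParagraphCxSpFirst',
  'text-indent:-18.0pt;mso-list:l0 level1 lfo1'
));
```

The `level` value is what a real matcher would need in order to indent nested Markdown list items; how that would plug into `domador` is left open here.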
The slightly easier fix, which would not be as long-term, would be to intercept the content with a regular expression on the paste event and substitute everything between `<p class="MsoList` and `</p>`, or something similar. This could live in our Woofmark adapter in `/src/adapters/`
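A rough sketch of what that regex substitution might look like, operating on the pasted HTML string. The function name `convertWordLists` is hypothetical and nothing like it exists in the adapter today. It only handles bulleted items with `·` or `•` markers, flattens nesting, and drops inline formatting inside each item, so treat it as a starting point and not a complete fix.

```javascript
// Hypothetical helper: turns Word-style list paragraphs into <ul><li> markup.
function convertWordLists(html) {
  // Any <p> whose class starts with "MsoListParagraph" (quoted or unquoted).
  var paragraph = /<p\b[^>]*\bclass="?MsoListParagraph[^>]*>([\s\S]*?)<\/p>/gi;
  // Leading bullet glyph (middle dot or bullet) plus surrounding spacing.
  var leadingMarker = /^(?:\s|&nbsp;|\u00a0)*[\u00b7\u2022](?:\s|&nbsp;|\u00a0)*/;

  var withItems = html.replace(paragraph, function (whole, inner) {
    var text = inner
      .replace(/<[^>]*>/g, '')    // drop the nested marker/formatting spans
      .replace(leadingMarker, '') // drop the bullet glyph itself
      .trim();
    // Temporary attribute so pre-existing <li> elements are left alone.
    return '<li data-mso>' + text + '</li>';
  });

  // Wrap each run of adjacent converted items in a single <ul>.
  return withItems.replace(/(?:<li data-mso>[\s\S]*?<\/li>\s*)+/g, function (run) {
    return '<ul>' + run.trim().replace(/<li data-mso>/g, '<li>') + '</ul>';
  });
}

var sample =
  '<p class="MsoListParagraphCxSpFirst" style="mso-list:l0 level1 lfo1">' +
  '<span style="font-family:Symbol"><span style="mso-list:Ignore">\u00b7' +
  '<span style="font:7.0pt">&nbsp;&nbsp;</span></span></span>Hello</p>';
console.log(convertWordLists(sample));
```

In the browser, the input would presumably come from `event.clipboardData.getData('text/html')` inside a `paste` handler; where exactly that handler would sit in the adapter is an open question.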
Or perhaps there's some middle ground: it could be worth opening an issue on `domador` to ask its creator whether parsing Word-generated lists is a reasonable addition to `domador`, or whether that had better live in `woofmark` or even `PublicLab.Editor`, because in each case it might be a slightly different implementation. I could definitely understand the opinion that `domador` is designed to be a minimal parser for well-formed HTML, and that MS Word's output isn't that (or it'd be generating `<ul>`s).
I realize this is not an immediate concern, but I thought I'd put it here anyway in case Word compatibility is touched upon in the future. Hope that's alright.
Steps to reproduce: copy a bulleted list from MS Word into the editor. The cursor spans outside the editor area, and some other weird things happen: try backspacing until the list is gone, then enter some characters. They get converted to Greek symbols because the `font-family` is still set to `Symbol`. (I checked whether Gmail's editor does this too, and it seems to.)
Content copied from Word doesn't get the conventional `<ul>` or `<li>` tags, but rather some Microsoft-y classes and attributes. (In this case, it also gets a `text-indent: -18pt`, which causes this issue.) Copying plain text or tables from Word appears to work fine. Thanks.