publiclab / PublicLab.Editor

A general purpose, JS/Bootstrap UI framework for rich text posting. An author-friendly, minimal, mobile/desktop interface for creating blog-like content, designed for PublicLab.org
https://publiclab.github.io/PublicLab.Editor/examples/
GNU General Public License v3.0

Compatibility with MS Word - Lists #84

Open avi-jain opened 7 years ago

avi-jain commented 7 years ago

I realize this is not an immediate concern, but I thought I'd put it here anyway in case Word compatibility is touched upon in the future. Hope that's alright.

Steps to reproduce: copy a bulleted list from MS Word and paste it into the editor.

(screenshot: bugwordlist)

The cursor extends outside the editor area, and a few other odd things happen. For example, backspace until the list is gone and then type some characters: they get converted to Greek symbols because the font face is still set to Symbol. I checked whether Gmail's editor does this too, and it seems to.

(screenshot: gmailword)

Content copied from Word doesn't get the conventional <ul> or <li> tags, but rather some Microsoft-y classes and attributes. (In this case, it also gets a text-indent: -18pt, which causes this issue.)
Copying plain text/tables from Word appears to be working fine. Thanks.

jywarren commented 7 years ago

Hm, interesting -- Do you think you can find the exact HTML that results from pasting lists in? Because then we could potentially intercept and recognize such HTML and convert it more properly.

Thanks!

avi-jain commented 7 years ago

Example markup for a single bullet point:

<p class="MsoListParagraphCxSpFirst" style="text-indent:-18.0pt;mso-list:l0 level1 lfo1">
    <span style="font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family:
    Symbol">
      <span style="mso-list:Ignore">·
        <span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
        </span>
      </span>
    </span>
Hello
</p>
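
To recognize such paragraphs programmatically, one option is to read the mso-list declaration from the style attribute. The following is only a rough sketch: parseMsoList is a hypothetical helper, and it assumes the value always has the "l<number> level<number> lfo<number>" shape seen in the sample above, which has not been verified across Word versions.

```javascript
// Hypothetical helper, not part of any existing library.
// Parses a declaration such as "mso-list:l0 level1 lfo1" out of a style string.
// Assumption: the three tokens always appear in this order.
function parseMsoList(style) {
  const match = /mso-list:\s*l(\d+)\s+level(\d+)\s+lfo(\d+)/i.exec(style || '');
  if (!match) {
    // Covers "mso-list:Ignore" as well as styles with no mso-list at all.
    return null;
  }
  return {
    list: Number(match[1]),  // the number after "l"
    level: Number(match[2]), // nesting level of the bullet
    lfo: Number(match[3])    // the number after "lfo"
  };
}

const info = parseMsoList('text-indent:-18.0pt;mso-list:l0 level1 lfo1');
console.log(info.level);
```

The level value is what a converter would need in order to rebuild nested lists; the inner "mso-list:Ignore" span deliberately yields null so it is not mistaken for a list paragraph.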

I tried to look for an existing solution, and CKEditor seems to have a "paste from Word" feature that cleans up the markup (link to ticket, plugin). There's not much documentation about the mso properties though 😔 (here's an extensive list of the options: https://gist.github.com/webtobesocial/ac9d052595b406d5a5c1)

jywarren commented 7 years ago

OK - well, I think there are two ways to do this:

First, we could try to add a matcher to domador, which is what woofmark uses to turn HTML into Markdown, and we use woofmark for our rich editor. domador parses <p> tags here, and we could in theory try to identify ones with class="MsoListParagraphCxSpFirst" -- and treat them differently. We'd want to check how standard these mso type pastes are from MS Word, though the issues you linked to probably have more info on this.

The slightly easier fix, which would not be as long-term, would be to intercept the content with a regex on the paste event and substitute everything between <p class="MsoList and </p>, or something similar. This could live in our Woofmark adapter in /src/adapters/
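
A minimal sketch of that regex approach, working on the pasted HTML as a string. All function names here are hypothetical and nothing below is existing woofmark or PublicLab.Editor code. It assumes the bullet glyph always sits in a span marked mso-list:Ignore, as in the sample above, and it ignores nesting levels, numbered lists, and inline formatting inside the item.

```javascript
// Matches one Word list paragraph: <p class="MsoList..." ...> ... </p>
const WORD_PARAGRAPH = '<p\\b[^>]*\\bclass="?MsoList[^>]*>[\\s\\S]*?<\\/p>';

// Removes the span marked mso-list:Ignore (bullet glyph plus padding),
// counting nested <span> tags to find its matching </span>.
function stripIgnoredSpan(html) {
  const open = /<span\b[^>]*mso-list:\s*Ignore[^>]*>/i.exec(html);
  if (!open) return html;
  const tag = /<span\b[^>]*>|<\/span>/gi;
  tag.lastIndex = open.index + open[0].length;
  let depth = 1;
  let m;
  while (depth > 0 && (m = tag.exec(html)) !== null) {
    depth += m[0][1] === '/' ? -1 : 1;
  }
  // If the span is never closed, drop everything after its opening tag.
  const end = m ? tag.lastIndex : html.length;
  return html.slice(0, open.index) + html.slice(end);
}

// Reduces one Word list paragraph to its plain text.
function listItemText(paragraphHtml) {
  return stripIgnoredSpan(paragraphHtml)
    .replace(/<[^>]+>/g, '')   // drop the remaining tags
    .replace(/&nbsp;/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Replaces each run of adjacent Word list paragraphs with a single <ul>.
function convertWordLists(html) {
  const run = new RegExp('(?:' + WORD_PARAGRAPH + '\\s*)+', 'gi');
  return html.replace(run, function (block) {
    const items = block.match(new RegExp(WORD_PARAGRAPH, 'gi')).map(function (p) {
      return '<li>' + listItemText(p) + '</li>';
    });
    return '<ul>' + items.join('') + '</ul>';
  });
}
```

In a browser, the input string could come from event.clipboardData.getData('text/html') inside a paste handler. Where exactly that handler would be wired into the adapter is left open here.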

Or perhaps there's some middle ground: it could be worth opening an issue on domador to ask its creator whether parsing Word-generated lists is a reasonable addition to domador, or whether that would be better placed in woofmark or even PublicLab.Editor, because in each case it might be a slightly different implementation. I could definitely understand the opinion that domador is designed to be a minimal parser for well-formed HTML, and that MS Word output isn't that (or it'd be generating <ul>s).