slab / quill

Quill is a modern WYSIWYG editor built for compatibility and extensibility
https://quilljs.com
BSD 3-Clause "New" or "Revised" License

Copying bullets from a Word Doc doesn't create bulleted list #1225

Open · arwagner opened this issue 7 years ago

arwagner commented 7 years ago

Copying and pasting unordered bullets from Word puts a bullet symbol in the editor instead of an actual bullet.

Steps for Reproduction

  1. Open https://www.dropbox.com/s/61gwc7evz398xki/test.docx?dl=0 in Word
  2. Select all in Word, and copy
  3. Visit http://quilljs.com/playground/#autosave
  4. Click into the editor and paste
  5. Click on the word "One"
  6. Click on the "unordered bullets" icon in the toolbar of the editor

Expected behavior: The bullet gets removed

Actual behavior: A real bullet gets created, containing the bullet symbol from the clipboard

Platforms: All

Version: All

arwagner commented 7 years ago

I'd love to get some guidance on a proper approach to fixing this issue.

I've been playing around with creating a custom matcher for the clipboard to do this. The matcher essentially ignores any "p.MsoListParagraphCxSpMiddle" or "p.MsoListParagraphCxSpLast" tags (returns a Delta that doesn't do anything) and, for any "p.MsoListParagraphCxSpFirst", iterates through that tag and its siblings until it finds the "...SpLast" tag.

But, from there, I'm not sure exactly what the right thing to do is. The deltas that you get from creating a bullet list manually in quill are kind of strange, and I'm not sure that the matcher should be creating them from scratch? Should it be creating deltas from a List blot? I'm a bit confused as to whether or not I'm even on the right track.
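The Word class names arwagner mentions can be classified one paragraph at a time, without walking siblings. A minimal sketch, assuming the class names seen in this thread; wordListPosition is a hypothetical helper, and treating a bare MsoListParagraph as a one-item list is an inference from the matchers posted further down, not something Word documents.

```javascript
// Hypothetical helper: classify a Word list paragraph by its class attribute.
// Class names are those seen in this thread; comparison is case-insensitive
// because both "MsoListParagraph" and "msolistparagraph" appear in pasted HTML.
function wordListPosition(className) {
  const classes = String(className || '')
    .split(/\s+/)
    .map((c) => c.toLowerCase());

  if (classes.includes('msolistparagraphcxspfirst')) return 'first';
  if (classes.includes('msolistparagraphcxspmiddle')) return 'middle';
  if (classes.includes('msolistparagraphcxsplast')) return 'last';
  // Assumption: a bare MsoListParagraph is used for a single-item list
  if (classes.includes('msolistparagraph')) return 'only';
  return null; // not a Word list paragraph (e.g. MsoNormal)
}

console.log(wordListPosition('MsoListParagraphCxSpMiddle')); // "middle"
```

Inside a matcher this would be called as wordListPosition(node.className); since the answer depends only on the node itself, each paragraph can be handled by its own matcher call.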

jhchen commented 7 years ago

Have you looked at how the officially supported matchers in the clipboard work? The clipboard uses its own API the same way a third party would. If so, what specific things have you tried that have not worked?

arwagner commented 7 years ago

Yes, I've looked at the built-in matchers. http://codepen.io/anon/pen/ENqRdP is what I have so far, but I'm not sure what should go in the "addNodeToDelta" function. None of the built-in matchers quite seem to do what I'm trying to do here, unless I'm misunderstanding them.

jhchen commented 7 years ago

The purpose of a matcher is to return a Delta representing a given node. If you fulfill this contract, the clipboard can build a Delta for the entire pasted tree. By traversing siblings and attempting to return Deltas for them instead, you are not fulfilling this contract. I would also suggest taking a look at the Delta documentation. One of the more important takeaways from the Delta docs is not to create them by hand.
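To make the contract concrete: a matcher receives the Delta the clipboard has already built for one node and returns a modified version of it. The sketch below only rewrites those ops and never looks at siblings. It assumes Quill 1.x's list format, where { list: 'bullet' } or { list: 'ordered' } sits on the newline that ends the line (the later matchers in this thread use the same shape); toListItemOps is a hypothetical name.

```javascript
// Hypothetical helper: turn the ops for ONE pasted paragraph into a list item.
// It never looks at sibling nodes; it only rewrites the ops it was given.
function toListItemOps(ops, listType, indent = 0) {
  // Shallow-copy each op so the clipboard's original Delta is not mutated
  const out = ops.map((op) => Object.assign({}, op));

  // Remove the paragraph's own trailing newline, if it has one
  const last = out[out.length - 1];
  if (last && typeof last.insert === 'string' && last.insert.endsWith('\n')) {
    last.insert = last.insert.slice(0, -1);
    if (last.insert === '') out.pop(); // drop an op that is now empty
  }

  // End the line with a newline carrying the block-level list format
  const attributes = indent > 0 ? { list: listType, indent } : { list: listType };
  out.push({ insert: '\n', attributes });
  return out;
}

const pasted = [{ insert: 'One\n' }];
console.log(JSON.stringify(toListItemOps(pasted, 'bullet')));
// [{"insert":"One"},{"insert":"\n","attributes":{"list":"bullet"}}]
```

In a real matcher the result would be wrapped as new Delta(toListItemOps(delta.ops, 'bullet')), with Delta obtained via Quill.import('delta'), so the text and inline formatting still come from the clipboard's own conversion of the node.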

arwagner commented 7 years ago

Yes, I did read the Delta documentation. But I think, in this case, what I want to do is construct a single Delta with a List blot embedded, corresponding to all the paragraphs that represent bullets. Is that not correct? You say the contract is a one-to-one correspondence between Deltas and nodes, yet there are a number of places in https://github.com/quilljs/quill/blob/develop/modules/clipboard.js where previousSibling, nextSibling, etc. are called. What I can't find is an example of a matcher that results in a particular blot.

jhchen commented 7 years ago

return { ops: [] } is constructing a Delta by hand. When Quill's clipboard uses siblings, it does so for context about the current Delta.

DavidReinberger commented 7 years ago

@arwagner did you have any luck with this issue?

DavidReinberger commented 7 years ago

So after a few hours of work, I have a solution. IMHO it is probably not the most elegant, but it works for unordered lists pasted from MS Word. Unfortunately, it does not work for ordered lists (any hints why?), even though the implementation seems the same as for unordered lists.

const Delta = Quill.import('delta');

const MSwordMatcher = function (node, delta) {
  const _build = [];

  // Walk forward from the first list paragraph until the last one,
  // stopping if we run out of siblings (node.nextSibling would otherwise be called on null)
  while (node) {
    if (node.tagName === 'P') {
      // [0] contains the bullet or number, [1] the spacing, [2] the item content
      const content = node.querySelectorAll('span');
      const _nodeText = content[2].innerText.trim();
      // const _listType = content[0].innerText.match(/[0-9]/g) ? 'ordered' : 'bullet'; // @TODO: implement ordered lists

      // Quill 1.x stores list formatting on the newline that ends each line
      _build.push({ insert: _nodeText });
      _build.push({ insert: '\n', attributes: { list: 'bullet' } });

      if (node.className === 'MsoListParagraphCxSpLast') {
        break;
      }
    }

    node = node.nextSibling;
  }

  return new Delta(_build);
};

// Middle and last paragraphs are already consumed by MSwordMatcher
const matcherNoop = (node, delta) => new Delta();

When initializing Quill:

let quill = new Quill('#editor', {
  modules: {
    clipboard: {
      matchers: [
        ['p.MsoListParagraphCxSpFirst', MSwordMatcher],
        ['p.MsoListParagraphCxSpMiddle', matcherNoop],
        ['p.MsoListParagraphCxSpLast', matcherNoop],
      ]
    }
  }
});

ping @arwagner (if you are still interested)
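On the ordered-list question: the matcher above always emits a bullet format (ordered detection is left as a commented-out TODO), which on its own would turn numbered items into bullets. A minimal sketch of deciding the type from the item's marker instead, assuming Word's numbered markers look like "1.", "a)" or "iv."; detectListType is hypothetical, and SamDuvall's matcher below uses a simpler /\S+\./ test.

```javascript
// Hypothetical helper: decide 'ordered' vs 'bullet' from the text Word puts
// before the item content, e.g. "1.\u00a0\u00a0One" or "·\u00a0\u00a0One".
function detectListType(itemText) {
  // The marker is everything up to the first whitespace (\s also matches &nbsp;)
  const marker = String(itemText || '').trimStart().split(/\s+/)[0];
  // Assumption: ordered markers are a number, a single letter or a roman numeral,
  // followed by "." or ")". Anything else (·, §, o, ...) is treated as a bullet.
  return /^(\d+|[a-z]|[ivxlcdm]+)[.)]$/i.test(marker) ? 'ordered' : 'bullet';
}

console.log(detectListType('1.\u00a0\u00a0One')); // "ordered"
console.log(detectListType('·\u00a0\u00a0One'));  // "bullet"
```

A per-node matcher could call this on the first text op of the node's Delta rather than reading spans by index, which avoids depending on how many spans Word nests around the marker.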

SamDuvall commented 7 years ago

I tried to take the example from @DavidReinberger and apply the feedback from @jhchen on this issue. I wanted to preserve bullet vs ordered, indentation as well as allow HTML within each list item. Any feedback / suggestions are welcome.

Note: I am using underscore in the below code, but that could be removed.

// Requires underscore.js (_) and Quill's Delta
const Delta = Quill.import('delta');

const MSWORD_MATCHERS = [
  ['p.MsoListParagraphCxSpFirst', matchMsWordList],
  ['p.MsoListParagraphCxSpMiddle', matchMsWordList],
  ['p.MsoListParagraphCxSpLast', matchMsWordList],
];

function matchMsWordList(node, delta) {
  // Clone the operations
  let ops = _.map(delta.ops, _.clone);

  // Trim the front of the first op to remove the bullet/number
  let first = _.first(ops);
  if (!first || typeof first.insert !== 'string') return delta;
  first.insert = first.insert.trimLeft();
  let firstMatch = first.insert.match(/^(\S+)\s+/);
  if (!firstMatch) return delta;
  first.insert = first.insert.substring(firstMatch[0].length, first.insert.length);

  // Trim the newline off the last op
  let last = _.last(ops);
  last.insert = last.insert.substring(0, last.insert.length - 1);

  // Determine the list type
  let prefix = firstMatch[1];
  let listType = prefix.match(/\S+\./) ? 'ordered' : 'bullet';

  // Determine the list indent (the style attribute may be missing)
  let style = (node.getAttribute('style') || '').replace(/\n+/g, '');
  let levelMatch = style.match(/level(\d+)/);
  let indent = levelMatch ? levelMatch[1] - 1 : 0;

  // Add the list attribute
  ops.push({insert: '\n', attributes: {list: listType, indent}});

  return new Delta(ops);
}
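The indent logic above reads Word's mso-list style, which typically looks like "mso-list:l0 level2 lfo1"; Quill's indent format is 0 for a top-level item, hence the "- 1". Pulled out as a standalone function (a sketch; wordListIndent is a hypothetical name), with a guard for paragraphs that have no style attribute:

```javascript
// Hypothetical standalone version of the indent calculation in matchMsWordList.
// Word encodes the nesting level in the paragraph style, e.g. "mso-list:l0 level2 lfo1".
// Quill's indent is 0 for a top-level item, so "level1" -> 0, "level2" -> 1, ...
function wordListIndent(style) {
  // Newlines are stripped first, as in the matcher above
  const flat = String(style || '').replace(/\n+/g, '');
  const match = flat.match(/level(\d+)/);
  return match ? Number(match[1]) - 1 : 0;
}

console.log(wordListIndent('margin-left:72.0pt;mso-list:l0 level2 lfo1')); // 1
console.log(wordListIndent(null)); // 0
```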
Subtletree commented 4 years ago

Thanks @SamDuvall, your matchers are working flawlessly for me.

azoof-ahmed commented 3 years ago

@SamDuvall what's that underscore and what do you mean it can be removed? 😶

Subtletree commented 3 years ago

@Azuf He's talking about the underscore.js library.

Here's a vanilla version:

const Delta = Quill.import('delta');

function matchMsWordList(node, delta) {
  // Clone the operations
  let ops = delta.ops.map((op) => Object.assign({}, op));

  // Trim the front of the first op to remove the bullet/number
  let first = ops[0];
  if (!first || typeof first.insert !== 'string') return delta;
  first.insert = first.insert.trimLeft();
  let firstMatch = first.insert.match(/^(\S+)\s+/);
  if (!firstMatch) return delta;
  first.insert = first.insert.substring(firstMatch[0].length, first.insert.length);

  // Trim the newline off the last op
  let last = ops[ops.length-1];
  last.insert = last.insert.substring(0, last.insert.length - 1);

  // Determine the list type
  let prefix = firstMatch[1];
  let listType = prefix.match(/\S+\./) ? 'ordered' : 'bullet';

  // Determine the list indent (the style attribute may be missing)
  let style = (node.getAttribute('style') || '').replace(/\n+/g, '');
  let levelMatch = style.match(/level(\d+)/);
  let indent = levelMatch ? levelMatch[1] - 1 : 0;

  // Add the list attribute
  ops.push({insert: '\n', attributes: {list: listType, indent}});

  return new Delta(ops);
}
darshakeyan commented 2 years ago

Have you solved this issue, @arwagner? Could you please help? I have to deliver this to a customer ASAP, but I haven't been able to find a solution anywhere.

Subtletree commented 2 years ago

@darshak369 there are a couple of solutions listed above in this issue

darshakeyan commented 2 years ago

Thanks for the reply, @Subtletree. I have tried all of them and none of them work for me. Could you give me an idea of what to do? The formatting issue happens only with the MS Word desktop app.

Subtletree commented 2 years ago

@darshak369 Hmm sounds frustrating that they are not working!

The following is working for me:

const Delta = Quill.import('delta');

function matchMsWordList(node, delta) {
  // Clone the operations
  let ops = delta.ops.map((op) => Object.assign({}, op));

  // Trim the front of the first text op to remove the bullet/number
  // (embeds have a non-string insert, so they are skipped)
  let bulletOp = ops.find((op) => typeof op.insert === 'string' && op.insert.trim().length);
  if (!bulletOp) { return delta }

  bulletOp.insert = bulletOp.insert.trimLeft();
  let listPrefix = bulletOp.insert.match(/^.*(^·|\.)/) || bulletOp.insert[0];
  bulletOp.insert = bulletOp.insert.substring(listPrefix[0].length, bulletOp.insert.length);

  // Trim the newline off the last op
  let last = ops[ops.length-1];
  last.insert = last.insert.substring(0, last.insert.length - 1);

  // Determine the list type
  let listType = listPrefix[0].length === 1 ? 'bullet' : 'ordered';

  // Determine the list indent (the style attribute may be missing)
  let style = (node.getAttribute('style') || '').replace(/\n+/g, '');
  let levelMatch = style.match(/level(\d+)/);
  let indent = levelMatch ? levelMatch[1] - 1 : 0;

  // Add the list attribute
  ops.push({insert: '\n', attributes: {list: listType, indent}});

  return new Delta(ops);
}

const MSWORD_MATCHERS = [
  ['p.MsoListParagraphCxSpFirst', matchMsWordList],
  ['p.MsoListParagraphCxSpMiddle', matchMsWordList],
  ['p.MsoListParagraphCxSpLast', matchMsWordList],
  ['p.msolistparagraph', matchMsWordList]
];

// When instantiating a quill editor
let quill = new Quill('#editor', {
  modules: {
    clipboard: { matchers: MSWORD_MATCHERS }
  }
});

While writing this up I found a couple of edge cases that didn't work, so the code above should now also work for lists with only one bullet, and it no longer strips the first word from bullets in the cases where it used to.

[Screenshot: the list in Word]

[Screenshot: the same list pasted into Quill]
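The prefix regex in the version above is greedy: /^.*(^·|\.)/ runs to the last period in the first text op, so an item whose text contains a period can still lose its first words. The final version later in this thread uses a lazy /^.*?(^·|\.)/ and trims the remainder. The comparison below is a sketch; stripListPrefix is a hypothetical helper that mirrors the matcher's fallback of stripping one character when the regex finds nothing (it uses trimStart, the standard name for trimLeft).

```javascript
// The two prefix regexes used in this thread
const GREEDY = /^.*(^·|\.)/;  // earlier version
const LAZY = /^.*?(^·|\.)/;   // final version

// Hypothetical helper mirroring matchMsWordList's prefix stripping:
// if the regex finds no marker, fall back to removing the first character.
function stripListPrefix(text, regex) {
  const trimmed = text.trimStart();
  if (!trimmed) return trimmed;
  const prefix = trimmed.match(regex) || trimmed[0];
  return trimmed.substring(prefix[0].length).trimStart();
}

const item = '·\u00a0\u00a0Note. See below';
console.log(stripListPrefix(item, GREEDY)); // "See below" (first sentence lost)
console.log(stripListPrefix(item, LAZY));   // "Note. See below"
```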

darshakeyan commented 2 years ago

Thanks for your solution, @Subtletree. It means a lot.

I tried this solution in the Quill playground, but unfortunately it's not working.

Here is the playground code, copied from what you posted above. Judging by your screenshots, the solution looks correct and it works on your side.

https://codepen.io/darshak434/pen/GRMjvwr

The MS Word file I am copying the content from:

[Screenshot of the Word document]

You can find the file here: https://1drv.ms/w/s!AtzwzPKX4hPigSpGIzLT2ezREQiL?e=cUkyth

Open it with the MS Word desktop app, copy the content and paste it into the Quill editor above.

After pasting the Word content, I get the following result:

[Screenshot of the pasted result in Quill]

Could you please share the details of your setup, such as software versions and operating system, so I can understand whether this happens only on my system?

Thanks

Subtletree commented 2 years ago

Looks like I don't have permission to download the Word doc. I tested with another Word doc in the CodePen and it worked OK on:

Windows 10 21H1, Office 365 Word 2111, Chrome 96.0.4664.93 and Firefox 95

[Screenshot of the result]

Wonder if it's to do with the specific type of bullets or something, let me know when you've changed those permissions and I'll try with your doc!

darshakeyan commented 2 years ago

Hey @Subtletree

Here is a link to the doc file; you can download it directly:

https://drive.google.com/drive/folders/1txcKIDmrT6tjerPrqy_8THbSaETHrj0f?usp=sharing

Here is the situation: when I test with a new Word doc where I type the bullet points myself, it works fine for me as well. But when I copy content from the doc provided by the customer into Quill, it isn't formatted the same way.

I have shared the doc file with you, so you can check it.

Thanks

Subtletree commented 2 years ago

Looks like those bullets are in p.MsoNormal paragraphs for some reason, instead of p.MsoListParagraph etc.

The following works, but I haven't done heaps of testing with it. It's possibly quite brittle: e.g. with a non-standard bullet (like an arrow) in a p.MsoNormal, the list won't be detected.

const Delta = Quill.import('delta');

function matchMsWordList(node, delta) {
  // Clone the operations
  let ops = delta.ops.map((op) => Object.assign({}, op));

  // Find the first text op with visible content; it starts with the bullet/number
  // (embeds have a non-string insert, so they are skipped)
  let bulletOp = ops.find((op) => typeof op.insert === 'string' && op.insert.trim().length);
  if (!bulletOp) { return delta }

  // Trim the front of that op to remove the bullet/number and the spacing after it
  bulletOp.insert = bulletOp.insert.trimLeft();
  let listPrefix = bulletOp.insert.match(/^.*?(^·|\.)/) || bulletOp.insert[0];
  bulletOp.insert = bulletOp.insert.substring(listPrefix[0].length, bulletOp.insert.length).trimLeft();

  // Trim the newline off the last op (unless it is an embed)
  let last = ops[ops.length-1];
  if (typeof last.insert === 'string') {
    last.insert = last.insert.substring(0, last.insert.length - 1);
  }

  // Determine the list type: a single-character prefix is a bullet glyph
  let listType = listPrefix[0].length === 1 ? 'bullet' : 'ordered';

  // Determine the list indent (the style attribute may be missing)
  let style = (node.getAttribute('style') || '').replace(/\n+/g, '');
  let levelMatch = style.match(/level(\d+)/);
  let indent = levelMatch ? levelMatch[1] - 1 : 0;

  // Add the list attribute
  ops.push({insert: '\n', attributes: {list: listType, indent}});

  return new Delta(ops);
}

function maybeMatchMsWordList(node, delta) {
  // Only treat a MsoNormal paragraph as a list item if it starts with the '·' bullet
  const first = delta.ops[0];
  if (first && typeof first.insert === 'string' && first.insert.trimLeft()[0] === '·') {
    return matchMsWordList(node, delta);
  }

  return delta;
}

const MSWORD_MATCHERS = [
  ['p.MsoListParagraphCxSpFirst', matchMsWordList],
  ['p.MsoListParagraphCxSpMiddle', matchMsWordList],
  ['p.MsoListParagraphCxSpLast', matchMsWordList],
  ['p.MsoListParagraph', matchMsWordList],
  ['p.msolistparagraph', matchMsWordList],
  ['p.MsoNormal', maybeMatchMsWordList]
];

// When instantiating a quill editor
let quill = new Quill('#editor', {
  modules: {
    clipboard: { matchers: MSWORD_MATCHERS }
  },
  placeholder: 'Compose an epic...',
  theme: 'snow'
});
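As noted above, maybeMatchMsWordList only recognizes the Symbol-font '·' bullet. A slightly broader check is sketched below. The glyph list is an assumption: '·' and '§' (Wingdings) appear in pasted HTML quoted in this thread, while 'o' (the hollow circle Word typically uses for second-level bullets) and 'Ø' (a Wingdings arrow) are not verified here. startsWithWordBullet is a hypothetical helper.

```javascript
// Assumed set of bullet glyphs as they appear in Word's pasted HTML (see lead-in)
const WORD_BULLET_GLYPHS = ['·', '§', 'o', 'Ø'];

// Hypothetical helper: does the text start with a Word bullet glyph followed by whitespace?
// Requiring whitespace after the glyph keeps ordinary words such as "only" from matching.
function startsWithWordBullet(text) {
  const trimmed = String(text || '').trimStart();
  return WORD_BULLET_GLYPHS.some(
    (glyph) => trimmed.startsWith(glyph) && /^\s/.test(trimmed.slice(glyph.length))
  );
}

console.log(startsWithWordBullet('§\u00a0 Value')); // true
console.log(startsWithWordBullet('only text'));     // false
```

In maybeMatchMsWordList, the "=== '·'" comparison would become startsWithWordBullet(delta.ops[0].insert). Note that matchMsWordList's prefix regex only special-cases '·', so a '§' or 'o' item whose text contains a period would still be misread as an ordered item.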
darshakeyan commented 2 years ago

Thanks @Subtletree for your effort and time. It's working, but as you said it is quite brittle: there are spaces before the bullet paragraphs, the spacing between lines isn't retained, etc. You can find this document at the same link: https://drive.google.com/drive/folders/1txcKIDmrT6tjerPrqy_8THbSaETHrj0f?usp=sharing

[Screenshot of the pasted result]

Is there any way to handle that as well, similar to what you did for p.MsoNormal in the function above?

It would be very helpful for my customer.

Subtletree commented 2 years ago

@darshak369 I've edited my last comment to add a trimLeft on this line: bulletOp.insert = bulletOp.insert.substring(listPrefix[0].length, bulletOp.insert.length).trimLeft(); which should fix the spacing at the start (but will also strip any intentional spacing at the start).

The paragraph spacing issue doesn't really have anything to do with the bullets. I think Quill doesn't handle spacing before and after paragraphs, so that would need to be handled in a custom way. If you just added an empty line after each paragraph instead of using paragraph spacing, it would work fine, but that's hard to tell your customer 😅
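If the paragraph spacing did need to be approximated, one option along the lines described above is to append an empty line after any pasted paragraph that has a bottom margin. A sketch under stated assumptions: withSpaceAfter is hypothetical, the margin string would come from something like node.style.marginBottom inside a p matcher, and any positive margin is treated as one blank line. Whether the margin survives the paste depends on how the HTML reaches the matcher, which this thread doesn't settle.

```javascript
// Hypothetical helper: approximate Word's "space after" with one empty line.
// marginBottom is a CSS length string such as "8.0pt" (assumed to come from
// the pasted paragraph's style); any positive value adds a blank line.
function withSpaceAfter(ops, marginBottom) {
  const size = parseFloat(marginBottom); // "8.0pt" -> 8, "0cm" -> 0, "" -> NaN
  if (!(size > 0)) return ops;           // no spacing (or unparseable): leave as-is
  return ops.concat([{ insert: '\n' }]);
}

console.log(JSON.stringify(withSpaceAfter([{ insert: 'Para\n' }], '8.0pt')));
// [{"insert":"Para\n"},{"insert":"\n"}]
```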

darshakeyan commented 2 years ago

Thank you very much for this solution, @Subtletree! I am very glad: everything works as expected, and it means a lot. 👍

The paragraph spacing isn't a big issue; it should be fine without it. But yes, it's definitely hard to tell the customer 😂

Subtletree commented 2 years ago

Very welcome @darshak369! I've updated our code to use the new changes, so this has helped me too.

berott commented 2 years ago

Thank you very much for your work! If I paste my Word list into https://codepen.io/darshak434/pen/GRMjvwr?editors=1111 I get the correct result and the correct p classes: <p class="MsoListParagraphCxSpMiddle" style="margin: 0cm 0cm 0cm 216pt; font-size: 12pt; font-family: Calibri, sans-serif; text-indent: -18pt;"><span style="font-size: 24pt; font-family: Wingdings;">§<span style="font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 7pt; line-height: normal; font-family: &quot;Times New Roman&quot;;"> </span></span><span style="font-size: 24pt;">Value&nbsp;&nbsp;&nbsp;&nbsp;<o:p></o:p></span></p>

In my setup (with ngx-quill), console.log(node); in my matcher method prints the following node instead: <p><span style="font-size:24.0pt;font-family:Wingdings;mso-fareast-font-family:Wingdings; mso-bidi-font-family:Wingdings"><span style="mso-list:Ignore">§<span style="font:7.0pt &quot;Times New Roman&quot;"> </span></span></span><span style="font-size:24.0pt">Value<span style="mso-tab-count:1">&nbsp;&nbsp;&nbsp;&nbsp; </span></span></p>

What could be the reason for this difference? I paste the same content, but I get different nodes (and therefore different deltas) in the matcher methods.
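The thread doesn't establish why the same paste yields different markup in the two setups. One way to sidestep the question is a check that accepts either shape: the Mso list classes seen in the CodePen output, or the "mso-list:Ignore" span that wraps the bullet in the ngx-quill output. A sketch; isWordListParagraph is hypothetical and only reads className and outerHTML, so it works on a DOM element or a plain object.

```javascript
// Hypothetical check for a Word list paragraph in either markup shape quoted above:
//  - <p class="MsoListParagraphCxSpMiddle" ...>   (class present)
//  - <p><span style="mso-list:Ignore">§ ...       (no class, raw Word styles)
function isWordListParagraph(el) {
  const className = typeof el.className === 'string' ? el.className : '';
  const html = typeof el.outerHTML === 'string' ? el.outerHTML : '';
  return /\bMsoListParagraph/i.test(className) || /mso-list\s*:\s*ignore/i.test(html);
}

console.log(isWordListParagraph({ className: '', outerHTML: '<p><span style="mso-list:Ignore">§</span>Value</p>' })); // true
console.log(isWordListParagraph({ className: 'MsoNormal', outerHTML: '<p class="MsoNormal">Text</p>' }));           // false
```

In Quill it could back a single matcher on plain p elements, e.g. ['p', (node, delta) => isWordListParagraph(node) ? matchMsWordList(node, delta) : delta], used instead of the class-based entries rather than alongside them, since Quill applies every matcher whose selector matches a node in turn.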

abhinavprasad98 commented 2 years ago

Hi @Subtletree, I am new to Angular. Could you please tell me where to put the code you shared and how to make it work? I tried pasting it into app.component.ts, where I have written the code for the Quill functionality, changing a few things to adapt it to TypeScript, like adding this. and removing const.

It compiled with zero errors, but it still pastes the bullets from MS Word with &nbsp; without adding

abhinavprasad98 commented 2 years ago

Hi @darshak369. I am trying to implement the Quill editor in Angular. I have implemented the Quill editor in the App component itself, copied the code by @Subtletree into app.component.ts and made the necessary changes to suit TypeScript.

It compiles successfully, but the problem with ordered/unordered lists when pasting bullets from MS Word still exists. I need your help to make this work, please. It's very critical for my work.

davidwintermeyer commented 2 years ago

Hi there,

I know I'm following up on a long running thread. This is a major pain for my work as well. I'm curious, is this bug open because no one has been able to devote time to it, or because it doesn't seem to have a feasible solution?

Thanks!

Subtletree commented 2 years ago

Hey!

I think even if we created a proper PR for this fix, it probably wouldn't be merged and released, as it seems Quill is mostly abandoned? https://github.com/quilljs/quill/issues/3521 https://github.com/quilljs/quill/issues/3359

The code above has fixed the bug in my environment, but it seems the nodes copied from Word can vary between environments. I can't know for sure, but if someone put time into finding out why, I think a solution would be feasible.

timotheedorand commented 8 months ago

Thank you @Subtletree, this works: https://github.com/quilljs/quill/issues/1225#issuecomment-992267444