Bug: when pasting into an empty block, the entire fragment is inserted as-is

jameshfisher commented 2 years ago

Description When copy-pasting text from a nested element, the new text is wrapped in extra elements. The extra wrapping elements seem to come from where the context of the text that was copied.

Recording

https://user-images.githubusercontent.com/166966/134528526-ed26e07c-1648-46a3-a80c-1c6a63ea802d.mp4

Sandbox https://codepen.io/jameshfisher/pen/powOjZM

Steps To reproduce the behavior:

Go to https://codepen.io/jameshfisher/pen/powOjZM
Select some text in the first green box
Copy (E.g. Cmd-C)
Put cursor into the second green box
Paste (E.g. Cmd-V)

Expectation The text that was selected when you hit "Copy" is pasted at the cursor location, with no additional elements wrapping it.

Actual behavior The copied text is pasted, but it's wrapped in two additional elements: the red box and the green box.

Environment

Slate Version: 0.66.1
Operating System: macOS
Browser: Chrome, Firefox

jameshfisher commented 2 years ago

It seems this bug was introduced somewhere between 0.65.3 and 0.66.0

jameshfisher commented 2 years ago

I think this bug was introduced by https://github.com/ianstormtaylor/slate/pull/4489 - reverting this specific change results in expected behavior

jameshfisher commented 2 years ago

Copying in @nemanja-tosic @TheSpyder @dylans

I don't think I can do much to help fix this. I'm blocked by:

I can't find any documentation on what a "fragment" really is or represents.
I don't understand why copy-paste in Slate is implemented the way it is. It seems to be a design choice that copying a range drags some of its surrounding elements with it, and that pasting then kinda infers which parts of those wrapping elements to strip out. Why?
That Transforms.insertFragment looks like an absolute beast.

In the short term, I'll revert to 0.65.3.

jameshfisher commented 2 years ago

@nemanja-tosic's description of #4489:

When pasting into an empty block, insert the entire fragment as-is

This is actually a good description of the bug that I'm reporting :)

I'm trying to understand the motivation for #4489. Here's the gif of the behavior #4489 was intended to change (Right - before #4489, left - after #4489):

But to me, the behavior on the right (before #4489) is expected behavior, and the behavior on the left (after #4489) is undesirable! I think if you want something like the behavior on the left, you should do it with some kind of plugin. Is there a reason to put this in the core insertFragment logic?

nemanja-tosic commented 2 years ago

@jameshfisher there were a number of bug reports on the behavior you want and what was in the previous version of slate.

We had a back and forth in that PR but made the decision that a paste to an empty node was an indication that the entire fragment was to be pasted as-is (a fragment is the entire tree from root to cursor) because otherwise the first node, and only the first node, was pasted as text and everything else was pasted as-is. In other words, if you were to copy two headings and a paragraph (h1 + h2 + p), h1 would be pasted as text, h2 and p as-is.

That being said, this is apparently a preference and very difficult to nail down for everyone out of the box. I believe the PR was good as there were bug reports and there was no way to paste the first node as-is, ever. Plate had a plugin trying to work around it, and it was the cause of many headaches for users, as it was not possible to do properly.

If you feel that the previous behavior is what you need, then the best thing to do, i believe, is to let the end user choose the implementation of insertFragment, or, in the terminology of OOP, a strategy.

nemanja-tosic commented 2 years ago

For completness, to quote @dylans in #4486, these are the issues that are asking for a different implementation:

This is the underlying issue of many bugs, including but probably not limited to #3320, #4408, #3926, and possibly related to #3469, #4401.

dylans commented 2 years ago

For what it's worth I absolutely would not revert this. It's possible we have made one or two scenarios not great that need to be addressed, but this change fixed the biggest complaint and most common source of bugs out there when pasting within a slate document (that you would also lose the structure of the first thing that was pasted).

I'm happy to address any edge cases that are now broken as a result, but it's really difficult to know if someone is pasting text from a single block if you want to just paste plain text, or preserve the structure (usually that's why there's a paste special, paste plain, etc.). That's really the biggest source of ambiguity in general, if you highlight just a word and paste into an empty block, do you want to preserve the underlying structure or not (e.g. keep a heading as a heading, or just copy the words).

TheSpyder commented 2 years ago

It's debatable. I believe the "include formatting" behaviour should be based on what is copied, not just what is pasted, and I said as much in 4489. I half agree with @nemanja-tosic - the new behaviour is correct when multiple blocks are pasted, but not when pasting text from a single block element.

Having said that the 4489 change doesn't effect me. My team chose to develop a custom insert fragment implementation long before I became a Slate contributor.

dylans commented 2 years ago

Basing on what is copied is debatable when it’s a single block as there’s no way to easily disambiguate between copying the formatted content of the block vs not. There is always paste as plain text / paste special when you want to lose the formatting.

jameshfisher commented 2 years ago

I think we all agree that user intent is ambiguous when copy-pasting an arbitrary range of an arbitrary document. So it's not possible to make core behavior that works as expected for everyone. In such a case, Slate core should just implement the simplest viable behavior, and let developers build more complex behavior on top of this if they can do better in their specific domain.

It looks like Slate's copy-paste was already pretty complex (I won't pretend to understand it), and it looks like #4489 is adding more heuristics which work for some cases, but break for others (like mine - just copying text from one block to another). I think this goes against the plugin principle.

I've read elsewhere that Notion is an unofficial guide for Slate's default behavior. Notion's behavior is: you can either select a series of inlines from one block, or you can select a series of complete blocks. It's more restricted than Slate's editor.selection, which makes the copy-paste logic much simpler. But perhaps Notion's behavior here is too opinionated for Slate.

I think it's worth me describing how I expected Slate's copy-paste logic to work before I looked at it. Then you can tell me what's wrong with it.

To copy the selection, we would:

Remove all nodes and text that are outside the selection.
Find n, the lowest common ancestor node of the selection.
Take n.children as the copied content.

The copied content is then just a Node[], much like a Slate document. It's not a "fragment" thing. Then to paste the copied content, we just use Transforms.insertNodes.

Here's an example document with a selection:

[
  { type: "h1", children: [ 
    { text: "Blog homepage" }
  ]},
  { type: "preview", children: [
    { type: "h2", children: [
      { text: "Why Haskell <ANCHOR>sucks" }
    ]},
    { type: "para", children: [
      { text: "Imperative programming<FOCUS> is " },
      { text: "great", bold: true },
      { text: "!" }
    ]},
  ]},
]

To copy this selection, we first delete all text and nodes outside the selection:

[
  { type: "preview", children: [
    { type: "h2", children: [
      { text: "<ANCHOR>sucks" }
    ]},
    { type: "para", children: [
      { text: "Imperative programming<FOCUS>" }
    ]}
  ]}
]

Then we find the common ancestor of the range, which is the preview element, and we take its children as the copied content:

[
  { type: "h2", children: [
    { text: "<ANCHOR>sucks" }
  ]},
  { type: "para", children: [
    { text: "Imperative programming<FOCUS>" }
  ]}
]

Note that, in contrast to Slate's current behavior, the copied content would not include the full path from the root node to the selection. I don't know why Slate currently does that.

Then to paste this content, we would just use Transforms.insertNodes.

This is what I'm planning to implement for my own copy-paste needs. What are the problems with this scheme?

nemanja-tosic commented 2 years ago

We did not do multiple blocks as the trigger to paste fragment, because it didn't pass the following test:

As a user, I want to copy a heading and paste it in this scenario, not the heading and some text above it, or any text at all.

With the previous logic where the first block was always inserted as text this was simply impossible to do. With the current logic both pasting as text and as fragment are possible, so it's much less debatable than the previous logic as you can achieve both.

I have mentioned having semantics on copy as well, not to quote that bit, which wasn't discussed in more detail. Even if copy was more deliberate about blocks vs inlines, slate would always merge the first block.

In the end this solution allows you to do both paste as fragment (the default paste), and paste as plain text, whereas before the former was not possible.

nemanja-tosic commented 2 years ago

@jameshfisher a fragment is a Node[].

In your example, you make an assumption that you know which ancestor to target, in this case preview. You cannot do that as Slate sees block > block > inline. So, which ancestor do you chose? First or second? How is slate to know what can be replaced? Without further semantics this becomes an impossible choice and you can break a custom structure.

To give a very simple counter example, i might wish to copy paste a heading + paragraph fragment to a table cell. I wish for them to remain heading + paragraph in the cell. This is quite clearly at odds with what you are asking - do not nest the heading + paragraph in the cell. If we were to implement what you are asking though, what is the common ancestor? A table structure is like: table > tr > td > paragraph > text, or to slate block > block > block > block > inline.

The different desired behavior in these two examples comes from the element being pasted to, one which slate has no idea how to deal with as it sees blocks and inlines not previews and tables. So, at the current state of things it's up to the target to handle paste.

My suggestion in the PR was to, in order to make things more generic, have more semantics on copy, IOW when i'm selecting something i should state my intent on whether i'm selecting this as text or block (notion does this and it makes a lot of sense given what i said before). Notion also does not allow you to select text across blocks, you either select text or two blocks. However, even with the semantical copy, the previous implementation of slate would treat multiple blocks as if i wanted to merge the first block. However you spin it, the previous implementation was unable to paste a single block element as-is. We made it possible by special casing pasting to an empty node, and it is still possible to paste as plain text too.

Again, i acknowledge that this will not be something everyone wants, and i believe slate should expose several strategies for people to use. Reverting isn't an option for us and the people that wrote up the bug reports. This behavior isn't an option for you so the point is finding middle ground here.

jameshfisher commented 2 years ago

@nemanja-tosic in my example, the preview ancestor is targeted because it's the lowest common ancestor (i.e. Path.common) of the anchor and focus:

// Selection state before copy
editor.selection = {
  anchor: { path: [1,1,1], offset: 12 },
  focus: { path: [1,2,1], offset: 22 }
}

// on copy
const targetedNodePath = Path.common(editor.selection.anchor.path, editor.selection.focus.path)

// Now targetedNodePath == [1], the path of the "preview" node

This proposed logic doesn't look at whether anything is a block or inline; it just uses the tree structure. Does that make sense?

I believe this would work for the "copy paste a heading + paragraph fragment to a table cell" example. Here's how it would work:

// The selection before copy
editor.children = [
  { type: "h1", children: [ 
    { text: "Blog <ANCHOR>homepage" }
  ]},
  { type: "para", children: [ 
    { text: "Some paragraph<FOCUS> content" }
  ]},
  { type: "table", children: [
    { type: "tr", children: [
      { type: "td", children: [
        { type: "para", children: [
          { text: '' }
        ]}
      ]}
    ]}
  ]}
]

// On copy, first strip out text and nodes outside the selection:
[
  { type: "h1", children: [ 
    { text: "<ANCHOR>homepage" }
  ]},
  { type: "para", children: [ 
    { text: "Some paragraph<FOCUS>" }
  ]},
]

// Now take the Node.common() of the anchor and focus
// In this example, that's just the root node / editor:
commonAncestor = editor = {
  ...,
  children: [
    { type: "h1", children: [ 
      { text: "<ANCHOR>homepage" }
    ]},
    { type: "para", children: [ 
      { text: "Some paragraph<FOCUS>" }
    ]},
  ]
}

// Now take the ancestor's children as the copied content:
copiedContent = [
  { type: "h1", children: [ 
    { text: "homepage" }
  ]},
  { type: "para", children: [ 
    { text: "Some paragraph" }
  ]},
]

// Next, the user places their cursor into the table cell:
editor.children = [
  { type: "h1", children: [ 
    { text: "Blog homepage" }
  ]},
  { type: "para", children: [ 
    { text: "Some paragraph content" }
  ]},
  { type: "table", children: [
    { type: "tr", children: [
      { type: "td", children: [
        { type: "para", children: [
          { text: '<ANCHOR><FOCUS>' }
        ]}
      ]}
    ]}
  ]}
]

// Then the user pastes. We do Editor.insertNodes(editor, copiedContent, {select:true}), which will leave us with something like:
editor.children = [
  { type: "h1", children: [ 
    { text: "Blog homepage" }
  ]},
  { type: "para", children: [ 
    { text: "Some paragraph content" }
  ]},
  { type: "table", children: [
    { type: "tr", children: [
      { type: "td", children: [
        { type: "para", children: [
          { text: '' }  // Arguably, the paste should remove/replace this empty paragraph, but that's a smaller issue
        ]},
        { type: "h1", children: [ 
          { text: "homepage" }
        ]},
        { type: "para", children: [ 
          { text: "Some paragraph<ANCHOR><FOCUS>" }
        ]},
      ]}
    ]}
  ]}
]

nemanja-tosic commented 2 years ago

I think understand what you are doing there, but i don't think slate can do this generically as i'm not sure it could guarantee that it did not cut off something where it should not have.

Running with the same example and having anchor/focus across two trs in a table, the algorithm would grab two trs?

jameshfisher commented 2 years ago

Indeed, if the structure was:

table
  row
    cell
      "foo"
    cell
      "b<ANCHOR>ar"
  row
    cell
      "baz"
    cell
      "<FOCUS>qux"

Then the copied content with my algorithm would be:

row
  cell
    "ar"
row
  cell
    "baz"
  cell
    ""

I think this is sensible generic behavior. (But it's unclear to me what the expected behavior is if the user does this kind of copy-paste in a table.)

dylans commented 2 years ago

Indeed, if the structure was:
table
  row
    cell
      "foo"
    cell
      "b<ANCHOR>ar"
  row
    cell
      "baz"
    cell
      "<FOCUS>qux"
Then the copied content with my algorithm would be:
row
  cell
    "ar"
row
  cell
    "baz"
  cell
    ""
I think this is sensible generic behavior. (But it's unclear to me what the expected behavior is if the user does this kind of copy-paste in a table.)

Apologies, but this doesn't make sense to me as sensible generic behavior as it would intentionally generate an invalid slate tree which the user could not likely recover from.

If anything needs a solution, it's how to disambiguate wanting to paste a single block as plain text vs the structure that it has, when pasting into an empty block. Previously you would always lose the structure of the first block which as previously stated was the source of at least 5 but more likely dozens of bug reports. The new behavior requires the user to choose paste as plain text or paste special.

We should probably just explore a configuration option if want to lose the structure when pasting the first block, as much as I hate adding more configuration options.

jameshfisher commented 2 years ago

... it would intentionally generate an invalid slate tree which the user could not likely recover from.

"Validity" of the tree is not a Slate concept; it depends on the domain of the specific document. There is no generic copy-paste behavior that preserves "validity" in any specific domain. For example, my domain has nodes of type box which must have exactly two children: the first of type header and the second of type body. Slate's current copy-paste behavior breaks this: if you copy from within a body node, and paste elsewhere, you get a new box node with a body but without a header.

So preserving some notion of validity cannot be a goal of the generic copy-paste behavior. It should just aim to be simple, predictable, and overridable. If you have some domain-specific rules then it should be your job to somehow override/extend/customize the generic copy-paste behavior to achieve that. For example, if a developer has a rule like "tr must be inside a table", then she needs to decide how copy-paste should work to maintain that rule. The method you're suggesting, I think, is that pasting some partial rows of a table should create a new table for them to go in. But there are many other possible solutions (like, put the pasted rows into the current/nearest table). I don't think it's Slate's job to use heuristics to guess at what you want.

nemanja-tosic commented 2 years ago

Your example is perfect, as the follow up question is: if there were two plugins with an element of type "header", and the fragment only contains header, which plugins wraps the header?

It comes down to the fact that the way you're describing it leads to a "lossy" conversion - you lose the parent element so now the plugin system needs to guess. The way slate does it now, the fragment contains the entire tree, so there is no loss of data.

In other words - with the current approach you can make it work as you have more data than you need, while with the approach you're suggesting you lose data so there is a possibility that a paste would lead to a completely different structure.

jameshfisher commented 2 years ago

Well, both Copy methods are lossy. My suggested Copy approach is lossy because it removes some ancestors. But the current Copy approach is lossy because it doesn't distinguish between these two very distinct selections:

Selection 1, selecting some text:

table
  row
    cell
      "foo <ANCHOR>bar<FOCUS> baz"

Selection 2, selecting an entire table:

para
  "Previous text<ANCHOR>"
table
  row
    cell
      "bar"
para
  "<FOCUS>Following text"

I think both of these will result in the same thing on the clipboard with the current logic. Then the current Paste logic assumes (unsafely) that the user copied Selection 2, and so creates an entire new table where the user just wanted to paste a single word of text.

One (extreme?) method for lossless copy-paste would be: Copy puts the entire current document and the selection range on the clipboard. Then at paste time, all the context is available to examine: what were the ancestors of the copied range? What were their siblings outside the range? And so on.

TheSpyder commented 2 years ago

We should probably just explore a configuration option if want to lose the structure when pasting the first block, as much as I hate adding more configuration options.

More importantly this should be primarily in slate-react with an equivalent insertFragment API option in slate that removes all guessing.

But the current Copy approach is lossy because it doesn't distinguish between these two very distinct selections:

This is another key, and why I have been trying to explain that what the user copies signals their merge intent. Copy/paste in HTML editors has been debated for literal decades. The commonsense approach is to optimise not for internal clipboard activity but external; if someone copies from Slate and pastes into another editor (google docs seems the most prevalent example now) it should preserve what they mean.

In your examples, if we use the example of a HTML editor, copy should include selected elements but not common ancestors. The effect of this rule is that within-single-block copy is plain text with no ancestors and multi-block selection includes the selected blocks, without the body ancestor, and only partial blocks for the start and end elements.

However this is based on the HTML schema (and some HTML editors do include a single ancestor for within-single-block as long as that's not a table cell).

For Slate which isn't based on HTML and no longer has a built-in schema, perhaps it should copy as I describe, perhaps not, but either way both copy and paste must be overridable to preserve valid document structure if generic rules don't apply.

One (extreme?) method for lossless copy-paste would be: Copy puts the entire current document and the selection range on the clipboard. Then at paste time, all the context is available to examine: what were the ancestors of the copied range? What were their siblings outside the range? And so on.

This concept does exist in the Windows HTML clipboard spec but I don't think any editors other than Microsoft Word use it now (probably because they're cross platform and usually written in JS). And even Word only copies the selected portion of the ancestor tree, as Slate does now, not the entire document.

dylans commented 2 years ago

I believe rather strongly that the current default behavior is the least surprising to most users given the large number of issues reported by losing the structure of the first block when pasting with no option to recover that structure prior to the recent release.

With the new behavior there's at least a way to either preserve structure or paste as plain text. Before it was impossible to support the former.

I also agree that it needs to be easier to customize this the way you want as discussed above and in #4489, likely through configuration and multiple insertFragment factories. If someone wants to raise a PR, I'm happy to collaborate.

As a further example from today, someone on Slack in the plate channel asked why pasting a table into an empty block works as expected, but pasting one into a block with content loses the structure of the first cell.

TheSpyder commented 2 years ago

As a further example from today, someone on Slack in the plate channel asked why pasting a table into an empty block works as expected, but pasting one into a block with content loses the structure of the first cell.

Tables always require special case code though. Even my previously stated logic would still merge in that case because it's pasting a single block element. Comparing the pasted block name with the block name where the cursor is can alleviate this, but without a schema Slate doesn't have a way to know what the block name is. All we can do is provide a default and make it easy to override.

jameshfisher commented 2 years ago

Here is a Codepen that shows how to implement the simpler copy-paste logic that I suggested, in general shows how to tailor copy-paste to your needs. This simpler logic seems to work perfectly for my needs.

It wasn't immediately obvious to me how to customize/override Slate's copy-paste. I think it would be good to have an example like this in the Slate examples.

TheSpyder commented 2 years ago

Yes using a custom clipboard format is definitely a great way to detect in-editor copy/paste. Do note however that it will only work within a single browser, not across Slate editors in different browsers (maybe not even across tabs, I haven't checked recently).

dylans commented 2 years ago

Yes using a custom clipboard format is definitely a great way to detect in-editor copy/paste. Do note however that it will only work within a single browser, not across Slate editors in different browsers (maybe not even across tabs, I haven't checked recently).

In the future I think we want to switch to the newer asynchronous Clipboard API ( https://developer.mozilla.org/en-US/docs/Web/API/Clipboard_API ) which would give more reliability to what we can do across tabs and browsers and would work more consistently in Safari (which doesn't support copying images in the currently implemented Clipboard API), but my understanding is that it's at least 6-12 months from being ready in today's browsers.

TheSpyder commented 2 years ago

Yes we're keeping an eye on that API for TinyMCE as well. I think we're close to it being useful in every "modern" browser, Firefox seems to be the one lagging behind a bit.

One of the main concerns at the moment is that it requires a secure context (https) which can cause problems for integrators expecting to be able to use a local network HTTP server.

nemanja-tosic commented 2 years ago

In the Codepen, i cannot copy an orange box and paste as an orange box, which is something we use, so the solution is less predictable from our point of view.

Also, when i copy brown box and orange box the fragment contains two "box" elements, which is ambiguous and if another plugin has a box element within another one you cannot reconstruct the proper element. An example would be if we had a Heading element (as in h1, h2, h3) and a table Heading element, both of them with the type "Heading". If my fragment contains "Heading" with the current logic, which Heading am i trying to paste?

This ambiguity is impossible to solve without the entire tree, or at least the entire element that is copied, and while it works in your case, it does not work as a generic solution.

JoseVov commented 4 months ago

This is how I fixed it

import { Editor, Node } from 'slate';

// Function to get plain text from child nodes (excluding 'page')
const getPlainTextFromPageChildren = (editor: any) => {
  const { selection } = editor;

  if (!selection) {
    return '';
  }

  // Get the fragment from the current selection
  const fragment = Editor.fragment(editor, selection);

  // Initialize an empty array to collect plain text from children
  let plainText: any[] = [];

  // Iterate through the fragment to extract plain text from child nodes
  fragment.forEach((node: any) => {
    if (node.type === 'page') {
      // If it's a 'page' node, get the plain text from its children
      node.children.forEach((child: any) => {
        plainText.push(Node.string(child)); // Convert child node to string
      });
    } else {
      // For any other node, just convert to string
      plainText.push(Node.string(node));
    }
  });

  // Join the plain text with line breaks to maintain structure
  return plainText.join('\n');
};

/**
 * If you try to copy a paragraph into the editor by default it will copy the whole node.
 * We only have a single node by page which is the component with the type page and all the children.
 * You can test commenting the content of this function and try to copy and paste a paragraph in the editor and you
 * will see that the whole node with the page is copied and when is pasted a page will be added as a child of the current page
 * affecting significantly the UI
 */
const handleEditorCopyAction = (event: any, editor: any) => {
  const plainText = getPlainTextFromPageChildren(editor); // Get plain text from child nodes

  if (plainText) {
    event.clipboardData.setData('text/plain', plainText); // Copy only plain text to clipboard
    event.preventDefault(); // Prevent default copy action
  }
};

export default handleEditorCopyAction;

And then just

<Editable onCopy={(event) => handleEditorCopyAction(event, editor)} renderElement={renderElement} />

ianstormtaylor / slate

Bug: when pasting into an empty block, the entire fragment is inserted as-is #4542