jejacks0n / mercury

Mercury Editor: The Rails WYSIWYG editor that allows embedding full page editing capabilities directly inline.
http://jejacks0n.github.com/mercury
Other
2.63k stars 530 forks source link

Paste sanitization is breaking pastes that don't need it / impacting performance negatively #23

Closed jejacks0n closed 13 years ago

jejacks0n commented 13 years ago

copy something, and then mash on command/ctrl+p, and you'll see other content being duplicated.

jejacks0n commented 13 years ago

so, long story short.. webkit embeds style attributes when copying html.. this is really annoying, and I had defaulted to always cleaning out style attributes on paste. The second aspect is that firefox and webkit don't expose the clipboard data on paste (I have to try and figure it out after the paste happens).. when someone is mashing on the paste command (which seems to happen a lot when people are testing it out) I have to try and figure out what was pasted and then sanitize it -- which is also done (somewhat differently) when things are pasted from ms word or pages etc.

This is clearly expensive and is a workaround / shim, but I don't see another way to handle it. If someone has some insights I'd like to hear them, but these features are important to have.

Right now they're disabled via configuration, and I'm leaving this open for more feedback.

jejacks0n commented 13 years ago

What I'd really like to see is the browser vendors properly expose the clipboard content on paste.. It's not like I can't get it after it's pasted anyway, it's just a lot slower to try and do. Seriously annoying.

JeanMertz commented 13 years ago

Just here to tell you that I worked on my own (simple) implementation of a HTML5 editor about half a year ago and I ran into similar problems.

At first I simply stripped all html and only pasted the text, which included a warning message saying that styles weren't supported yet. But eventually I too tried doing stuff like comparing the original content to the new content, finding the differences and sanitizing the differences.

But as you said, it's tricky, messy, and often results in weird behavior. The problem with this especially is that its often a lot safer to simply deny any format pasting because a single tag with wrong formatting can mess up the entire page, not just the contenteditable part.

Anyway, not much help from me, just a recognition of the difficulty of this part. I am about to dive into mercury and see if I can swap out my half-ass implementation in my app for this one and hopefully I can contribute to mercury in the future.

Great job so far!

jejacks0n commented 13 years ago

This is totally helpful.. because you're confirming that the nature of what we're trying to do is not good, but there doesn't seem to be a different way to do it.. this helps me feel less crappy about it. Thanks for the feedback.. I'm still trying to fix issues with it.

If it works well but slow it's not too bad, but right now it's performing poorly and not doing a very good job with it.

JeanMertz commented 13 years ago

I wonder how the "old" editors (tinyMCE, etc) used to solve this. I remember them having a "paste from word" button, but I am not sure if it was something special or simply a textfield which only accepts texts and completely strips out any styling.

I'm currently going over your codebase and it seems like a really clean and tidy project. Well done. It's too bad the pasting doesn't work as expected yet, I think it's a problem a lot of us would like to see solved, especially since non-technical users expect word-like behavior without any mess showing up on their screen.

offtopic: while I'm at it, I saw your editor showing up on Reddit and read some comments. I was wondering, does the gh-pages page use the latest version of the editor? And if so, is image uploading disabled because it lacks a backend? It would be nice if you added an image inside the content editable field, I wanted to test your "double click on image" functionality, but I couldn't.

JeanMertz commented 13 years ago

Just wanted to point out that I started using the editor and am liking it so far. However, I just turned on the paste sanitization and I have to say it's not working quite well.

Something as simple as copying "test content" results into this: span class="Apple-style-span" style="font-size: 14px; line-height: 17px; ">test content< (with content still being bold). Not exactly what I was expecting.

I just introduced a third option to the config file, clearStylesOnPaste: "nohtml". That way you can opt to remove ALL styling from the pasted content, simply by capturing the paste event and only paste plain text like (unfortunately you can't capture the html, which would've solved this issue):

      if Mercury.config.cleanStylesOnPaste == "nohtml"
        clipboard = event.originalEvent.clipboardData.getData "text/plain"
        @execCommand('insertHTML', {value: clipboard})
        return false

I've made a few other changes:

I'll see about doing a pull request sometime today, if you are interested in any of these changes, or you can implement them yourself in a proper way (haha), they are all really small modifications.

jejacks0n commented 13 years ago

Thanks man, and yes, I totally agree, it's broken. =) I've been looking into this the past couple evenings, and I feel I need to drop what I currently have and try a new approach. I'm sort of spiking on some more elegant solutions.

Since you've done some work already, I'd be happy to take pull requests.. the other things you mention seem pretty good as well. If you're not setup to do pull requests, just drop me the lines of code you changed and I'll integrate them into the coffeescript and you'll get authorship of the commit.


So, after researching this all over again, I've come to a few new conclusions.. the first is using the clipboard data.. Chrome allows this, as does Safari, and I still need to check that it's not just because I'm on localhost, though I doubt that matters. Firefox however will need to do something sort of like what it currently does, but an improved version (cause the current is clearly buggy).

event.originalEvent.clipboardData.getData("text/plain") will make for a more elegant solution for the MS Word pastes, and is supported in both Safari and Chrome.

event.originalEvent.clipboardData.getData("text/html") will work in Chrome, which will simplify that soooo much (how it should be.), and this does sort of work in Safari. So, when you copy something from Safari, and try and paste it, there's no text/html defined in the clipboardData, however, if you copy from Chrome, Firefox, and others that set text/html properly on copy it is available. Alternatively, if you copy from Safari and paste into Chrome, the html is wrapped within a whole document.

In either case, it seems cleaning that part up to get the pasted content is a more reliable thing to do in those circumstances. But there will need to be a fallback for those browsers that have buggy / incomplete support. And that's where I'm a little unsure on the best way to do it. I was explaining this to a friend and we think it may not be fully possible.

So, I've been working on the fallback (current code with some enhancements I've added) and it works by taking the original content, and comparing it to the content that's there after the paste. It tries to determine the inserted content, but there's some problems...

Let's say I have the content "abc123", and paste "bat" into that text after the "2".. we end up with "abc12bat3".. we can detect this and get the difference correct every time.. but what if I paste it after the "a".. "abatc123", we'll end up thinking that the pasted content was "atc", because the b matches the original string, and we're using the length of the content with the additions - the original content length. When you get into it, you realize that trying to detect the difference there can't ever be 100% accurate. Let's look at some more complex situations:

original content ([] denotes selection) : pasted content -> calculated result a[bc1]23 : bc4 -> 4 ab[c12]3 : cz2 -> z23

As far as the fallback goes, I'm sort of at a loss in making it 100% issue free.

JeanMertz commented 13 years ago

Great find on the event.originalEvent.clipboardData.getData("text/html") part. I never got it to work in Safari so I somehow assumed (even though I did a lot of research on this 6 months ago) that it wasn't going to work in any browser. I think the mozilla developer docs might help you out more.

About detecting the difference on pasting, you are absolutely right that it won't work in every case and is far from perfect. I just tried something else, that other people dismissed as not possible or only possible on keyboard shortcut pasting, not context-menu pasting, which is moving the cursor position to an empty (positioned off the screen) div, and paste the content there when a user pastes some content, which would allow you more easily to clean things up.

I tried it in Safari, and it seems to work for me, so I am not sure why people say it isn't possible, am I missing something here? And even if it doesn't work in Chrome, Firefox and IE, we can still use it in Safari to solve the problem, and use the getData('text/html') solution for Chrome, that's two down, one to go.

Just try it out, copy the content of this comment and paste it here: http://jsfiddle.net/tCgV9/.

You could do the following actions to make this work:

* detect the paste event on all content editable elements

see my next comment for a solution

I realize you want to keep the project as much cross-browser as possible, but truth of the matter is that this isn't always possible, and I think this is such a situation.

I would recommend going with the getData('text/html') route for the browsers that do support it (Webkit I guess, though somehow Safari doesn't work correctly), and falling back to getData('text/plain') or the solution I proposed above for other browsers.

JeanMertz commented 13 years ago

I just updated the jsfiddle snippet to a fully working example:

http://jsfiddle.net/cgNJJ/

It now works (almost) as expected and I confirmed it working in both Chrome and Safari.

(I don't see any reason why it wouldn't work in Firefox, and if it doesn't, it shouldn't be too hard to get it working)

Currently this method breaks the undo command, but since you already have custom undo/redo functions, I am sure you can get that to work with this as well.

It obviously needs a lot of cleaning up, and refinement, but it seems to work great so far.

Any thoughts?

just in case jsfiddle suddenly dies on us, here's the relevant javascript:

$('.mercury-region').bind 'paste', ->

  # get current selection & range
  window.targetElement = this
  window.selection = window.getSelection()
  window.range = selection.getRangeAt(0)

  # set focus to hidden div
  $('#hidden_sanitize_div').html('').focus()

$('#hidden_sanitize_div').bind 'focus', ->

  sanitizeDiv = this

  # set a 1ms timeout to allow the paste event to execute
  setTimeout(->

    selection = window.selection
    range = window.range

    # sanitize the pasted html
    $(sanitizeDiv).find('[style]').attr({style: null})
    $(sanitizeDiv).find('[class]').attr({class: null})
    $(sanitizeDiv).find('[id]').attr({id: null})

    # move cursor back to original element & position
    $(window.targetElement).focus()
    selection.removeAllRanges()
    selection.addRange(range)

    # delete any selected content
    range.deleteContents()

    # paste new content
    sanitizedHtml = document.createDocumentFragment()
    while (node = sanitizeDiv.firstChild) {
      lastNode = sanitizedHtml.appendChild(node)

    range.insertNode(sanitizedHtml)

    # move cursor after the last pasted element
    if lastNode
      range = range.cloneRange()
      range.setStartAfter(lastNode)
      range.collapse(true)
      selection.removeAllRanges()
      selection.addRange(range)

  , 1)
jejacks0n commented 13 years ago

I'll need to check this out this weekend.. seems like a non-trivial task to give you an adequate response. =) Thanks so much for looking into this and helping out.

jejacks0n commented 13 years ago

Closing since it will be resolved shortly.

ghost commented 11 years ago

Closed, but Copy & Paste is not working on latest Firefox. Chrome is fine though.

adamjohnson commented 10 years ago

I can confirm that sanitization when pasting from Microsoft Word 2011 (Mac) isn't working in Chrome 35, Firefox 30, and Safari 7.0.4. Used the demo to confirm these results.