p4ulypops / jquery-clean

Automatically exported from code.google.com/p/jquery-clean

Tags that shouldn't be removed are stripped from certain chunks of HTML #11

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Clean the following HTML (the result of pasting from MS Word 2010 into 
IE9 -- yeah, I know, the joy...):

<span lang="EN"><p dir="LTR" align="LEFT">This is a heading 1</p>
</span><font color="#345a8a" size="5" face="Calibri"><font color="#345a8a" 
size="5" face="Calibri"><font color="#345a8a" size="5" 
face="Calibri"></font></font></font><font color="#404040" size="2" 
face="Calibri"><font color="#404040" size="2" face="Calibri"><font 
color="#404040" size="2" face="Calibri"><p dir="LTR" align="LEFT">This is a 
heading 8</p>
</font></font></font><font face="Cambria"><p dir="LTR" align="LEFT">This <b>is 
<i>j<u>ust</u></i></b><i><u> so</u>me</i> text.</p>
<p dir="LTR" align="LEFT">And <i>a bullet list:</i></p><i>

<ul>
<p dir="LTR" align="LEFT"><li>Item 1</li><p></p>
<p dir="LTR" align="LEFT"><li>Item</li></ul></i><ul><li> 2 with a 
</li></ul></font><ul><li><a href="http://smart.pr/"><font face="Cambria"><span 
lang="EN">link</span></font></a></li><p></p></ul>

<font face="Courier New"><span lang="EN"><p dir="LTR" align="LEFT">And an 
image</p></span></font><p dir="LTR" align="LEFT"><font 
face="Cambria">:</font><a href="http://blog.smart.pr/"><img border="0" 
src="Image3.jpg" width="392" height="236"></a></p>

2. It is cleaned into:

<span>This is a heading 1</span> This is a heading 8 This <strong>is 
<em>just</em></strong> <em>some</em> text. And <em>a bullet list:</em> <em>Item 
1 Item</em> 2 with a<ul><li><a 
href="http://smart.pr/"><span>link</span></a></li></ul> <span>And an 
image</span><p>:<a href="http://blog.smart.pr/"><img src="Image3.jpg" 
width="392" height="236" alt='' /></a></p>

What is the expected output? What do you see instead?

A whole bunch of tags that shouldn't be stripped are stripped, e.g. the first 'p'.

What version of the product are you using? On what operating system?

This 'over-cleaning' occurs in all browsers (although the only browser that 
ends up creating this obviously problematic HTML is IE9).

Please provide any additional information below.

Original issue reported on code.google.com by taw.mole...@gmail.com on 28 Jul 2011 at 1:22

GoogleCodeExporter commented 9 years ago
Btw, the given cleaned HTML is copy-pasted from the demo at 
http://www.antix.co.uk/Content/Demos/jQuery-htmlClean/Test.htm

Original comment by taw.mole...@gmail.com on 28 Jul 2011 at 1:23

GoogleCodeExporter commented 9 years ago
Hi Taw

Yes, that looks wrong to me too.
I'm pretty snowed under at the moment and won't be able to look at it for a 
while.

Is there any chance you could have a look yourself, or narrow it down to a 
simple bit of HTML that causes the issue?

Original comment by antixsof...@gmail.com on 28 Jul 2011 at 9:04

GoogleCodeExporter commented 9 years ago
I might, as soon as I'm 100% sure that I'm gonna need jquery-htmlClean in my 
project. Needs more investigation. (The fact that it's 20K doesn't really help.)

Original comment by t...@timmolendijk.nl on 12 Aug 2011 at 1:23

GoogleCodeExporter commented 9 years ago
Hi Taw and Tim,

I've updated the code to pop block-level (non-inline) elements out of the 
inline elements that contain them.
This prevents the p from being removed, and the library now makes a good stab 
at the clean.

It's not perfect, but the HTML in the example is quite a mess and I think it 
does a good job.
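The fix described above (popping block-level elements out of the inline elements that wrap them) can be sketched as below. This is a minimal illustration, not htmlClean's actual code: it assumes a simplified tree of plain objects `{ tag, children }` (text nodes are strings) instead of real DOM nodes, and a hypothetical `BLOCK_TAGS` list. An inline element that contains block children is split around them: runs of inline content stay wrapped in copies of the inline element, and each block is lifted out to the parent level.

```javascript
// Hypothetical list of block-level tags; htmlClean's own tag tables differ.
const BLOCK_TAGS = new Set(["p", "div", "ul", "ol", "li", "h1", "h2", "h3", "table"]);

// Text nodes are plain strings; elements are { tag, children }.
function isBlock(node) {
  return typeof node === "object" && BLOCK_TAGS.has(node.tag);
}

// Returns the list of nodes that should replace `node` in its parent.
function hoistBlocks(node) {
  if (typeof node === "string") return [node];

  // Process children first, so nested inline wrappers are already split.
  const children = node.children.flatMap(hoistBlocks);

  // Block elements, and inline elements without block children, are kept.
  if (isBlock(node) || !children.some(isBlock)) {
    return [{ tag: node.tag, children }];
  }

  // Inline element containing blocks: split it around each block child.
  const result = [];
  let run = [];
  const flushRun = () => {
    if (run.length > 0) result.push({ tag: node.tag, children: run });
    run = [];
  };
  for (const child of children) {
    if (isBlock(child)) {
      flushRun();
      result.push(child); // the block is popped out of the inline wrapper
    } else {
      run.push(child);
    }
  }
  flushRun();
  return result;
}

// Minimal serializer, only used to inspect the result.
function toHtml(node) {
  if (typeof node === "string") return node;
  return `<${node.tag}>${node.children.map(toHtml).join("")}</${node.tag}>`;
}

// <span><font><p>...</p></font></span>, as in the first line of the report.
const nested = {
  tag: "span",
  children: [{ tag: "font", children: [{ tag: "p", children: ["This is a heading 1"] }] }],
};
console.log(hoistBlocks(nested).map(toHtml).join(""));
// → <p>This is a heading 1</p>

const mixed = { tag: "font", children: ["a", { tag: "p", children: ["b"] }, "c"] };
console.log(hoistBlocks(mixed).map(toHtml).join(""));
// → <font>a</font><p>b</p><font>c</font>
```

In this sketch the inline formatting (e.g. the `font` wrapper) is simply not re-applied inside the lifted block; pushing a copy of the inline element down into the block is another possible choice, at the cost of more complex handling for nested blocks such as `ul > li`.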

@Tim, thanks for the note on size. I think it could be smaller, and a rewrite 
might do it, but I am not going to do that any time soon. Compressed, it's 
only 12k, and bundling, compressing and gzipping would help further.

All the best now.

Original comment by antixsof...@gmail.com on 25 Feb 2013 at 3:22