p4ulypops / jquery-clean

Automatically exported from code.google.com/p/jquery-clean

XHTML from HTML documents #4

Closed. GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Just parse some HTML with an IMG tag (HTML style)

What is the expected output? What do you see instead?
I'd expect to see the image tag left intact as <img>, but I see <img /> even though this is an HTML document.

What version of the product are you using? On what operating system?
The latest.

Please provide any additional information below.
Perhaps there could be an option for specifying the document type, or it could just be picked up from the header?

Original issue reported on code.google.com by brendon%...@gtempaccount.com on 1 May 2010 at 12:25

GoogleCodeExporter commented 9 years ago
Hi Brendon,

This plug-in only produces XHTML. Sorry, I'll make that clear on the home page.

Ant

Original comment by antixsof...@gmail.com on 1 May 2010 at 12:34

GoogleCodeExporter commented 9 years ago
Thanks Ant,

Is that because it would be a lot of work? Adding a config parameter such as xhtml = true, or something like it, to determine whether to self-close the tag wouldn't be that hard. But are you also doing other, more subtle cleaning that differs between XHTML and HTML?

Are you open to patches from others? I wouldn't mind having a crack. Just let me know if I'm being too simplistic with my approach :)

Original comment by brendon%...@gtempaccount.com on 2 May 2010 at 11:56

GoogleCodeExporter commented 9 years ago
Hi Brendon,

I'm very happy for you to give it a go.

I think adding full support for HTML would lead to more work than it would be worth.

Why not tackle it by having a flag for closing self-closers, which would default to true, and similar flags for any other HTML-type cleaning that you need?
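A flag like that could look roughly as follows. This is a minimal, hypothetical sketch, assuming an option named closeSelfClosers that defaults to true; neither that option name nor the serializeTag and escapeAttr helpers come from jquery-clean itself. It also lower-cases names and quotes attribute values, since the plug-in's output does both.

```javascript
// Hypothetical sketch, not jquery-clean's actual code: the option name
// closeSelfClosers and the helpers below are illustrative only.

// HTML void elements: they never have content or an end tag.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr"
]);

// Escape an attribute value for use inside double quotes.
function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Serialize an opening tag. Names are lower-cased and values are quoted.
// Void elements end in " />" (XHTML style) unless closeSelfClosers is
// false, in which case they are written HTML style, e.g. <img src="a.png">.
function serializeTag(name, attrs, options) {
  const opts = Object.assign({ closeSelfClosers: true }, options);
  const tag = name.toLowerCase();
  const attrText = Object.keys(attrs)
    .map((key) => ` ${key.toLowerCase()}="${escapeAttr(attrs[key])}"`)
    .join("");
  const selfClose = VOID_ELEMENTS.has(tag) && opts.closeSelfClosers;
  return selfClose ? `<${tag}${attrText} />` : `<${tag}${attrText}>`;
}

console.log(serializeTag("IMG", { SRC: "a.png" }));
// → <img src="a.png" />
console.log(serializeTag("IMG", { SRC: "a.png" }, { closeSelfClosers: false }));
// → <img src="a.png">
```

Defaulting the flag to true keeps the existing XHTML output unchanged, so the old tests should keep passing.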

Please make sure you write some tests, and that any changes you make still pass the old ones.

When you're done and have smoothed out all the wrinkles, let me have it and I will integrate it.

Ant

Original comment by antixsof...@gmail.com on 3 May 2010 at 11:27

GoogleCodeExporter commented 9 years ago
Thanks Ant :)

I'll have a go at it soon :)

Apart from the self-closers, are there any other XHTML-specific things going on? I noticed that hspace and vspace get removed from images. While I'm pretty sure the HTML Strict spec doesn't allow them either, I'm just wondering if you've tailored any details to the XHTML spec that I should know about :)

Cheers,

Brendon

Original comment by brendon%...@gtempaccount.com on 4 May 2010 at 12:16

GoogleCodeExporter commented 9 years ago
Ok.

Here are some things off the top of my head:
1. There is no provision for HTML short (minimized) tags, e.g. <li>I have no closing tag
2. Tag and attribute names are changed to lower case: <LI> => <li>
3. Tags must be properly nested: <strong><em>Hiya</strong></em> => <strong><em>Hiya</em></strong>
4. Attribute values are quoted
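The nesting rule in that list is commonly enforced with a stack of open elements. A minimal sketch, assuming a hypothetical pre-tokenized input of { type, name, text } objects rather than whatever parser jquery-clean really uses; attributes and void elements are left out to keep it short. Elements still open at the end are closed, which is also what turns an unclosed <li> into XHTML.

```javascript
// Hypothetical token format: { type: "open" | "close" | "text", name?, text? }.
// Repairs mis-nested markup using a stack of currently open elements.
function fixNesting(tokens) {
  const stack = [];
  let out = "";
  for (const token of tokens) {
    if (token.type === "text") {
      out += token.text;
    } else if (token.type === "open") {
      const name = token.name.toLowerCase();
      stack.push(name);
      out += `<${name}>`;
    } else if (token.type === "close") {
      const name = token.name.toLowerCase();
      const index = stack.lastIndexOf(name);
      if (index === -1) continue; // stray close tag with no matching open: drop it
      // Close every element opened after the matching one, innermost first.
      while (stack.length > index) {
        out += `</${stack.pop()}>`;
      }
    }
  }
  // Close anything still open at the end of the input.
  while (stack.length > 0) {
    out += `</${stack.pop()}>`;
  }
  return out;
}

console.log(fixNesting([
  { type: "open", name: "strong" },
  { type: "open", name: "em" },
  { type: "text", text: "Hiya" },
  { type: "close", name: "strong" },
  { type: "close", name: "em" }
]));
// → <strong><em>Hiya</em></strong>
```

When </strong> arrives, <em> is closed first; the later </em> then has nothing to match and is dropped.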

By the way, why do you want this anyway?

Ant

Original comment by antixsof...@gmail.com on 4 May 2010 at 6:31

GoogleCodeExporter commented 9 years ago
Hehe, thanks for that. I wouldn't consider any of those legitimate HTML options a smart way of writing HTML anyway, so it doesn't bother me that they get cleaned up.

Don't laugh, but I'm playing around with the concept of an in-context, in-browser editor (rather than the type where you edit your content off in some abstract box and then hope for the best). I was looking for a quick and tested way to clean up the HTML before it's shown in the 'code view', and also at the end when the page is saved. Unfortunately I have to go back and forth on some things (e.g. Safari wants  tags and won't natively work with <strong> tags in some cases), so the code needs to be converted between standard and non-standard markup.

I tweaked some of the settings at the bottom of your code, and that gave me most of what I wanted (e.g. allowing style attributes on IMG tags for things like margins). I've turned your code off for now, as I'm working on some simple rules to fix deprecated markup (hspace, vspace, etc.) before applying your extra level of sanitisation.

Really, the only things I would need to change in the core of this code are to allow self-closing tags to be written HTML-style, and to allow the defaults at the bottom of the script to be altered through configuration.
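Making those defaults configurable usually means merging caller options over a defaults object. A minimal sketch, assuming hypothetical option names (closeSelfClosers, allowedAttributes); these are not the settings actually defined at the bottom of jquery-clean's source, and resolveOptions is an illustrative helper.

```javascript
// Hypothetical defaults for illustration only; jquery-clean's real
// settings at the bottom of its source use different names.
const DEFAULTS = {
  closeSelfClosers: true,
  allowedAttributes: {
    img: ["src", "alt", "width", "height"],
    a: ["href", "title"]
  }
};

// Merge caller options over DEFAULTS without mutating DEFAULTS.
// allowedAttributes is merged per tag: a tag listed by the caller
// replaces that tag's default list, and other tags keep their defaults.
function resolveOptions(options) {
  const user = options || {};
  return {
    closeSelfClosers:
      typeof user.closeSelfClosers === "boolean"
        ? user.closeSelfClosers
        : DEFAULTS.closeSelfClosers,
    allowedAttributes: Object.assign({}, DEFAULTS.allowedAttributes, user.allowedAttributes)
  };
}

// Allow style on <img> (e.g. for margins) while keeping the other defaults.
const opts = resolveOptions({ allowedAttributes: { img: ["src", "alt", "style"] } });
console.log(opts.allowedAttributes.img.join(","));
// → src,alt,style
console.log(opts.allowedAttributes.a.join(","));
// → href,title
```

jQuery plugins often do this with $.extend(true, {}, defaults, options), but a deep $.extend merges arrays index by index rather than replacing them, so a shorter img list would keep stale entries from the default list. Merging per tag avoids that.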

It's a really interesting bit of code to read through, though it goes over my head a bit :D

Thanks for all your help and comments.

Brendon 

Original comment by brendon%...@gtempaccount.com on 4 May 2010 at 11:20

GoogleCodeExporter commented 9 years ago
My pleasure, and good luck with your project.

Ant

Original comment by antixsof...@gmail.com on 5 May 2010 at 5:12

GoogleCodeExporter commented 9 years ago
Won't fix; see what Brendon comes up with for possible integration.

Original comment by antixsof...@gmail.com on 1 Jul 2010 at 6:26