muchl / owaspantisamy

Automatically exported from code.google.com/p/owaspantisamy
0 stars 0 forks source link

CSS validation does not report CSSParseExceptions as errors #105

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a file that will fail to parse via org.apache.batik.css.parser.Parser 
(using browser hacks should do the trick, 
http://www.javascriptkit.com/dhtmltutors/csshacks3.shtml)
2. Run validation on a CSS file (AntiSamy.scan(...))

What is the expected output? What do you see instead?
-Parser will report CSSParseExceptions to the provided ErrorHandler.  However, 
CssScanner does not set an ErrorHandler for it to report to.

Expected:  CssScanner provides an ErrorHandler to Parser, and the errors 
reported to the handler are added to the CleanResults errorMessages list.

Actual:  No ErrorHandler is provided to the Parser, so the parse exceptions go 
unnoticed.

What version of the product are you using? On what operating system?
-AntiSamy version is 1.4
-OSs tried: XP, Vista, 7

Please provide any additional information below.

Browser hacks are just an example, but anything that will get Parser to throw a 
parse exception should report the errors that should be reported by AntiSamy's 
CleanResults.

Original issue reported on code.google.com by tad...@gmail.com on 29 Mar 2011 at 3:59

GoogleCodeExporter commented 8 years ago
It appears that HTML parsing exceptions are not being added to the error list 
either, for the same reason (no error handler provided).  Would you like me to 
update this issue, or create a new one?

Original comment by tad...@gmail.com on 4 Apr 2011 at 4:01

GoogleCodeExporter commented 8 years ago
We're not really interested in parsing exceptions - only content that is not 
allowed. Unparsed content errors don't really fall into that, although I can 
see why you might want them.

People have been using the number of error messages as an indication of whether 
the data is safe. This is not what I intended with the API but parser errors 
will complicate that (unfortunate) common usage.

Original comment by arshan.d...@gmail.com on 7 Jun 2011 at 5:25

GoogleCodeExporter commented 8 years ago
Arshan,

I can see where you are coming from.  I understand that AntiSamy is a scrubber, 
and not a validator.  And I know that we had been using the number of errors as 
an indication whether or not the data is safe (and thus needed the addition of 
the parsing exceptions).  I know this is wrong, and we are changing.

However, if we take the clean, and some of the input was removed due to parsing 
exceptions, I don't see a good way to notify our users that part of their input 
was removed.  I know that we cannot be the first users to run into this issue, 
so any thoughts on the matter would be appreciated.

An example so people will know what I am talking about.  A user inputs "<a 
href="http://www.google.com>Google</a>", forgetting the end quote on the href 
(an honest mistake).  The "cleaned" value is an empty string, but no errors are 
reported.  If this were among a block of HTML, it is possible that the user 
would overlook it when comparing the output with their input.  We have no way 
of indicating to the user that something was removed.  Other than just putting 
a generic "Please make sure the cleaned output is as expected" type message, 
I'm not sure how to make the users aware that they might have lost something.

I've considered validating the HTML prior, but without using the same parser 
that AntiSamy uses, the results could be different.  Using the same parser that 
AntiSamy uses would run into the same issues that we are seeing now with our 
custom code (I can go into details for anyone interested).

Anyhow, I welcome any thoughts or ideas on how to handle this without upsetting 
our users too much.

Thanks

-Troy

Original comment by tad...@gmail.com on 15 Jul 2011 at 8:01

GoogleCodeExporter commented 8 years ago
We could separate the two; on top of today's error messages (which are related 
to security) we could have "formatting" messages, which could be used to help 
users tweak their input. If there's a good spec (or better yet, a patch), this 
could make into the release after the forthcoming one.

Original comment by arshan.d...@gmail.com on 15 Sep 2011 at 8:19