Closed GoogleCodeExporter closed 9 years ago
Figured out a fix, look for this in the new version.
Original comment by arshan.d...@gmail.com
on 26 Nov 2008 at 5:53
Fixed!
Original comment by arshan.d...@gmail.com
on 26 Nov 2008 at 6:34
I still have the same problem in 1.3. Here is the test case:
@Test
public void testScanCDATA() {
System.out.println("ScanCDATA");
String taintedHTML = "<![CDATA[*#autos.tabContent {display: block;}]]>";
CssScanner scanner = new CssScanner(policy, null);
try {
scanner.scanStyleSheet(taintedHTML, 1000);
}
catch (Exception ex) {
Assert.fail(ex.getMessage());
}
}
And the error message:
Caused by: org.apache.batik.css.parser.ParseException: character
at org.apache.batik.css.parser.Scanner.nextToken(Unknown Source)
at org.apache.batik.css.parser.Scanner.next(Unknown Source)
at org.apache.batik.css.parser.Parser.parseStyleSheet(Unknown Source)
at org.owasp.validator.css.CssScanner.scanStyleSheet(CssScanner.java:131)
at
org.owasp.validator.html.scan.AntiSamyDOMScanner.recursiveValidateTag(AntiSamyDO
MScanner.java:429)
I looked into org.apache.batik.css.parser.Scanner and it recognizes <!-- but
not <![.
May I please know what was the fix?
Original comment by ewuj...@gmail.com
on 25 Mar 2009 at 9:20
This is interesting! We never anticipated someone using the CssScanner class
directly. The fix was put into the AntiSamyDOMScanner class where the CssScanner
class is normally invoked.
However, to help you (anyone else using it this way) we are going to move the
fix
into CssScanner.
Original comment by arshan.d...@gmail.com
on 25 Mar 2009 at 5:48
Thanks for your quick response. I read the code in AntiSamyDOMScanner and there
are
two minor issues here:
Line 420, '.' is used in the pattern and therefore it only works in single-line
mode.
I would suggest to use '\s\S' instead. The original string is
"<![CDATA[*#autos.tabContent {\ndisplay: block;\n}]]>". That's why it gets
through
the check in AntiSamyDOMScanner and reaches CssScanner.
Line 423, you possibly mean "isCdata = true;". Otherwise, CDATA is removed
after the
scan.
Original comment by ewuj...@gmail.com
on 26 Mar 2009 at 8:07
By the way, I really like the library and wish you guys do well.
Original comment by ewuj...@gmail.com
on 26 Mar 2009 at 8:39
Here's a patch for the AntiSamyDOMScanner in 1.3. I didn't fix the CssScanner.
This should allow multiline CDATA blocks in a style tag.
Enjoy ...
Original comment by dob...@gmail.com
on 20 Apr 2009 at 7:34
Attachments:
Sorry the last patch was buggy, here's a better one.
Change: don't attempt to re-wrap the CDATA block as CDATA again, because the
HTML
writer does this automaticlly, and it become eminently necessary if we've got
the
CDATA markup inside the string, resulting in the CDATA wrapper being added
twice.
Not sure of a better solution so I just let the HTML writer code decide whether
the
CDATA is really needed or not.
Original comment by dob...@gmail.com
on 20 Apr 2009 at 8:47
Attachments:
Ok, I've put a lot of thought into this.
First off, I've moved all the code to CssScanner.java where it belongs. This is
usually Jason's area, so hopefully he can sanity check my changes.
AntiSamy now considers your intentions of using XHTML serialization or not,
since
this affects behavior. Strict XHTML requires stylesheet data to be in CDATA
blocks -
and with that in mind, here is the decision tree:
If using XHTML:
- If CDATA is found
- Strip out the data (XHTMLSerializer will re-add it)
- If CDATA is not found
- Analyze and return as is (XHTMLSerializer will add it)
If not using XHTML:
- If CDATA is found
- Strip out the data
- Manually add CDATA blocks around cleaned CSS data
- If CDATA is not found
- Return cleaned data after scanning as is
I detect CDATA intent with dobesv's regular expression, which works as
advertised
across lines. I hope my fix that I'm committing makes everyone happy, but I'm
not
sure if user intent in this area can be cleanly assumed by only using the XHTML
policy directive.
I would appreciate feedback in this area - I am committing this patch tonight
so you
can test the current branch by the morning.
Original comment by arshan.d...@gmail.com
on 3 Aug 2009 at 5:40
Feel free to re-open if it doesn't make you happy.
Original comment by arshan.d...@gmail.com
on 3 Aug 2009 at 6:23
You mentioned that you changed the file in your last comment, where is this
available? We are having a similar issue. The scanner appears to be inserting a
cdata
tag but when we send the html back through it again we are getting the same
error
org.apache.batik.css.parser.ParseException: character. Thx.
Original comment by waseul.i...@gmail.com
on 27 Feb 2010 at 11:58
Is there a way to prevent the XHTMLSerializer from adding the CDATA in XHTML
mode? Maybe a directive or switch...
We are using FCK Editor and he seems to be having problems when reading the
CDATA section on edition so I'd prefer to leave the css intact...
Thank you.
Original comment by fpoliq...@gmail.com
on 8 Dec 2010 at 8:24
I suppose one could subclass the org.apache.xml.serialize.XHTMLSerializer class
to act on CDATA according to a switch - but I'm not sure how easy that will be
from a SAX approach. I will look at any patches submitted, though.
Original comment by arshan.d...@gmail.com
on 14 Dec 2010 at 3:16
Original issue reported on code.google.com by
li.jaso...@gmail.com
on 24 Nov 2008 at 9:47