Preserve comments - Githubissues

GoogleCodeExporter commented 9 years ago

I would like to have AntiSamy preserve HTML comments in the original 
source, because the template language I am using (MiniTemplator) uses 
comments for its directives.

Is this something that would be easy, or it is a monumental task?

I suppose a worst-case I could escape the comments temporarily beforehand 
and unescape them after running AntiSamy.

Original issue reported on code.google.com by dob...@gmail.com on 17 Apr 2009 at 4:46

GoogleCodeExporter commented 9 years ago

It would also be best to preserve whitespace as well, since whitespace can 
affect 
html layouts (we are allowing "pre" formatted text).

Original comment by dob...@gmail.com on 17 Apr 2009 at 4:53

GoogleCodeExporter commented 9 years ago

Here's a patch to allow preserving whitespace and comments, using directives in 
the 
policy.xml:

{{{
<directive name="formatOutput" value="false"/>
<directive name="omitComments" value="false"/>
<directive name="preserveSpace" value="true"/>
}}}

The patch applies to the antisamy-1.3 svn folder.

Original comment by dob...@gmail.com on 20 Apr 2009 at 7:25

Attachments:

preserve-whitespace-comments.patch.txt

GoogleCodeExporter commented 9 years ago

The only problem with allow comments is that it is possible to execute code 
within
them. See http://ha.ckers.org/xss.html.

Yes, the whitespace is my most personally annoying issue. I can't wait to look 
at
your patch to see if we can integrate!

Original comment by arshan.d...@gmail.com on 11 Jun 2009 at 1:21

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Original comment by arshan.d...@gmail.com on 11 Jun 2009 at 1:21

GoogleCodeExporter commented 9 years ago

In my case I need the comments, I can possibly strip them out later using a 
Regex.  I  
didn't quite find the case where comments were used for an attack in the link 
you 
posted, was it just the one where they are injecting a "-->" to close the 
comment and 
stick in a script tag?

Original comment by dob...@gmail.com on 11 Jun 2009 at 4:05

GoogleCodeExporter commented 9 years ago

Sorry, it wasn't in that link. See issue #5. That wraps a link tag inside of a
conditional comment.

Original comment by arshan.d...@gmail.com on 3 Aug 2009 at 2:35

Changed state: WontFix
Added labels: Priority-Low
Removed labels: Priority-Medium

GoogleCodeExporter commented 9 years ago

Ah I see, well is there a way to remove conditional comments without ditching 
the 
regular comments?

Original comment by dob...@gmail.com on 3 Aug 2009 at 5:14

GoogleCodeExporter commented 9 years ago

Not easily, unfortunately - conditional comments with nested code are reported 
as one
big comment by NekoHTML. You could uncomment the comment-removing code, but 
you're
possibly exposing yourself. Could you move AntiSamy downstream of the 
templating system?

Original comment by arshan.d...@gmail.com on 3 Aug 2009 at 5:18

GoogleCodeExporter commented 9 years ago

Hi arshan,

I don't want to run AntiSamy every time I process a template, it's too 
expensive - I 
just use it when they change the template to remove anything unwanted from the 
template itself.

I can easily add a bit of regex to strip comments downstream of the templating 
system.

Or I can try and make a version of the patch that strips conditional comments, 
it 
seems pretty doable.

Original comment by dob...@gmail.com on 3 Aug 2009 at 5:57

GoogleCodeExporter commented 9 years ago

I don't want to run AntiSamy every time I render a template, it'd be too 
expensive.  
I'll adjust the patch to remove any IE conditional directives found inside the 
comment 
if comments are preserved.

Original comment by dob...@gmail.com on 3 Aug 2009 at 6:06

GoogleCodeExporter commented 9 years ago

Here's a new patch to strip the conditional directives without stripping 
comments or 
whitespace.

Original comment by dob...@gmail.com on 3 Aug 2009 at 6:31

Attachments:

preserve-whitespace-comments-v2.patch.txt

GoogleCodeExporter commented 9 years ago

I'm integrating the patch. Can you generate test cases?

Original comment by arshan.d...@gmail.com on 3 Aug 2009 at 7:15

GoogleCodeExporter commented 9 years ago

Hi there,

I thought I had included test cases in the patch, for example:

    /**
     * Issue #41 - support preservation of comments and whitespace, with the caveat 
that
     * IE conditional directives must be stripped for security purposes.
     */
    public void testCommentsAndWhitespace() throws Exception {
        assertEquals( "text", as.scan("text <!-- comment --
>",policy).getCleanHTML());
        policy.setDirective(Policy.OMIT_COMMENTS, "false");
        policy.setDirective(Policy.PRESERVE_SPACE, "true");
        policy.setDirective(Policy.FORMAT_OUTPUT, "false");

        assertEquals( "<div>text <!-- comment --></div>", as.scan("<div>text <!-- 
comment --></div>",policy).getCleanHTML());
        assertEquals( "<div>text <!-- comment --></div>", as.scan("<div>text <!--[if 
IE]> comment <[endif]--></div>",policy).getCleanHTML());
        // nested if's.  Not handled very well, but as long as there's no tag 
injection we're satisfied
        assertEquals( "<div>text <!-- <!-- comment --><[endif]--></div>", 
as.scan("<div>text <!--[if IE]> <!--[if gte 6]> comment <[endif]--><[endif]--
></div>",policy).getCleanHTML());
        // comment inside an if.  Not handled very well, but as long as there's no 
tag injection we're satisfied
        assertEquals( "<div>text <!-- <!-- IE specific --> comment <[endif]--
></div>", as.scan("<div>text <!--[if IE]> <!-- IE specific --> comment 
<[endif]--
></div>",policy).getCleanHTML());
        assertEquals( "<div>text <!-- \ncomment --></div>", as.scan("<div>text <!-- [ 
if lte 6 ]>\ncomment <[ endif\n]--></div>",policy).getCleanHTML());
        assertEquals( "<div>text  comment </div>", as.scan("<div>text <![if !IE]> 
comment <![endif]></div>",policy).getCleanHTML());
        assertEquals( "<div>text  comment </div>", as.scan("<div>text <![ if !IE]> 
comment <![endif]></div>",policy).getCleanHTML());
    }

Original comment by dob...@gmail.com on 3 Aug 2009 at 7:48

GoogleCodeExporter commented 9 years ago

However, it looks like I may have accidentally included a couple of other fixes 
in that 
patch, like the CDATA fix and a fix for the z-index test case.  Here's a 
cleaned-up 
version with just the whitespace + comments related code.

Original comment by dob...@gmail.com on 3 Aug 2009 at 7:52

Attachments:

preserve-whitespace-comments-v2.patch.txt

GoogleCodeExporter commented 9 years ago

Your defense is interesting but I'm not confident injection isn't possible. 
Consider
the following test case which shows how an attacker can still input conditional
comments through fragmentation:

String attack = "<![if lte 8]alert(1)<![endif]>";
String spacer = "<![if IE]>";

StringBuffer sb = new StringBuffer();

sb.append("<div>text<!--");

for(int i=0;i<attack.length();i++) {
  sb.append(attack.charAt(i));
  sb.append(spacer);
}

sb.append("-->");

String s = sb.toString();
CleanResults cr = as.scan(s,policy);
System.out.println( cr.getCleanHTML() );
assertTrue( cr.getCleanHTML().indexOf("if") == -11 );

This takes advantage of the "remove the bad stuff inside the string" approach, 
and
fragments an attack using what you'll rip out as a spacer. However there is no 
tag in
this approach (tags seem to get reliably encoded). So aside from tags, does IE 
honor
any other kind of active content inside conditional comments? If it sees 
JavaScript,
will it execute? I have to investigate this a little.

Original comment by arshan.d...@gmail.com on 3 Aug 2009 at 8:26

GoogleCodeExporter commented 9 years ago

I think your patch is fine, actually. After some testing, I can't get IE to 
recognize
any active content inside the conditional comment. On top of that, I can't get 
HTML
control characters to come out decoded/raw in the resulting getCleanHTML().

Original comment by arshan.d...@gmail.com on 4 Aug 2009 at 1:23

GoogleCodeExporter commented 9 years ago

Hi Arshan,

Good catch.  After some experimentation I realized that replaceAll() doesn't 
make 
sense here - the IE parser only seems to accept the condition if it is right at 
the 
start of the comment with no preceding characters or whitespace.

For example, the following alerts 1 and 5 only:

<html>
<body>
<div>text<!--[if IE]><script>alert(1)</script><![endif]--></div>
<div>text<!--[if IE] ><script>alert(2)</script><![endif]--></div>
<div>text<!--[ if IE]><script>alert(3)</script><![endif]--></div>
<div>text<!-- [if IE]><script>alert(4)</script><![endif]--></div>
<div>text<!--[if IE]><script>alert(5)</script>< ![endif]--></div>
</body>
</html>

For this reason, I think the following algorithm would work, and would not be 
subject 
to the fragmentation attack:

        if ( node instanceof Comment ) {
            final String omitComments = 
policy.getDirective(Policy.OMIT_COMMENTS);
            // Strip conditional directives (for IE only) regardless of the 
OMIT_COMMENTS setting
            if(omitComments == null || "true".equals(omitComments) || 
((Comment)node).getData().startsWith("[if ")) {
                node.getParentNode().removeChild(node);
            }
            return;
        }

If you like I can create and submit a new patch with this algorithm, or you can 
paste 
that into the appropriate place at the top of recursiveValidateTag(Node node).

Here's the new test case:

    public void testCommentsAndWhitespace() throws Exception {
        assertEquals( "text", as.scan("text <!-- comment --
>",policy).getCleanHTML());
        policy.setDirective(Policy.OMIT_COMMENTS, "false");
        policy.setDirective(Policy.PRESERVE_SPACE, "true");
        policy.setDirective(Policy.FORMAT_OUTPUT, "false");

        assertEquals( "<div>text <!-- comment --></div>", as.scan("<div>text <!-- 
comment --></div>",policy).getCleanHTML());
        assertEquals( "<div>text </div>", as.scan("<div>text <!--[if IE]> comment 
<![endif]--></div>",policy).getCleanHTML());
        // nested if's.  Not handled very well, but as long as there's no tag 
injection we're satisfied
        assertEquals( "<div>text </div>", as.scan("<div>text <!--[if IE]> <!--[if gte 
6]> comment <![endif]--><![endif]--></div>",policy).getCleanHTML());
        // comment inside an if.  Not handled very well, but as long as there's no 
tag injection we're satisfied
        assertEquals( "<div>text  comment </div>", as.scan("<div>text <!--[if IE]> 
<!-- IE specific --> comment <![endif]--></div>",policy).getCleanHTML());

        assertEquals( "<div>text  comment </div>", as.scan("<div>text <![if !IE]> 
comment <![endif]></div>",policy).getCleanHTML());
    }

Original comment by dob...@gmail.com on 4 Aug 2009 at 3:07

GoogleCodeExporter commented 9 years ago

Hi Arshan,

Actually I did confirm your fragmentation attack, if your attack string in the 
code 
above is changed to "[if IE]<script>alert(1)</script><![endif]" then it outputs 
HTML 
containing a conditional comment with a script tag in it:

<!--[if IE]<script>alert(1)</script><![endif]-->

This isn't "clean" enough ... however, the new algorithm works better.

Original comment by dob...@gmail.com on 4 Aug 2009 at 3:11

GoogleCodeExporter commented 9 years ago

I have IE8 and can confirm that only 1 & 5 fire. I wonder if other browsers have
quirks modes that will allow them to execute these. This might work today, but 
what
about tomorrow?

Original comment by arshan.d...@gmail.com on 4 Aug 2009 at 3:14

GoogleCodeExporter commented 9 years ago

It seems like an implementation that allows the "[if" to occur at any point in 
the 
comment would actually be more work for both the programmers and the computers. 
 We 
could adjust the test to allow whitespace before "[if" but honestly I would be 
very 
surprised if the IE team changed their system to allow whitespace there.  
Additionally processing conditions ANYWHERE in the comments seems even more 
absurd - 
what if someone wrote a comment to explain the use of the conditional comment 
and 
accidentally tested the condition instead?

There are all sorts of odd things the IE team can do in the future, and 
antisamy can 
continue to default to omitting comments anyway, and have a little warning in 
the 
docs for those who choose to leave comments in that "some unknown attacks 
involving 
IE conditional comments may be possible or become possible with this enabled".

Original comment by dob...@gmail.com on 4 Aug 2009 at 3:35

GoogleCodeExporter commented 9 years ago

Is the above mentioned patch for allowing comments available? It doesn't appear 
to 
be in the 1.3 distribution I have. Is there a later release?

Original comment by FrankPap...@gmail.com on 9 Nov 2009 at 4:36

GoogleCodeExporter commented 9 years ago

Forgive me if I'm missing something obvious here...

I cannot get comments to be output, even with the preserveComments directive.  
Upon further investigation, the BaseMarkupSerializer seems to ignore the 
comment (basically it writes it to the preRoot, but because it's the fragment 
that is being serialized, it never writes the preRoot's contents).  Can someone 
provide an example of HTML where the comments are being preserved (assuming the 
directive has been set)?

Original comment by tad...@gmail.com on 29 Feb 2012 at 4:52

rjatkins / owaspantisamy

Preserve comments #41