tcowans / owasp-java-html-sanitizer

Automatically exported from code.google.com/p/owasp-java-html-sanitizer

Single and double quotes are being transformed #15

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I had hijacked another issue and was asked to create a new one :) After writing 
several tests, it's simpler than I thought.

What steps will reproduce the problem?
1. Pass an input string with a ' or " in it
2. Comes back escaped as &#39; or &#34;

What is the expected output? What do you see instead?
I expect my input to come back with the ' or " in it.

What version of the product are you using? On what operating system?
Using version r164 on Mac OS X Mountain Lion

Please provide any additional information below.
The code is quite basic:

HtmlPolicyBuilder builder = new HtmlPolicyBuilder();
PolicyFactory factory = builder.toFactory();
String sanitized = factory.sanitize(input);
return sanitized;

Original issue reported on code.google.com by jcmathe...@gmail.com on 24 Jun 2013 at 4:55

GoogleCodeExporter commented 9 years ago
It is probably unnecessary to escape these characters in HTML text nodes, 
though it is necessary in attribute bodies.

How is this causing problems though?

Original comment by mikesamuel@gmail.com on 24 Jun 2013 at 5:03

GoogleCodeExporter commented 9 years ago
It's causing problems because user-entered text is being returned to them in a 
format different from what they entered. I've put in a hack to unescape the 
sanitized text, which fixes this, but I'd rather not have to do that as it 
might have unforeseen consequences.

Original comment by jcmathe...@gmail.com on 24 Jun 2013 at 6:14

GoogleCodeExporter commented 9 years ago
jcmather21, Without more context, I really can't help.  Can you give an example 
of user-entered text and explain why a different, but semantically equivalent 
form is problematic?

If your hack involves replacing `&#34;` with `"` and you white-list the "b" element 
and an attribute like "title" then you might have problems with inputs like

    <b title='foo " onmouseover="alert(1337)'>Foo

being sanitized to

    <b title="foo &#34; onmouseover=&#34;alert(1337)">Foo

and then you might hack that back to

    <b title="foo " onmouseover="alert(1337)">Foo

which executes script when the user's mouse passes over the text.
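
To make the hazard concrete, here is a minimal sketch that uses only the JDK. It assumes the sanitizer encodes a double quote inside an attribute value as `&#34;`, and `naiveUnescape` is a hypothetical stand-in for the kind of hack described, not code from the sanitizer or from the reporter.

```java
class UnescapeHazard {
    // Hypothetical hack: blindly turn numeric quote entities back into raw quotes.
    static String naiveUnescape(String html) {
        return html.replace("&#34;", "\"").replace("&#39;", "'");
    }

    public static void main(String[] args) {
        // Assumed sanitizer output: the quotes inside the title value are
        // entity-encoded, so the attribute value cannot end early.
        String sanitized = "<b title=\"foo &#34; onmouseover=&#34;alert(1337)\">Foo";

        // After the hack, the first restored quote closes the title attribute
        // and onmouseover becomes a real attribute on the element.
        String hacked = naiveUnescape(sanitized);
        System.out.println(hacked);
        // prints: <b title="foo " onmouseover="alert(1337)">Foo
    }
}
```

The same reasoning applies to any blanket unescape step applied to sanitizer output that is later rendered as HTML.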

Original comment by mikesamuel@gmail.com on 24 Jun 2013 at 6:43

GoogleCodeExporter commented 9 years ago
Yes sorry. I'm thinking outside the scope of html in the input string. For 
example, a sentence like this:
He said, "This is the best place ever!"

I want that to be returned exactly like that, but it comes back as: 
He said, &#34;This is the best place ever!&#34;

For my use case, I am taking user-entered description text and I want all HTML, 
script, and CSS removed (we only support plain text for now). But I want regular 
text left as is. So for your example above:
    <b title='foo " onmouseover="alert(1337)'>Foo
I want back:
    Foo

Original comment by jcmathe...@gmail.com on 24 Jun 2013 at 7:03

GoogleCodeExporter commented 9 years ago
The HTML sanitizer takes messy-unsafe HTML and gives back well-formed-safe HTML.

It sounds like you want to get back plain text -- the innerText or textContent 
-- without any tags at all.

If so, that's doable using the HTML sanitizer, but not using a method that is 
advertised as returning HTML.

If that's what you need, let me know and I can knock up some example code.

Original comment by mikesamuel@gmail.com on 24 Jun 2013 at 7:09

GoogleCodeExporter commented 9 years ago
Yes that does sound like what I need so example code would be awesome on how to 
accomplish this!

Original comment by jcmathe...@gmail.com on 24 Jun 2013 at 7:14

GoogleCodeExporter commented 9 years ago
If you have a policy builder called myPolicyBuilder, and a string of HTML 
called myHtml then

    StringBuilder sb = new StringBuilder();
    HtmlSanitizer.policy = myPolicyBuilder.build(new HtmlStreamEventReceiver() {
      public void openDocument() {}
      public void closeDocument() {}
      public void openTag(String elementName, List<String> attribs) {
        if ("br".equals(elementName)) { sb.append('\n'); }
      }
      public void closeTag(String elementName) {}
      public void text(String text) { sb.append(text); }
    });
    HtmlSanitizer.sanitize(myHtml, policy);
    // sb should now contain the plain text content of the page with <br> replaced by newlines.

Original comment by mikesamuel@gmail.com on 24 Jun 2013 at 7:43

GoogleCodeExporter commented 9 years ago
Sorry.  There were typos.  Try

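    // Needs: java.util.List, org.owasp.html.HtmlSanitizer,
    //        org.owasp.html.HtmlStreamEventReceiver
    // sb is final so the anonymous receiver below can append to it.
    final StringBuilder sb = new StringBuilder();
    HtmlSanitizer.Policy policy = myPolicyBuilder.build(new HtmlStreamEventReceiver() {
      public void openDocument() {}
      public void closeDocument() {}
      public void openTag(String elementName, List<String> attribs) {
        // Treat a line break element as a newline in the plain-text output.
        if ("br".equals(elementName)) { sb.append('\n'); }
      }
      public void closeTag(String elementName) {}
      // Collect text content as-is instead of rendering it back to HTML.
      public void text(String text) { sb.append(text); }
    });
    HtmlSanitizer.sanitize(myHtml, policy);
    // sb should now contain the plain text content of the page with <br> replaced by newlines.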

Original comment by mikesamuel@gmail.com on 24 Jun 2013 at 7:44

GoogleCodeExporter commented 9 years ago
I haven't yet gotten that to work for all of my use cases but thank you.

Original comment by jcmathe...@gmail.com on 25 Jun 2013 at 3:48

GoogleCodeExporter commented 9 years ago
A solution is to use StringEscapeUtils.unescapeHtml() from Apache Commons Lang 
before storing the text in the database.

Original comment by rajkumar...@gmail.com on 5 Jul 2014 at 7:34

GoogleCodeExporter commented 9 years ago
StringEscapeUtils.unescapeHtml() is not a solution. A clever hacker can then 
use <script> to inject XSS.

Original comment by cantara...@gmail.com on 10 Sep 2014 at 9:58

GoogleCodeExporter commented 9 years ago
Cantata, please see comments 5&6.  The string is to be used as plain text, not 
HTML.
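
One way to read that distinction: keep the value as plain text everywhere, and encode it only at the point where it is written into an HTML page. The sketch below is an illustration under that assumption. `escapeHtml` is a hand-rolled, hypothetical helper, not part of the sanitizer; a real application would more likely rely on its template engine or a maintained encoder.

```java
class PlainTextRendering {
    // Hypothetical minimal encoder for text placed in HTML element content
    // or inside a quoted attribute value.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;");  break;
                case '>':  out.append("&gt;");  break;
                case '"':  out.append("&#34;"); break;
                case '\'': out.append("&#39;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stored and processed as plain text: the quotes stay as typed.
        String stored = "He said, \"This is the best place ever!\"";
        System.out.println(stored);

        // Encoded only when it is embedded in HTML output.
        System.out.println("<p>" + escapeHtml(stored) + "</p>");
    }
}
```

With this split, an unescaped string is harmless while it is treated as plain text, and it becomes a problem only if it is concatenated into HTML without the encoding step.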

Original comment by mikesamuel@gmail.com on 11 Sep 2014 at 10:29