decodeForHTML returns same character for Ù and ù

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. decodeForHTML returns the same character for &Ugrave; and &ugrave;. This is true 
for all named entities that have both upper-case and lower-case versions. 

What is the expected output? What do you see instead?

&Ugrave; should decode to an upper-case U with grave accent (Ù), and &ugrave; 
should decode to a lower-case u with grave accent (ù).

What version of the product are you using? On what operating system?

Latest version on Linux.

Please provide any additional information below.

In HTMLEntityCodec.js, you should probably not do a case insensitive look-up at 
the end of the getNamedEntity function.
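
To illustrate the suspected cause, here is a minimal sketch, not the actual ESAPI code. The `entityMap` object and the `getCaseInsensitive` helper are hypothetical stand-ins for the map used in HTMLEntityCodec.js, and the "first matching key wins" behaviour is an assumption. The code points are the standard ones for Ù (217) and ù (249).

```javascript
// Hypothetical stand-in for the entity map in HTMLEntityCodec.js.
var entityMap = {
    '&Ugrave': 217, // U+00D9, upper-case U with grave accent
    '&ugrave': 249  // U+00F9, lower-case u with grave accent
};

// Assumed behaviour of a case-insensitive lookup: the first key that
// matches once both sides are lower-cased wins.
function getCaseInsensitive(map, key) {
    var wanted = key.toLowerCase();
    for (var name in map) {
        if (map.hasOwnProperty(name) && name.toLowerCase() === wanted) {
            return map[name];
        }
    }
    return undefined;
}

// Case-insensitive lookups: both names resolve to the same entry.
var insensitiveUpper = String.fromCharCode(getCaseInsensitive(entityMap, '&Ugrave'));
var insensitiveLower = String.fromCharCode(getCaseInsensitive(entityMap, '&ugrave'));

// Exact-match lookups: each name resolves to its own entry.
var exactUpper = String.fromCharCode(entityMap['&Ugrave']);
var exactLower = String.fromCharCode(entityMap['&ugrave']);
```

With this stand-in, both case-insensitive lookups return the same character, while the exact-match lookups return Ù and ù respectively.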

Thanks!

Original issue reported on code.google.com by wvinc...@gmail.com on 5 Aug 2012 at 9:19

GoogleCodeExporter commented 9 years ago

Hi,

I found an issue with the decodeForHTML function. I tried the steps below:

org.owasp.esapi.ESAPI.initialize();

$ESAPI.encoder().encodeForHTML("<script>alert('123');</script>");
"<script>alert('123');</script>"

$ESAPI.encoder().decodeForHTML("<script>alert('123');</script>");
"<script>alert4039123394159<47script>"

Issue: decodeForHTML does not give back the data that I had encoded.

Solution: In org.owasp.esapi.codecs.HTMLEntityCodec, the functions parseNumber 
and parseHex return the number directly (return parseInt(out);). They should 
return the character for that code (return String.fromCharCode(parseInt(out));).
Below are the functions I have modified:

var parseNumber = function(input) {
    var out = '';
    // Collect decimal digits; consume an optional terminating ';'.
    while (input.hasNext()) {
        var c = input.peek();
        if (c.match(/[0-9]/)) {
            out += c;
            input.next();
        } else if (c == ';') {
            input.next();
            break;
        } else {
            break;
        }
    }

    try {
        // Return the character, not its numeric code (explicit radix 10).
        return String.fromCharCode(parseInt(out, 10));
        //Commented to fix esapi bug
        //return parseInt(out);
    } catch (e) {
        return null;
    }
};

var parseHex = function(input) {
    var out = '';
    // Collect hexadecimal digits; consume an optional terminating ';'.
    while (input.hasNext()) {
        var c = input.peek();
        if (c.match(/[0-9A-Fa-f]/)) {
            out += c;
            input.next();
        } else if (c == ';') {
            input.next();
            break;
        } else {
            break;
        }
    }

    try {
        // Return the character, not its numeric code.
        return String.fromCharCode(parseInt(out, 16));
        //Commented to fix esapi bug
        //return parseInt(out, 16);
    } catch (e) {
        return null;
    }
};
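
The effect of this change can be reproduced outside ESAPI. The following is a minimal sketch, not the library code: `decodeNumericEntities` is a hypothetical helper that uses a regular expression instead of ESAPI's input parser, and the `asCharacter` flag exists only to switch between the old and the patched behaviour. The sample input is an assumption chosen to match the digits seen in the output reported above.

```javascript
// Hypothetical helper: decodes decimal (&#40;) and hex (&#x28;) entities.
// asCharacter = false mimics returning the parsed number directly;
// asCharacter = true mimics wrapping it in String.fromCharCode.
function decodeNumericEntities(input, asCharacter) {
    return input.replace(/&#([xX][0-9A-Fa-f]+|[0-9]+);/g, function (match, body) {
        var isHex = /^[xX]/.test(body);
        var code = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
        return asCharacter ? String.fromCharCode(code) : String(code);
    });
}

// Assumed sample input, not taken from the report.
var encoded = 'alert&#40;&#39;123&#39;&#41;&#59;';

var buggy = decodeNumericEntities(encoded, false); // numbers leak into the text
var fixed = decodeNumericEntities(encoded, true);  // characters are restored
```

Here `buggy` comes out as alert4039123394159, the same digit run as in the report, while `fixed` is alert('123');.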

I have fixed this issue in esapi.js and am using the fixed version in my project.

Thanks
Bikesh Kumar

Original comment by bikesh....@gmail.com on 19 Mar 2013 at 8:22

GoogleCodeExporter commented 9 years ago

I think all we did in HTMLEntityCodec.js was change

return String.fromCharCode(entityToCharacterMap.getCaseInsensitive('&' + entity));

to

return String.fromCharCode(entityToCharacterMap['&' + entity]);
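
One thing to watch with the exact-match version: a plain property lookup yields `undefined` for an unknown name, and `String.fromCharCode(undefined)` returns the NUL character rather than failing. Below is a minimal sketch of a guarded lookup. `lookupNamedEntity` and the sample contents of `entityToCharacterMap` are hypothetical and not part of ESAPI; returning `null` for unknown names is also an assumption about what the caller wants.

```javascript
// Hypothetical sample contents; the real map is much larger.
var entityToCharacterMap = {
    '&Ugrave': 217, // U+00D9
    '&ugrave': 249  // U+00F9
};

// Exact-match lookup that reports unknown names as null
// instead of silently producing a NUL character.
function lookupNamedEntity(entity) {
    var key = '&' + entity;
    if (!Object.prototype.hasOwnProperty.call(entityToCharacterMap, key)) {
        return null;
    }
    return String.fromCharCode(entityToCharacterMap[key]);
}

var upper = lookupNamedEntity('Ugrave');
var lower = lookupNamedEntity('ugrave');
var unknown = lookupNamedEntity('UGRAVE'); // no such key in the sample map

// For comparison: the unguarded lookup on the same unknown name.
var unguarded = String.fromCharCode(entityToCharacterMap['&UGRAVE']);
```

In this sketch `upper` and `lower` differ as expected, `unknown` is `null`, and `unguarded` is the single character U+0000.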

Original comment by wvinc...@gmail.com on 19 Mar 2013 at 10:58

sillysachin / owasp-esapi-js

decodeForHTML returns same character for Ù and ù #11
