vishank848 / owasp-esapi-java

Automatically exported from code.google.com/p/owasp-esapi-java
Other
4 stars 2 forks source link

Please improve efficiency of string-building code #199

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Please change java.lang.Character to String conversion to be more efficient.  
Specifically, instead of ""+c to build string, use String.valueOf(c).  When I 
test on JDK 1.6.0_17, doing ""+c results in a new StringBuilder being 
unnecessarily created.

I noticed this, for example, in the HTMLEntityCodec.encodeCharacter(char[] 
immune, Character c).  Instead, it could be written as 

    public String encodeCharacter( char[] immune, Character c ) {

        // check for immune characters
        if ( containsCharacter(c, immune ) ) {
            return String.valueOf(c);
        }

        // check for alphanumeric characters
        String hex = Codec.getHexForNonAlphanumeric(c);
        if ( hex == null ) {
            return String.valueOf(c);
        }

        // check for illegal characters
        if ( ( c <= 0x1f && c != '\t' && c != '\n' && c != '\r' ) || ( c >= 0x7f && c <= 0x9f ) )
        {
            hex = REPLACEMENT_HEX;  // Let's entity encode this instead of returning it
            c = REPLACEMENT_CHAR;
        }

        // check if there's a defined entity
        String entityName = (String) characterToEntityMap.get(c);
        if (entityName != null) {
            return "&" + entityName + ";";
        }

        // return the hex entity as suggested in the spec
        return "&#x" + hex + ";";
    }

Along those lines, this could further be improved by changing the API of the 
encodeCharacter() so it could append to a provided StringBuilder.  This way new 
Strings/StringBuilders don't need to be built up as characters are encoded.  
Although this would be needed on all the codecs, I did enclose the suggested 
code change for the base Codec class and the HTMLEntityCodec class.

All these extra objects being created, among other things, will be more work 
for the garbage collector.  I wouldn't care too much if we only made occasional 
calls to it, but in the course of building up an HTML page, we could be making 
thousands of calls in to this method.  Now look at that page being rendered 
under load and we're quickly in to the millions of invocations.

Thanks for your consideration,

Eric

Original issue reported on code.google.com by ebatz...@gmail.com on 27 Jan 2011 at 12:16

Attachments:

GoogleCodeExporter commented 9 years ago
Actually, the code I attached is still doing more work than needed.  If the 
StringBuilder is passed in to the encodeCharacter method, then String.valueOf() 
is unnecessary and overkill.  The Character should be passed directly in to the 
StringBuilder.  For example:

    public StringBuilder encodeCharacterAndAppend(StringBuilder sb, char[] immune, Character c ) {

        // check for immune characters
        if ( containsCharacter(c, immune ) ) {
            return sb.append(c);
        }

        // check for alphanumeric characters
        String hex = Codec.getHexForNonAlphanumeric(c);
        if ( hex == null ) {
            return sb.append(c);
        }

I've uploaded the new HTMLEntityCode with this change.

Original comment by ebatz...@gmail.com on 27 Jan 2011 at 12:27

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by M.Gelma...@gmail.com on 13 Nov 2014 at 6:13