nutzam / nutz

Nutz -- Web Framework(Mvc/Ioc/Aop/Dao/Json) for ALL Java developer
https://nutzam.com
Apache License 2.0

Strings escapeHTML escapes incompletely, and there is no unescapeHTML #325

Closed gongrui closed 11 years ago

gongrui commented 12 years ago

I added a more complete version to try out. The TestCase also has to be changed, because a space is now converted to &nbsp;

private static HashMap<String,String> htmlEntities;
    static {
      htmlEntities = new HashMap<String,String>();
      htmlEntities.put("&lt;","<")    ; htmlEntities.put("&gt;",">");
      htmlEntities.put("&amp;","&")   ; htmlEntities.put("&quot;","\"");
      htmlEntities.put("&agrave;","à"); htmlEntities.put("&Agrave;","À");
      htmlEntities.put("&acirc;","â") ; htmlEntities.put("&auml;","ä");
      htmlEntities.put("&Auml;","Ä")  ; htmlEntities.put("&Acirc;","Â");
      htmlEntities.put("&aring;","å") ; htmlEntities.put("&Aring;","Å");
      htmlEntities.put("&aelig;","æ") ; htmlEntities.put("&AElig;","Æ" );
      htmlEntities.put("&ccedil;","ç"); htmlEntities.put("&Ccedil;","Ç");
      htmlEntities.put("&eacute;","é"); htmlEntities.put("&Eacute;","É" );
      htmlEntities.put("&egrave;","è"); htmlEntities.put("&Egrave;","È");
      htmlEntities.put("&ecirc;","ê") ; htmlEntities.put("&Ecirc;","Ê");
      htmlEntities.put("&euml;","ë")  ; htmlEntities.put("&Euml;","Ë");
      htmlEntities.put("&iuml;","ï")  ; htmlEntities.put("&Iuml;","Ï");
      htmlEntities.put("&ocirc;","ô") ; htmlEntities.put("&Ocirc;","Ô");
      htmlEntities.put("&ouml;","ö")  ; htmlEntities.put("&Ouml;","Ö");
      htmlEntities.put("&oslash;","ø") ; htmlEntities.put("&Oslash;","Ø");
      htmlEntities.put("&szlig;","ß") ; htmlEntities.put("&ugrave;","ù");
      htmlEntities.put("&Ugrave;","Ù"); htmlEntities.put("&ucirc;","û");
      htmlEntities.put("&Ucirc;","Û") ; htmlEntities.put("&uuml;","ü");
      htmlEntities.put("&Uuml;","Ü")  ; htmlEntities.put("&nbsp;"," ");
      htmlEntities.put("&copy;","\u00a9");htmlEntities.put("&#x27;", "'");
      htmlEntities.put("&reg;","\u00ae");
      // The euro sign is U+20AC (U+20A0 is the obsolete euro-currency sign)
      htmlEntities.put("&euro;","\u20ac");
    }
    /**
     * Restores the entities in a string that has already been HTML-escaped, for example:
     * 
     * <pre>
     *  unescapeHTML("&lt;script&gt;alert("hello world");&lt;/script&gt;") => "<script>alert("hello world");</script>"
     * </pre>
     * 
     * 
     * @param source
     *            the escaped string
     * 
     * @return the unescaped string
     */
    public static final String unescapeHTML(String source) {
        int i, j;

        // Everything before skip is final and is never scanned again
        int skip = 0;
        while ((i = source.indexOf("&", skip)) > -1) {
            j = source.indexOf(";", i);
            if (j < 0)
                break; // no ';' after this '&', so no further entity can follow
            String value = htmlEntities.get(source.substring(i, j + 1));
            if (value != null) {
                source = source.substring(0, i)
                         + value + source.substring(j + 1);
                // Resume after the inserted text, so "&amp;lt;" yields "&lt;" and not "<"
                skip = i + value.length();
            } else {
                // Not a known entity: leave this '&' as it is
                skip = i + 1;
            }
        }
        return source;
    }

    /**
     * Escapes the HTML special characters that occur in a string, for example:
     * 
     * <pre>
     *  escapeHtml("&lt;script&gt;alert("hello world");&lt;/script&gt;") => "&amp;lt;script&amp;gt;alert(&amp;quot;hello&amp;nbsp;world&amp;quot;);&amp;lt;/script&amp;gt;"
     * </pre>
     * 
     * The escape mappings include the following:
     * <ul>
     * <li>& => &amp;amp;
     * <li>< => &amp;lt;
     * <li>> => &amp;gt;
     * <li>' => &amp;#x27;
     * <li>" => &amp;quot;
     * </ul>
     * 
     * @param cs
     *            the string to escape
     * 
     * @return the escaped string
     */
    public static String escapeHtml(CharSequence cs) {
        if (null == cs)
            return null;
        char[] cas = cs.toString().toCharArray();
        StringBuilder sb = new StringBuilder();
        boolean found;
        for (char c : cas) {
            found = false;
            // Linear scan of the entity table for a value equal to this character
            for (String entity : htmlEntities.keySet()) {
                if (htmlEntities.get(entity).equals(String.valueOf(c))) {
                    found = true;
                    sb.append(entity);
                    break;
                }
            }
            if (!found) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
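
The escapeHtml above walks the whole entity table once per input character. A minimal sketch of an alternative, assuming a separate table keyed by character, is shown below. The class name ReverseEscapeSketch and its ESCAPES table are hypothetical and not part of Nutz, and only the five characters from the Javadoc list are covered.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not Nutz code: escapes through a character-keyed table.
class ReverseEscapeSketch {
    // Assumed mapping: only the five characters listed in the Javadoc above.
    private static final Map<Character, String> ESCAPES = new HashMap<Character, String>();
    static {
        ESCAPES.put('&', "&amp;");
        ESCAPES.put('<', "&lt;");
        ESCAPES.put('>', "&gt;");
        ESCAPES.put('"', "&quot;");
        ESCAPES.put('\'', "&#x27;");
    }

    static String escapeHtml(CharSequence cs) {
        if (cs == null)
            return null;
        StringBuilder sb = new StringBuilder(cs.length() + 16);
        for (int i = 0; i < cs.length(); i++) {
            char c = cs.charAt(i);
            // One hash lookup per character instead of a scan over all entities
            String entity = ESCAPES.get(c);
            if (entity != null)
                sb.append(entity);
            else
                sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
        System.out.println(escapeHtml("<a href=\"x\">Tom & Jerry</a>"));
    }
}
```

Keying the table by character also makes the result deterministic if two entities ever map to the same character, which a scan over a HashMap key set does not guarantee.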
ywjno commented 12 years ago

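This unescapeHTML is certainly incomplete. Compare it against this complete list of HTML special character codes, or the List of symbols supported by Maruku. So writing an unescapeHTML like this doesn't seem very worthwhile...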

gongrui commented 12 years ago

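Haha, that list is certainly complete.

I need unescapeHTML because an older project used a filter to HTML-escape form data before storing it in the database. The first save was fine, but each later edit called escape again. For example, the first time < is escaped to &lt;. The second time the submitted data is already &lt;, so its & gets escaped again to &amp;, and the data ends up as &amp;lt;. At the time I had no choice but to call unescapeHTML on the data first every time and then call escapeHTML. I don't know what other approach there is.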
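
The repeated-escaping effect described here can be reproduced in a few lines. This is a minimal stand-alone sketch, not Nutz code: the class DoubleEscapeDemo and its two helpers are hypothetical and handle only &, < and >.

```java
// Hypothetical demo, not Nutz code: shows why escaping on every save corrupts data.
class DoubleEscapeDemo {
    // '&' must be escaped first, otherwise the '&' of "&lt;" would be escaped again
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // "&amp;" is restored last, so "&amp;lt;" becomes "&lt;" and not "<"
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String once = escape("a < b");
        String twice = escape(once);
        String normalized = escape(unescape(once));

        System.out.println(once);       // a &lt; b
        System.out.println(twice);      // a &amp;lt; b
        System.out.println(normalized); // a &lt; b
    }
}
```

Unescaping before escaping keeps the stored value at exactly one level of escaping, but only when every stored value is known to have been escaped already.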

ywjno commented 11 years ago

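Actually, the simple approach I had in mind is to call a third-party HTML library and let it turn these escaped HTML characters straight back into plain characters. Unfortunately I haven't found one that works well yet...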

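With JSP you can use the <c:out value="${data}" escapeXml="true" /> tag instead: if the database stores raw text such as <>, it is automatically turned into &lt;&gt; when the page is rendered.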
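
The same idea, storing the raw text and escaping only at output time, can be sketched without JSP. Everything below is a hypothetical illustration: the class EscapeOnOutputSketch stands in for a database row, and its escapeXml helper is a simplified stand-in, not the JSTL implementation.

```java
// Hypothetical sketch: keep raw text in storage and escape only when rendering.
class EscapeOnOutputSketch {
    // Stand-in for a database column: holds the text exactly as the user typed it
    private String stored;

    void save(String formValue) {
        stored = formValue; // no escaping on the way in
    }

    String load() {
        return stored;
    }

    // Simplified escaping for output, covering only &, <, > and "
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
            case '&': sb.append("&amp;"); break;
            case '<': sb.append("&lt;"); break;
            case '>': sb.append("&gt;"); break;
            case '"': sb.append("&quot;"); break;
            default: sb.append(c);
            }
        }
        return sb.toString();
    }

    String renderCell() {
        return "<td>" + escapeXml(stored) + "</td>";
    }

    public static void main(String[] args) {
        EscapeOnOutputSketch row = new EscapeOnOutputSketch();
        row.save("1 < 2");
        for (int i = 0; i < 3; i++) {
            row.save(row.load()); // repeated edit-and-save cycles
        }
        System.out.println(row.load());       // 1 < 2
        System.out.println(row.renderCell()); // <td>1 &lt; 2</td>
    }
}
```

Because the stored value is never escaped, saving it any number of times leaves it unchanged, and no unescape step is needed.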

zozoh commented 11 years ago

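We already have escapeHtml, and I'm grudgingly willing to keep it. I have never understood why unescapeHTML should be supported. If you want better HTML support, use an HTML library. This is not something Nutz should be concerned with, otherwise it would bloat without limit.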