escape special characters

xml

There are many subtleties; for example, > can be left unescaped inside an attribute value.

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;

See: https://stackoverflow.com/questions/1091945/what-characters-do-i-need-to-escape-in-xml-documents
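
The five replacements listed above can be applied with a small helper. This is only a minimal sketch: the class name `XmlEscapeSketch` and the method `escapeXml` are made up for illustration, and it escapes all five characters everywhere, even though some of them (such as > in attributes) could be left alone.

```java
public final class XmlEscapeSketch {
    private XmlEscapeSketch() {}

    /** Replaces each of the five characters listed above with its predefined XML entity. */
    public static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                default:   out.append(c);        // everything else is copied unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("a < b && c > \"d\""));
        // prints: a &lt; b &amp;&amp; c &gt; &quot;d&quot;
    }
}
```

Escaping & in the same pass as the other characters matters: running separate replace calls with & last would re-escape the ampersands of entities that were just produced.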

regexp

The following characters are the meta characters that give special meaning to the regular expression search syntax:

\   the backslash escape character. The backslash gives special meaning to the character following it. For example, the combination "\n" stands for the newline, one of the control characters. The combination "\w" stands for a "word" character, one of the convenience escape sequences, while "\1" is one of the substitution special characters. Example: The regex "aa\n" tries to match two consecutive "a"s at the end of a line, including the newline character itself. Example: "a\+" matches "a+" and not a series of one or more "a"s.

^   the caret is the anchor for the start of the string, or the negation symbol. Example: "^a" matches "a" at the start of the string. Example: "[^0-9]" matches any non-digit.

$   the dollar sign is the anchor for the end of the string. Example: "b$" matches a "b" at the end of a line. Example: "^$" matches the empty string.

{ }   the opening and closing curly brackets are used as range quantifiers. Example: "a{2,3}" matches "aa" or "aaa".

[ ]   the opening and closing square brackets define a character class to match a single character. The "^" as the first character following the "[" negates, and the match is for the characters not listed. The "-" denotes a range of characters. Inside a "[ ]" character class construction, most special characters are interpreted as ordinary characters. Example: "[d-f]" is the same as "[def]" and matches "d", "e" or "f". Example: "[a-z]" matches any lower-case character in the alphabet. Example: "[^0-9]" matches any character that is not an ASCII digit. Example: A search for "[][()?<>$^.*?^]" in the string "[]()?<>$^.*?^" followed by a replace string "r" has the result "rrrrrrrrrrrrr". Here the search string is one character class and all the meta characters are interpreted as ordinary characters without the need to escape them.

( )   the opening and closing parentheses are used for grouping characters (or other regexes). The groups can be referenced in both the search and the substitution phase. There also exist some special constructs with parentheses. Example: "(ab)\1" matches "abab".

.   the dot matches any character except the newline symbol. Example: ".a" matches two consecutive characters where the last one is "a". Example: ".*\.txt$" matches all strings that end in ".txt".

*   the asterisk is the match-zero-or-more quantifier. Example: "^.*$" matches an entire line.

+   the plus sign is the match-one-or-more quantifier.

?   the question mark is the match-zero-or-one quantifier. The question mark is also used in special constructs with parentheses and in changing match behaviour.

|   the vertical pipe separates a series of alternatives. Example: "(a|b|c)a" matches "aa" or "ba" or "ca".

< >   the smaller and greater signs are anchors that specify a left or right word boundary.

-   the minus sign indicates a range in a character class (when it is not at the first position after the "[" opening bracket or the last position before the "]" closing bracket). Example: "[A-Z]" matches any uppercase character. Example: "[A-Z-]" or "[-A-Z]" match any uppercase character or "-".

&   the ampersand is the "substitute complete match" symbol.

See: http://www.fon.hum.uva.nl/praat/manual/Regular_expressions_1__Special_characters.html
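
A few of the examples from the list above can be tried with `java.util.regex`. This is a rough sketch, not a full port of the Praat examples: the class `MetaCharSketch` and the helper `found` are hypothetical names, and the word-boundary and ampersand entries are left out because Java spells those differently (`\b` in patterns and `$0` in replacement strings). Note that every backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public final class MetaCharSketch {
    private MetaCharSketch() {}

    /** True if the regex matches somewhere inside the input (a search, not a full match). */
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // An escaped plus is a literal "+", not a quantifier.
        System.out.println(found("a\\+", "a+"));              // true
        System.out.println(found("a\\+", "aaa"));             // false

        // Range quantifier, anchored at both ends.
        System.out.println(found("^a{2,3}$", "aaa"));         // true
        System.out.println(found("^a{2,3}$", "a"));           // false

        // Negated character class: any non-digit.
        System.out.println(found("[^0-9]", "2024"));          // false
        System.out.println(found("[^0-9]", "20x4"));          // true

        // Group plus backreference.
        System.out.println(found("(ab)\\1", "abab"));         // true

        // Escaped dot before the extension, anchored at the end.
        System.out.println(found(".*\\.txt$", "notes.txt"));  // true
        System.out.println(found(".*\\.txt$", "notes.txt.bak")); // false

        // Alternation inside a group.
        System.out.println(found("(a|b|c)a", "ca"));          // true
    }
}
```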

Characters that need escaping in Java regular expressions

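Special character	Description
$	Matches the end of the input string. If multiline mode is set (the RegExp object's Multiline property; `Pattern.MULTILINE` in Java), $ also matches the position before '\n' or '\r'. To match the $ character itself, use `\$`.
( )	Mark the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use `\(` and `\)`.
*	Matches the preceding subexpression zero or more times. To match the * character, use `\*`.
+	Matches the preceding subexpression one or more times. To match the + character, use `\+`.
.	Matches any single character except the newline \n. To match ., use `\.`.
[ ]	Marks the start of a bracket expression. To match [, use `\[`.
?	Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character, use `\?`.
\	Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a newline. The sequence `\\` matches `\`, and `\(` matches `(`.
^	Matches the start of the input string, unless it is used inside a bracket expression, where it negates the character set. To match the ^ character itself, use `\^`.
{ }	Marks the start of a quantifier expression. To match {, use `\{`.
|	Indicates a choice between two alternatives. To match |, use `\|`.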
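
The table above translates directly into a small escaping helper. The sketch below is illustrative only: `RegexEscapeSketch`, `SPECIAL`, and `escape` are hypothetical names, and the character set is simply the one listed in the table. For real code, the standard `Pattern.quote` method is usually the simpler choice, since it wraps the whole literal instead of escaping character by character.

```java
import java.util.regex.Pattern;

public final class RegexEscapeSketch {
    private RegexEscapeSketch() {}

    // The characters from the table above (an assumption of this sketch, not an official list).
    private static final String SPECIAL = "$()*+.[]?\\^{}|";

    /** Prefixes every character from the table with a backslash. */
    public static String escape(String literal) {
        StringBuilder out = new StringBuilder(literal.length() * 2);
        for (char c : literal.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String price = "$9.99 (approx.)";
        System.out.println(escape(price));                                 // \$9\.99 \(approx\.\)
        System.out.println(Pattern.matches(escape(price), price));         // true
        System.out.println(Pattern.matches(Pattern.quote(price), price));  // true

        // A common pitfall: split() takes a regex, so a bare "." must be escaped.
        System.out.println("a.b.c".split(escape(".")).length);             // 3
    }
}
```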
