Open hunterli opened 7 years ago
@hunterli Will you contribute a pull request for it? Thanks!
@agentzh OK, I'll try it.
@hunterli Great! Looking forward to it!
I am afraid I don't think it is a bug.
According to the document, ngx.escape_uri
is used to:
Escape str as a URI component.
Although its name is escape_uri
.
Escape uri is different from escaping URI component.
The second one doesn't need to escape '
.
Refer to RFC 3986 section 2.2
:
A subset of the reserved characters (gen-delims) is used as delimiters of the generic URI components described in Section 3
Note that the RFC only requires the gen-delims part of reserved characters. Not all the reserved characters.
What urllib.quote
does is more like escaping uri instead of escaping URI component.
According to its document, urllib.quote
is used to:
By default, this function is intended for quoting the path section of the URL.
Don't mix up these two differnt things.
@spacewander FYI: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent said: To be more stringent in adhering to RFC 3986 (which reserves !, ', (, ), and *), even though these characters have no formalized URI delimiting uses, the following can be safely used
@hunterli
Could you point out where RFC 3986 says that !, ', (, ), and *
are reserved for URI component?
One place I found including !, ', (, ), and *
is the D.2. Modifications
, under Appendix D. Changes from RFC 2396
:
The mark characters that are typically unsafe to decode, including the exclamation mark ("!"), asterisk ("*"), single-quote ("'"), and open and close parentheses ("(" and ")"), have been moved to the reserved set in order to clarify the distinction between reserved and unreserved and, hopefully, to answer the most common question of scheme designers.
However, it just says '
is reserved compared with RFC 2396
, doesn't say it is reserved for URI component.
According to Appendix A. Collected ABNF for URI
, '
is one of the sub-delims
, which could be a part of query
or fragment
.
@spacewander I test url encode function in php and lua, both of them output %27. So may be compatible with other programming language to escape sub-delims
is a acceptable answer? (Sorry for poor English)
php -r "echo urlencode(\"test'test\");"
test%27test
FYI: http://lua-users.org/wiki/StringRecipesURL-encode a string
function urlencode(str) if (str) then str = string.gsub (str, "\n", "\r\n") str = string.gsub (str, "([^%w %-%%.%~])", function (c) return string.format ("%%%02X", string.byte(c)) end) str = string.gsub (str, " ", "+") end return str
end
Why not be compatible with Javascript? It doesn't escape '
, too. And it doesn't strictly follow the RFC 3986
, either. Do you like PHP more than Javascript? 😄
Another option is do what Go developers did: they just simply said:
we've accepted that we can't fully implement the RFC at this point for compatibility reasons.
Luckily, our previous implementation is more compatible with RFC 3986
. Why should we break the compatibility with RFC 3986
and break the compatibility with former code at the same time?
If your target environment requires special encode rule, just do it in your code.
@spacewander I "Try Go" in https://golang.org:
package main
import( "fmt" "net/url" )
func main() { fmt.Println(url.QueryEscape("'")) }
output:%27
@spacewander In RFC 3986 2.2 Reserved Characters:
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
For "&" and "'" both in sub-delims, ngx.escape_uri encode "&", why not "'"?
@hunterli
ngx.escape_uri
doesn't strictly follow the RFC 3986. However, escape other sub-delims
will make it less compatible with the RFC 3986 and also break the backward compability.被你弄糊涂了,Sorry,中文直接上了,看不懂就放弃了。也可能是我理解错误,RFC说"Reserved Characters"都应该escape,"Reserved Characters"包含gen-delims和sub-delims,这样的话gen-delims和sub-delims都应该escape,否则就是"白马非马"了。
RFC说"Reserved Characters"都应该escape
Could you point out where RFC 3986 says that all reserved characters should be escaped for URI component?
Refer to Appendix A. Collected ABNF for URI
, sub-delims
could be a valid part of some URI components.
URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component.
这个是什么意思呀?通常的用户不是这样用么: http://example.com/query?test=ngx.escape("test=&'test") 如果"&","=","'"之类做为分隔符,就程序员自己放在ngx.escape函数外面吧?
Appendix A. 里面说:
query = *( pchar / "/" / "?" )
那么"&","="都不应该escape?能这样理解么?
http://example.com/query?test=value_part1=value_part2&value_part3=foo
test的值是:value_part1=value_part2&value_part3=foo
这样的URL是正确可解析的?很多WebServer应该都不这样理解吧。
Why does the openresty implementation not match nginx? Migrating from nginx can be confusing I won't discuss whose implementation follows rfc, but I think openresty should try to be consistent with nginx
For openresty:
For nginx:
$ python -c "import urllib; print urllib.quote(\"'\")" %27
$ resty -e "ngx.say(ngx.escape_uri(\"'\"))" ' ngx.escape_uri should escape "'" to %27 but not.
$resty -v resty 0.17 nginx version: openresty/1.11.2.3 built by gcc 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) built with OpenSSL 1.0.2k 26 Jan 2017 TLS SNI support enabled configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt=-O2 --add-module=../ngx_devel_kit-0.3.0 --add-module=../echo-nginx-module-0.60 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.31 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.06 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.8 --add-module=../ngx_lua_upstream-0.06 --add-module=../headers-more-nginx-module-0.32 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.18 --add-module=../redis2-nginx-module-0.14 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.14 --add-module=../rds-csv-nginx-module-0.07 --with-ld-opt=-Wl,-rpath,/usr/local/openresty/luajit/lib --with-openssl=/tmp/openssl-1.0.2k --with-pcre=/tmp/pcre-8.39 --with-file-aio --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-http_xslt_module=dynamic --with-ipv6 --with-mail --with-mail_ssl_module --with-md5-asm --with-pcre-jit --with-sha1-asm --with-stream --with-stream_ssl_module --with-threads