Closed CoelacanthusHex closed 7 months ago
Please elaborate
The release notes and the APK are not aligned with the git tag. I hope you can release 1.6.2 so that the latest tag includes these changes. Builds made from the tag (such as the one installed from F-Droid) lack the "Fix: Failed to parse Telegram message" change, so when copying an invitation link from Telegram, "+" is replaced by "\n".
I guess this issue is triggered by https://github.com/lz233/Tarnhelm/blob/78a44b1f0c05dbae9e7d1414aa204036f465575f/app/src/main/java/cn/ac/lz233/tarnhelm/util/ktx/String.kt#L98, which takes https://t.me/+xxxxx as input and outputs https://t.me/ xxxxx.
But consider a more complex case: some Chinese apps insert Chinese characters into URLs, like https://y.music.xxx.com/m/song?name=歌曲名称 (the parameter value means "song name") and http://xxxlink.com/xxxxxx,复制本条信息,打开【xxx】App查看精彩内容! (the trailing text means "copy this message and open the [xxx] app to see the exciting content!"). Without .decodeURL(), these may lead to unexpected behaviour.
Recognizing Chinese characters in URLs is not a bad idea either, but it is too targeted and does not feel very elegant to me.
I have no idea how to avoid recognizing Chinese characters, but I think a Unicode script or block match like \p{Han} would be more elegant than a batch of Unicode ranges.
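To make the script-match idea concrete, here is a minimal sketch, not the project's code. It assumes the JVM's java.util.regex syntax, where the Han script is written \p{IsHan}; the regex engine on Android may spell the property differently. The names hanScript, basicRange and containsHan are hypothetical, and the single range U+4E00..U+9FFF only stands in for "a batch of Unicode ranges".

```kotlin
// Script-based match: covers every code point whose Unicode script is Han.
// Assumption: JVM java.util.regex syntax (\p{IsHan}).
val hanScript = Regex("\\p{IsHan}")

// Range-based match: only the basic CJK Unified Ideographs block.
// Illustrative stand-in for a hand-maintained list of ranges.
val basicRange = Regex("[\\u4E00-\\u9FFF]")

// Hypothetical helper: true if the text contains at least one Han character.
fun containsHan(text: String): Boolean = hanScript.containsMatchIn(text)
```

With this sketch, containsHan("https://t.me/+xxxxx") is false and containsHan("name=歌曲名称") is true. A character from CJK Extension A such as U+3400 is matched by hanScript but not by basicRange, which is the maintenance cost of listing ranges by hand. Note that full-width punctuation such as "," and "【" is not in the Han script, so a script match alone would not strip it.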
I wonder why decodeURL replaces "+" with " ". I read RFC 3986, which says that reserved characters (including "+") are not touched when encoding or decoding.
Oh, Java's URLDecoder uses application/x-www-form-urlencoded, which converts between space and "+".
https://stackoverflow.com/a/16453677
As this answer says, our input isn't application/x-www-form-urlencoded, so we should use java.net.URI instead of URLDecoder to get the right result.
https://docs.oracle.com/javase/8/docs/api/java/net/URI.html
The documentation of java.net.URI also says that it follows the RFC. I'm not familiar with Java/Kotlin, but you can find the method we need in this document.
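A small sketch of the difference, using only java.net.URLDecoder and java.net.URI from the standard library. The helper names formDecode and decodedPath are hypothetical. Note that URI decodes per component (here the path), not the whole string, and it throws on input that is not a valid URI, so this is an illustration rather than a drop-in replacement.

```kotlin
import java.net.URI
import java.net.URLDecoder

// Form-style decoding (application/x-www-form-urlencoded): "+" means space.
fun formDecode(text: String): String = URLDecoder.decode(text, "UTF-8")

// URI-style decoding of the path component: percent escapes are decoded,
// "+" is left untouched.
fun decodedPath(url: String): String = URI(url).path
```

For the Telegram link, formDecode("https://t.me/+xxxxx") gives "https://t.me/ xxxxx", which is the bug reported here, while decodedPath("https://t.me/+xxxxx") gives "/+xxxxx".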
https://docs.oracle.com/javase/8/docs/api/java/net/URL.html
Oh, the java.net.URL documentation mentions how it differs from URLDecoder:
Note, the URI class does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL(). The URLEncoder and URLDecoder classes can also be used, but only for HTML form encoding, which is not the same as the encoding scheme defined in RFC2396.
Changed to
fun String.encodeURL(): String = Uri.encode(this)
fun String.decodeURL(): String = Uri.decode(this)
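For readers without the Android SDK at hand, the behaviour this change relies on can be sketched in plain Kotlin. This assumes the Uri above is android.net.Uri, and it is not Android's implementation: percentDecode and hexValue are hypothetical names, and malformed escapes are simply copied through here, which may differ from what Uri.decode does. The point is only that %XX sequences are decoded as UTF-8 while "+" stays as it is.

```kotlin
import java.io.ByteArrayOutputStream

// Value of one ASCII hex digit, or -1 if the character is not a hex digit.
private fun hexValue(c: Char): Int = when (c) {
    in '0'..'9' -> c - '0'
    in 'a'..'f' -> c - 'a' + 10
    in 'A'..'F' -> c - 'A' + 10
    else -> -1
}

// Decodes %XX escapes as UTF-8 bytes. "+" is never turned into a space.
fun percentDecode(input: String): String {
    val out = ByteArrayOutputStream()
    var i = 0
    while (i < input.length) {
        val hi = if (input[i] == '%' && i + 2 < input.length) hexValue(input[i + 1]) else -1
        val lo = if (hi >= 0) hexValue(input[i + 2]) else -1
        if (lo >= 0) {
            // Valid escape: emit the decoded byte and skip "%XX".
            out.write(hi * 16 + lo)
            i += 3
        } else {
            // Literal run (or malformed escape): copy up to the next '%'.
            val next = input.indexOf('%', i + 1).let { if (it < 0) input.length else it }
            out.write(input.substring(i, next).toByteArray(Charsets.UTF_8))
            i = next
        }
    }
    return String(out.toByteArray(), Charsets.UTF_8)
}
```

With this sketch, percentDecode("https://t.me/+xxxxx") returns the link unchanged, and percentDecode("name=%E6%AD%8C") returns "name=歌".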
For the regex, I have tested ((https|http)://)?\p{L}+\.\p{L}+(:\p{Nd})?(\p{Ll}|\p{Lu}|\p{Nd}|/|\?|\+|&|=|\.|-|_|#|%)*
Please consider whether there are boundary cases. I'm not sure.
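One way to probe for boundary cases is to run the pattern against the inputs from this thread. The sketch below copies the regex unchanged and assumes JVM regex semantics; urlPattern and firstUrl are hypothetical names, and the last two inputs are made-up probes rather than cases reported by anyone.

```kotlin
// The pattern from the comment above, copied unchanged.
val urlPattern = Regex(
    """((https|http)://)?\p{L}+\.\p{L}+(:\p{Nd})?(\p{Ll}|\p{Lu}|\p{Nd}|/|\?|\+|&|=|\.|-|_|#|%)*"""
)

// Hypothetical helper: the first match in the text, or null if there is none.
fun firstUrl(text: String): String? = urlPattern.find(text)?.value
```

On the thread's examples the pattern keeps the "+" in https://t.me/+xxxxx, cuts https://y.music.xxx.com/m/song?name=歌曲名称 at "name=", and cuts the xxxlink example before the full-width comma. Two possible boundary cases show up with the made-up probes: a host label containing a digit, as in "http://a1.com", is not matched at all, and because \p{L} also matches Chinese characters, text glued to a scheme-less link such as "打开t.me/+xxxxx" is matched as a whole, including "打开".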
https://github.com/lz233/Tarnhelm/compare/20231120...20240111