Juice007 closed this 4 days ago.
It looks like this has taken UTF-8-encoded text and turned each encoded byte into an individual code point, instead of decoding the UTF-8 first and then encoding the resulting code points as `\U` codes. Converting these code points back into individual bytes produces the proper Chinese text.
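The repair described above can be sketched in Go as follows. Because every garbled code point is in the range U+0000 to U+00FF (one per original byte, equivalent to reading the bytes as Latin-1), each one can be narrowed back to a byte and the byte slice read as UTF-8. The function name `unmangle` and the garbled input are hypothetical: the input is reconstructed from the UTF-8 bytes of 用户不存在, since the exact value from the header isn't quoted here.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// unmangle (hypothetical helper) reverses byte-per-rune mojibake: each code
// point is turned back into one byte, and the bytes are read as UTF-8.
func unmangle(s string) (string, error) {
	raw := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return "", errors.New("code point above U+00FF; input is not byte-per-rune mojibake")
		}
		raw = append(raw, byte(r))
	}
	if !utf8.Valid(raw) {
		return "", errors.New("recovered bytes are not valid UTF-8")
	}
	return string(raw), nil
}

func main() {
	// Hypothetical garbled input: the UTF-8 bytes of 用户不存在, one rune per byte.
	garbled := string([]rune{
		0xe7, 0x94, 0xa8, 0xe6, 0x88, 0xb7, 0xe4, 0xb8,
		0x8d, 0xe5, 0xad, 0x98, 0xe5, 0x9c, 0xa8,
	})
	fixed, err := unmangle(garbled)
	if err != nil {
		panic(err)
	}
	fmt.Println(fixed) // 用户不存在
}
```

The `utf8.Valid` check guards against input that was never UTF-8 to begin with, in which case narrowing the runes back to bytes would not recover meaningful text.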
Sorry. I'm still a little confused about what you mean.
@puellanivis Can you tell me in detail what I should do?
Removing every `\U00` prefix and then hex-decoding the remaining digits yields the intended Chinese text: https://go.dev/play/p/KG54AtomS5p
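A minimal sketch of the strip-and-hex-decode step, in case the playground link goes away. It assumes each escape has the form `\U00XX`, which is what leaves a two-digit hex byte once `\U00` is removed. The garbled header value isn't quoted in the thread, so the `escaped` constant is a hypothetical reconstruction built from the bytes in the data URI below, and `repair` is an illustrative name.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// escaped is a hypothetical reconstruction of the garbled value: each UTF-8
// byte of the message appears as its own "\U00XX" escape.
const escaped = `\U00e7\U0094\U00a8\U00e6\U0088\U00b7\U00e4\U00b8\U008d\U00e5\U00ad\U0098\U00e5\U009c\U00a8`

// repair strips every `\U00` prefix, leaving one hex byte pair per escape,
// and hex-decodes the result back into the original UTF-8 bytes.
func repair(s string) (string, error) {
	raw, err := hex.DecodeString(strings.ReplaceAll(s, `\U00`, ""))
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	fixed, err := repair(escaped)
	if err != nil {
		panic(err)
	}
	fmt.Println(fixed) // 用户不存在
}
```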
As an alternative to the Go playground, and thanks to the percent-encoding used in URIs, the same thing can be seen with a simple data URI: data:,%e7%94%a8%e6%88%b7%e4%b8%8d%e5%ad%98%e5%9c%a8 (Chrome shows me the same garbled nonsense on the page, but the URI shows the correct Chinese).
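The same percent-decoding can be done in Go with `url.PathUnescape` from `net/url`. This is a sketch: `payloadText` is a hypothetical helper that only handles the bare `data:,` form used above (no media type, no `;base64`).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// payloadText (hypothetical) percent-decodes the payload of a bare "data:," URI.
func payloadText(dataURI string) (string, error) {
	const prefix = "data:,"
	if !strings.HasPrefix(dataURI, prefix) {
		return "", fmt.Errorf("unsupported data URI: %q", dataURI)
	}
	return url.PathUnescape(strings.TrimPrefix(dataURI, prefix))
}

func main() {
	text, err := payloadText("data:,%e7%94%a8%e6%88%b7%e4%b8%8d%e5%ad%98%e5%9c%a8")
	if err != nil {
		panic(err)
	}
	fmt.Println(text) // 用户不存在
}
```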
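Somehow the text seems to have been converted from UTF-8 bytes directly into Unicode code points without proper decoding, à la:

```go
import (
	"fmt"
	"strings"
)

// f reproduces the suspected bug: every UTF-8 byte is written out as its own
// code point (U+0000 to U+00FF) instead of the string being decoded into runes first.
func f(correctString string) string {
	buf := new(strings.Builder)
	for _, b := range []byte(correctString) {
		fmt.Fprintf(buf, "%c", b)
	}
	return buf.String()
}
```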
https://go.dev/play/p/IPBEQzpuDce
Without further code or details, I can’t really help much beyond pointing out that it’s the correct text, just encoded wrong (https://en.wikipedia.org/wiki/Mojibake). I will note that the `Originmsg` also appears to be incorrectly encoded, and it is the likely source of the problem with the `Returnmsg`. Is the `Returnmsg` simply repeating whatever it got from the `Originmsg`? If so, we’re not doing anything wrong at all: the client is encoding the `Originmsg` wrong.
Thanks, @puellanivis!
Our final solution: the response returns the URL-encoded Chinese string, and the client URL-decodes it, which solves the problem.
URL-encoded string: %E7%94%A8%E6%88%B7%E4%B8%8D%E5%AD%98%E5%9C%A8%0A
URL-decoded string: 用户不存在
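A sketch of that workaround on the Go side, assuming the server percent-encodes the message before putting it in the response. The thread doesn't say which encoder was used; this uses `url.PathEscape`, which writes spaces as `%20` rather than `+`, so a client that only reverses `%XX` sequences (for example NSString's `stringByRemovingPercentEncoding` on iOS) gets the original text back. `encodeMsg` and `decodeMsg` are hypothetical names, and `decodeMsg` stands in for the client side.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeMsg (hypothetical, server side) percent-encodes the message so the
// transmitted value is plain ASCII.
func encodeMsg(msg string) string {
	return url.PathEscape(msg)
}

// decodeMsg (hypothetical, client side) reverses the %XX escapes to recover
// the original UTF-8 text.
func decodeMsg(encoded string) (string, error) {
	return url.PathUnescape(encoded)
}

func main() {
	enc := encodeMsg("用户不存在\n")
	fmt.Println(enc) // %E7%94%A8%E6%88%B7%E4%B8%8D%E5%AD%98%E5%9C%A8%0A

	dec, err := decodeMsg(enc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", dec) // "用户不存在\n"
}
```

The trailing `%0A` in the encoded string from the thread is a newline that was part of the original message; it survives the round trip.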
What version of protobuf and what language are you using?
Version: main/v3.6.0/v3.5.0
Language: Go, Objective-C
What operating system (Linux, Windows, ...) and version?
iOS
What runtime / compiler are you using (e.g., python version or gcc version)?
What did you do?
Steps to reproduce the behavior:
1. When the request returns Chinese text, iOS gets Unicode escape codes in the response header:
Decoding them into Chinese produces garbled text.
Garbled text:
Expected Chinese: 用户不存在