dk14 closed this 2 years ago
I feel that it's a fine choice of language. It doesn't need to be a popular language, just obviously not a Latin script, imho.
No bias is intended. I believe it was just because Steve Klabnik knows a bit of Russian. I don't see a compelling reason to change this.
@carols10cents I think you misunderstood, as you addressed the point about the bias rather than the actual problem that is more obvious if you look into the chapter itself (I'm on mobile so a bit lazy to quote it, but the chapter is not that large).
It is NOT my point that the author's preference for the language (or for the "hello" phrase) is a problem by itself. The point is that this particular choice costs an extra explanation that does not contribute to learning about strings in Rust, while most other non-Latin languages (including some Cyrillic-based ones) wouldn't incur this. At the very least I see a reason to remove the explanation about the letter Z as it currently stands. How does it contribute to the explanation of how Rust approaches UTF-8?
To give you more context, the explanation of Indian graphemes seems relevant, for example, as it shows how a whole grapheme could be parsed in Rust. The Z-letter explanation just sits there: even if the author wanted to clarify its effect on the length (which only people already familiar with UTF-8 would notice), that should be stated explicitly, with a separate example mixing ASCII and non-ASCII. Otherwise it is as if we were learning the Russian language itself rather than the strings API. I hope the contrast with the Indian version clarifies it - it explicitly shows how the language API handles the nuance of the script.
Otherwise, as I suggested - the language could be changed in order to avoid the explanation altogether. I did not insist on changing the language per se, though.
I understand that the author has a preference because the author is familiar with the language - his bias, whether justified or not, is not my main point. My point is that the Z-letter explanation that resulted from it stands out as unnecessary, or at least requires improvement.
It is NOT about removing the bias itself unconditionally, it is about fixing its consequence. Am I unjustifiably biased towards the Ukrainian version here? Who knows, but following my bias removes the need to explain the Z-letter. It is cheaper and has the same effect when it comes to learning Rust.
It is not the only solution of course: you could also expand the Z-letter explanation (so that it becomes useful) and keep the language. Actually, having a separate example with 3 instead of 'З' while keeping Russian could be even better, although it is still a stretch, as it is easier to explicitly mix a number or Latin letters into a non-Latin string. It is an important nuance of UTF-8 encoding that I myself often forget (a variable number of bytes per letter within a single string). Even though it is kind of explained in the chapter, it is easy to miss in examples where the length of the string in bytes is a linear function of its length in symbols.
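To illustrate the mixed ASCII/non-ASCII point, here is a minimal sketch (not taken from the book; the helper name `byte_and_char_len` is made up for illustration). It compares the all-Cyrillic word with the same word where the first letter is replaced by the ASCII digit 3:

```rust
/// Hypothetical helper: returns (length in bytes, number of `char`s) for a string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // All-Cyrillic: every letter takes 2 bytes in UTF-8,
    // so the byte length is exactly twice the number of chars.
    let cyrillic = "Здравствуйте";
    // Same word, but starting with the ASCII digit '3' (1 byte)
    // instead of the Cyrillic letter 'З' (2 bytes).
    let mixed = "3дравствуйте";

    for s in [cyrillic, mixed] {
        let (bytes, chars) = byte_and_char_len(s);
        println!("{}: {} bytes, {} chars", s, bytes, chars);
    }
}
```

Both strings have 12 chars, but the first is 24 bytes and the second is 23, so the byte length is no longer a simple multiple of the character count once the scripts are mixed.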
Reading further into the chapter, it is probably really confusing for people unfamiliar with Cyrillic to see what looks like the number 3 everywhere (in the slice examples, for instance), despite the warning. Why does it have to be "hello" anyway?
P.S. Anyway, indexing slices by code unit rather than by code point is even more confusing (and error-prone), especially without explicitly advertising slower but more reliable ways to take a slice without the risk of breaking in the middle of a code point.
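As a rough sketch of what such a "slower but more reliable" approach could look like (the function `first_n_chars` is a hypothetical helper, not something from the chapter), one option is to find the byte offset via `char_indices` and only then slice. `str::get` and `str::is_char_boundary` are standard-library methods that avoid the panic of a bad byte range:

```rust
/// Hypothetical helper: returns the first `n` chars of `s` as a slice.
/// It walks the string from the start, so it costs O(n) instead of O(1),
/// but the cut always lands on a code point boundary.
fn first_n_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // Byte offset where the (n+1)-th char starts: slice up to it.
        Some((byte_idx, _)) => &s[..byte_idx],
        // The string has at most `n` chars: return all of it.
        None => s,
    }
}

fn main() {
    let hello = "Здравствуйте";

    // Byte index 1 is in the middle of 'З', so `&hello[0..1]` would panic.
    assert!(!hello.is_char_boundary(1));
    // `get` reports the same problem as `None` instead of panicking.
    assert!(hello.get(0..1).is_none());

    // Counting in chars rather than bytes cannot split a code point.
    println!("{}", first_n_chars(hello, 4));
}
```

Note that this still works on code points, not on grapheme clusters, so it does not address scripts where one visible symbol is built from several code points.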
I believe that the choice is not biased towards the Russian language in this section, but I still think it's better to change Здравствуйте to another language. First of all, as already mentioned, the first letter З can make it harder (or slower) to understand that it's not the number 3 (and it already does, see https://github.com/rust-lang/book/issues/955), which doesn't bring any value and takes an unnecessary line of explanation. Even given the explanation, it can still be hard to follow the main section logic where 'З' is printed along with other numbers (the first byte of З is 208). Secondly, the letter З, or its Latin version Z, is considered to be a Russian military symbol, and it's better to distance ourselves from it. So I would also propose changing the language, or at least the word Здравствуйте. I can help create a PR if this change is accepted.
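For reference, the byte values mentioned above can be checked with a small sketch like the following (the helper `utf8_bytes` is a made-up name for illustration):

```rust
/// Hypothetical helper: collects the UTF-8 bytes of a string.
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.bytes().collect()
}

fn main() {
    // Cyrillic capital letter Ze, U+0417: two bytes in UTF-8.
    println!("{:?}", utf8_bytes("З")); // [208, 151]
    // ASCII digit three, U+0033: a single byte.
    println!("{:?}", utf8_bytes("3")); // [51]
}
```

The two characters look alike in many fonts, but the first one starts with byte 208 while the digit is the single byte 51, which is why printing the letter next to byte values can be hard to read.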
https://doc.rust-lang.org/book/ch08-02-strings.html#storing-utf-8-encoded-text-with-strings
The documentation uses the Russian "Здравствуйте" as its main non-ASCII example despite the language being only the 8th most popular in the world, Cold War nostalgia-style I guess.
The main issue though is actually the explanation about the letter "З" (Z) not being the number 3 - besides being pointless, it also just wastes time on clarifying irrelevant nuances of a language, which could be completely avoided by switching to a language like Mandarin (more Firefly than "Stranger Things") where these kinds of confusions are likely absent. It really looks like the author put Russian there for the sake of putting Russian.
Alternatively, instead of Mandarin it could also be something cute like Japanese if we want to move even further from the geopolitical bias of choosing a "big language". Or even the Ukrainian "Добридень🇺🇦" if Rust is into Cyrillic that much - it solves the problem with Z and is less aggressive.
P.S. There are lots of other options obviously, I am only reporting the strangeness in the docs.