mdn / yari

The platform code behind MDN Web Docs
Mozilla Public License 2.0

Create Traditional and Simplified Chinese conversion tools #4530

Closed kecrily closed 2 years ago

kecrily commented 2 years ago

Summary

There is no big gap between Traditional and Simplified Chinese. We can take source content written in a mix of Traditional and Simplified Chinese and convert it to either Traditional Chinese or Simplified Chinese with a Traditional-Simplified conversion tool, just as the Chinese Wikipedia does.

Why

There are now two different branches of MDN zh, zh-cn and zh-tw. The translation results for the same document are roughly the same for both.

A Traditional Chinese user can read documents in Simplified Chinese almost without any problems. And vice versa.

The complete segregation of zh-cn and zh-tw translations results in unnecessary duplication of effort. We can avoid this waste of manpower by using a conversion tool: translated content composed of both Traditional and Simplified Chinese would be converted by the tool into content that is entirely zh-cn or entirely zh-tw.

What needs to be done?

Some projects that might help

OpenCC

peterbe commented 2 years ago

Would it be "offensive" or upsetting if we remove one and focus only on the other? There are 5,213 files in zh-CN and only 1,246 in zh-TW.

If most "Chinese reading people" can read either, wouldn't they benefit from consuming the most actively maintained one?

kecrily commented 2 years ago

> Would it be "offensive" or upsetting if we remove one and focus only on the other? There are 5,213 files in zh-CN and only 1,246 in zh-TW.

@peterbe What I mean is to merge them into a zh where the source content is a mix of Simplified Chinese and Traditional Chinese, and then distribute them as Simplified Chinese and Traditional Chinese by using a conversion tool.

Not to remove one of them, but to make them co-exist in a project.

> If most "Chinese reading people" can read either, wouldn't they benefit from consuming the most actively maintained one?

Of course they would. For example, if a document has no Simplified Chinese version, a Simplified Chinese user might check whether a Traditional Chinese one exists.

But if a document is translated into Simplified Chinese version and Traditional Chinese version respectively, it is a waste of the maintainer's effort.


This model has lasted for many years in the Chinese Wikipedia, and it has stood the test of time.

kecrily commented 2 years ago

I tried to express it using a model.

We use uppercase for Simplified Chinese and lowercase for Traditional Chinese.

Translated content[^1]

A b C C

Simplified Chinese users will see

A B C C

Traditional Chinese users will see

a b c c

[^1]: Different paragraphs were translated by Simplified Chinese users and Traditional Chinese users at different times. The final translation came out with a mix of traditional and simplified content, but the translators were able to understand it.
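A toy sketch of this model, taking the uppercase/lowercase analogy literally. Letter case stands in for script, and `toUpperCase`/`toLowerCase` stand in for a real converter such as OpenCC. The `render` helper and its `target` values are made up for illustration.

```javascript
// Toy model only: letter case stands in for script.
// Uppercase = Simplified Chinese, lowercase = Traditional Chinese.
const mixedSource = ['A', 'b', 'C', 'C'];

// Hypothetical helper: render the mixed source for one audience.
function render(paragraphs, target) {
  return paragraphs.map((p) =>
    target === 'simplified' ? p.toUpperCase() : p.toLowerCase()
  );
}

console.log(render(mixedSource, 'simplified').join(' ')); // A B C C
console.log(render(mixedSource, 'traditional').join(' ')); // a b c c
```

Case conversion is lossless and one-to-one, which is exactly the property the analogy assumes of a Traditional-Simplified converter.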

peterbe commented 2 years ago

> Not to remove one of them, but to make them co-exist in a project.

I don't think we understand this. We have files written in zh-tw and files written in zh-cn. Each file becomes its own URL. Are you proposing that we "transpose" each missing-in-the-other-Chinese file so that you end up with a superset of equal number of translations in both zh-tw and zh-cn?

E.g.

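// Pseudocode: getFiles, hasFile, copyFile and OpenCC are placeholders.
// Copy every zh-cn file that has no zh-tw counterpart, converting it on the way.
for (const file of getFiles('zh-cn')) {
  if (!hasFile(file, 'zh-tw')) {
    copyFile(file, 'zh-tw', OpenCC(file))
  }
}
// and the same in the other direction
for (const file of getFiles('zh-tw')) {
  if (!hasFile(file, 'zh-cn')) {
    copyFile(file, 'zh-cn', OpenCC(file))
  }
}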

Or perhaps not do this with the source files, but do it at build time?
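The loop above can be made runnable with a few assumptions. In this sketch each locale is an in-memory `Map` of path to content, and `convert` is a stand-in for a real converter such as OpenCC (its signature here is an assumption, not OpenCC's actual API). `fillMissing` and the sample paths are hypothetical.

```javascript
// Sketch only: locales are modeled as in-memory Maps of path -> content.
// `convert` is a placeholder for a real converter; nothing here calls OpenCC.
function fillMissing(source, target, convert) {
  const added = [];
  for (const [path, content] of source) {
    if (!target.has(path)) {
      target.set(path, convert(content));
      added.push(path);
    }
  }
  return added;
}

const zhCN = new Map([['web/html', 'cn html'], ['web/css', 'cn css']]);
const zhTW = new Map([['web/html', 'tw html'], ['web/js', 'tw js']]);
const tag = (label) => (text) => `[${label}] ${text}`; // placeholder converter

const addedToTW = fillMissing(zhCN, zhTW, tag('cn->tw'));
const addedToCN = fillMissing(zhTW, zhCN, tag('tw->cn'));
console.log(addedToTW.join(', ')); // web/css
console.log(addedToCN.join(', ')); // web/js
```

After both passes each locale holds the same set of paths. Files that already existed in a locale are never overwritten, so hand-written translations stay untouched.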

peterbe commented 2 years ago

One challenge with automation is that even with powerful tools like OpenCC, it's worth pointing out that our content is not structured. It's messy HTML files.

kecrily commented 2 years ago

> Are you proposing that we "transpose" each missing-in-the-other-Chinese file so that you end up with a superset of equal number of translations in both zh-tw and zh-cn?

Simplified and Traditional Chinese are like the British and American expressions in English. They may or may not appear in the same document.

One-to-one conversion between traditional and simplified Chinese is possible. Regional words would require a corresponding database to be created for processing, but OpenCC already does this for us.

Traditional and simplified users work on the same translated file, and the conversion tool distributes them into two versions that are entirely zh-hans and entirely zh-hant.


> It's messy HTML files.

I see that there is html->md work in progress and markdown should work well for the conversion.
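One reason Markdown may be easier to convert is that code is clearly delimited, so a converter can be kept away from it. A minimal sketch, assuming a hypothetical `convertOutsideCode` helper and a deliberately simplified pattern that only recognizes backtick fences and inline code spans:

```javascript
// Sketch only: split Markdown into code and non-code segments and run the
// converter on the non-code segments. The pattern is a simplification and
// does not cover every Markdown construct (indented code, HTML blocks, ...).
function convertOutsideCode(markdown, convert) {
  const codePattern = /(`{3}[\s\S]*?`{3}|`[^`\n]+`)/g;
  return markdown
    .split(codePattern) // with a capture group, odd indexes hold the code matches
    .map((segment, i) => (i % 2 === 1 ? segment : convert(segment)))
    .join('');
}

const toy = (text) => text.toUpperCase(); // stand-in for a real converter
console.log(convertOutsideCode('use `let x` here', toy)); // USE `let x` HERE
```

Identifiers and code samples pass through unchanged while the surrounding prose is converted.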

KevinZonda commented 2 years ago

@peterbe Hi there. I think the relationship between Simplified Chinese and Traditional Chinese is similar to that between British English and American English.

The biggest difference between them is that Simplified Chinese uses simplified characters and Traditional Chinese uses traditional characters. Despite using different characters, so far we can easily find solutions for converting Simplified Chinese to Traditional Chinese and vice versa.

Another difference between them is the use of words. We can refer to the corresponding Wikipedia thesaurus: regional vocabulary will not be the same, but there is a mapping table. So it is easy for us to localise.

Based on the relationship between Traditional Chinese and Simplified Chinese, I agree with @kecrily 's proposal for MDN. Merging Simplified Chinese and Traditional Chinese can significantly reduce the amount of work we need to do and can facilitate cooperation within each region.

LLLgoyour commented 2 years ago

@peterbe Basically it's not a big problem for "Chinese reading people" to read "traditional-simplified-mixed Chinese", but it would be better to have conversion for the different groups of "Chinese reading people". As with British English and American English, different forms of words and phrases exist in separate regions. What we need to do is replace words and phrases according to local vocabulary while we transform the different forms of characters.

KevinZonda commented 2 years ago

@t7yang @irvin hi, what do u think about this?

irvin commented 2 years ago

I disagree with that. Traditional Chinese (primarily used in Taiwan and Hong Kong) and Simplified Chinese (for readers from other places) have different standard practices for technical translation.

1) In Traditional Chinese, it’s very common to use English phrases directly for brands and programming terms, but in Simplified Chinese, most names are translated. The best example is that we use “Firefox” (without translating it into Chinese) in Taiwan, while “火狐” is used in China in most cases.

2) There are big differences in technical vocabulary. For example, we have different names for RAM (内存/記憶體), Mouse (鼠标/滑鼠), HD (硬盘/硬碟), Thread (线程/執行緒), tab (页签/分頁)… this list can go to tens of thousands, and if we want to maintain an automatic translation system, we will also need to maintain a conversion table, in some cases, on per article basis (Took Wikipedia as an example). This is what Wikipedia did, and I and many local Wikipedians feel it eventually became a very big entry barrier and a big hassle for translators who want to ensure a good reading experience for both parties.

3) Most programmers in Taiwan are used to reading English documents directly. Those who would like a Chinese doc as a reference but find no Traditional translation can easily convert Simplified MDN articles into Traditional Chinese on the fly with a browser add-on such as the well-known tongwentang (70000 users on Chrome and 12000 users on Firefox), or read the Simplified docs directly. I don’t believe an auto-conversion system can help them much.

4) MDN has a much larger number of Simplified docs than Traditional docs (almost 5x) and a much smaller translator base from Taiwan than from China (definitely less than 1/5). I believe most of the docs with a Trad. version already have a Simplified translation. Combining both can’t really grow the number of translated docs in the Simplified Chinese locale.

5) For the above reasons, none of the MDN reviewers from Taiwan (on Yari or on the previous platform) accept any doc that is a direct conversion from Simplified to Traditional without a manual rewrite. Arriving at a machine-converted doc is a worse experience for a Traditional Chinese reader than finding no translation at all (they can always switch to the Simplified version or to English, as they prefer, instantly).
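To make the conversion-table idea in point 2 concrete, here is a minimal sketch of a global phrase table with per-article additions. The table holds only pairs mentioned in this comment; `globalTable`, `convertPhrases` and the per-article `Map` are hypothetical names, and a real system would need far more entries plus character-level conversion.

```javascript
// Illustrative only: a tiny Simplified -> Traditional phrase table built from
// the pairs listed above. A real table would be far larger.
const globalTable = new Map([
  ['内存', '記憶體'],
  ['鼠标', '滑鼠'],
  ['硬盘', '硬碟'],
  ['火狐', 'Firefox'],
]);

// Hypothetical helper: per-article entries are merged over the global table.
function convertPhrases(text, articleTable = new Map()) {
  const table = new Map([...globalTable, ...articleTable]);
  const escape = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Longest phrases first, matched in a single pass so that already
  // converted output is never converted a second time.
  const pattern = new RegExp(
    [...table.keys()].sort((a, b) => b.length - a.length).map(escape).join('|'),
    'g'
  );
  return text.replace(pattern, (match) => table.get(match));
}

console.log(convertPhrases('鼠标和硬盘')); // 滑鼠和硬碟
console.log(convertPhrases('线程', new Map([['线程', '執行緒']]))); // 執行緒
```

Every term that is missing from both tables passes through unconverted, which is where the maintenance burden described above comes from.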

irvin commented 2 years ago

In summary, because Simplified MDN docs outnumber Traditional docs 5x, the benefit of an automatic conversion system for Simplified Chinese readers is very limited: the number of translated docs will not grow much.

For Traditional Chinese readers and translation contributors, landing on an obviously machine-converted doc, or putting effort into maintaining a huge conversion table (and still ending up with a non-ideal result), is not something local contributors would want.

In fact, I don't think MDN contributors in the Trad. Chinese locale consider having fewer docs than Simplified Chinese a “problem” at all. We emphasize the quality of translation (fluency and easy reading) for local readers over the percentage of translation.

So I'm not sure it is worth it for us to invest resources in developing such conversion system on MDN.

irvin commented 2 years ago

Here is another example from my experience with the dev-rel team: we ran an experiment to translate and publish "Firefox for Developers" from Firefox 74 to 77 in both Simplified Chinese and Traditional Chinese.

The original English doc first went to a translation company to be translated into Simplified Chinese, was then machine-converted into Traditional Chinese, and the converted result was sent to me. It took me more than 2 hours per version to rewrite the docs to ensure readers have a good and fluent reading experience.

I needed to switch phrases and adjust the grammar of paragraphs, and in most cases I needed to double-check the English version to make sure I (and the Simplified translator) was not misunderstanding anything. Even so, I'm still not satisfied with the published articles. In fact, it would have been faster to re-translate everything from English than to do such a massive review and rewrite of the conversion result.

I don't think it's like the case of American English and British English if we want to ensure the perceived quality and reading experiences for readers from both locales.

https://hacks.mozilla.org/zh-hant/2020/03/security-firefox-74/ (Trad. Chinese)
https://hacks.mozilla.org/zh-hans/2020/03/security-firefox-74/ (Simplified Chinese)
https://hacks.mozilla.org/zh-hant/2020/04/firefox-75/
https://hacks.mozilla.org/zh-hans/2020/04/firefox-75/
https://hacks.mozilla.org/zh-hant/2020/05/firefox-76-audio-worklets/
https://hacks.mozilla.org/zh-hans/2020/05/firefox-76-audio-worklets/
https://hacks.mozilla.org/zh-hant/2020/06/firefox-77/
https://hacks.mozilla.org/zh-hans/2020/06/firefox-77/

toto6038 commented 2 years ago

Actually the difference between these two language variants is much larger than you think. Just take a look at some computer science entries on Wikipedia: they show that an auto conversion tool can cause more problems. To keep the content looking native, many specific conversion rules are required, even with help from an auto conversion tool and conversion groups.

kecrily commented 2 years ago

peterbe: There are 5,213 files in zh-CN and only 1,246 in zh-TW.

We assume that seven hundred articles have been translated repeatedly, and assume that it takes an average of twenty minutes to translate one article. Then there are fourteen thousand minutes that do not necessarily need to be spent.


irvin: this list can go to tens of thousands, and if we want to maintain an automatic translation system, we will also need to maintain a conversion table, in some cases, on per article basis (Took Wikipedia as an example).

First of all, we don't need to build this mapping table from scratch; many ready-made entries already exist. Secondly, MDN is mainly Web-related content, so the terminology that needs mapping is also limited.

Which is better: spending tens of seconds to add a mapping item, or twenty minutes to translate from scratch?

Since we mentioned the Wikipedia model, we have to mention that since the Chinese Wikipedia was hatched, there have been many discussions on the separation of traditional and simplified languages. But in the end, it was the mixed use of traditional and simplified that won out, and this situation may not necessarily extend to MDN, but I think it is a good reference.

The Chinese Wikipedia has gone from having multiple versions of a single content to having a unified entry with a mix of traditional and simplified versions, and has remained so for many years. The Chinese Wikipedia has 1.22 million entries, and the model of mixing and converting works well. Why do you think this is a problem on MDN, which has only about 10,000 documents?

huanyichuang commented 2 years ago

I would say this issue is either ignorant or offensive. You don’t mix Portugal Portuguese with Brazilian Portuguese. zh-hant has quite different phrases and terms from zh-hans.

eric7578 commented 2 years ago

So many terms are TOTALLY NOT the same; there's no reason to do this.

JasonHK commented 2 years ago

Please don’t mess with our language (i.e. Traditional Chinese), thank you.

KevinZonda commented 2 years ago

So why Wikipedia can do, Mozilla can't?

huanyichuang commented 2 years ago

> peterbe: There are 5,213 files in zh-CN and only 1,246 in zh-TW.
>
> We assume that seven hundred articles have been translated repeatedly, and assume that it takes an average of twenty minutes to translate one article. Then there are fourteen thousand minutes that do not necessarily need to be spent.
>
> irvin: this list can go to tens of thousands, and if we want to maintain an automatic translation system, we will also need to maintain a conversion table, in some cases, on per article basis (Took Wikipedia as an example).
>
> First of all, we don't need to build this mapping table from scratch; many ready-made entries already exist. Secondly, MDN is mainly Web-related content, so the terminology that needs mapping is also limited.
>
> Which is better: spending tens of seconds to add a mapping item, or twenty minutes to translate from scratch?
>
> Since we mentioned the Wikipedia model, we have to mention that since the Chinese Wikipedia was hatched, there have been many discussions on the separation of traditional and simplified languages. But in the end, it was the mixed use of traditional and simplified that won out, and this situation may not necessarily extend to MDN, but I think it is a good reference.
>
> The Chinese Wikipedia has gone from having multiple versions of a single content to having a unified entry with a mix of traditional and simplified versions, and has remained so for many years. The Chinese Wikipedia has 1.22 million entries, and the model of mixing and converting works well. Why do you think this is a problem on MDN, which has only about 10,000 documents?

Just because it exists doesn’t mean it’s ideal or applaudable. Since you mentioned it, I’ll share my personal experience as a zh-hant user in Taiwan. In Taiwan, Wikipedia is notorious for its quality due to your so-called “good practice” of mixing phrases and terms, and even some grammatical incompatibilities.

Let’s think about what would happen if Portuguese speakers asked Brazilian Portuguese to “be merged.” Sounds absurd, doesn’t it?

komali2 commented 2 years ago

Efforts to erase Traditional characters are akin to efforts in general to minimize the valid existence of the many countries and people that use them. Just because "many more people" use simplified doesn't justify erasing the traditional articles (which is what the proposition calls for).

Mixing on the Mandarin Wikipedia is a problem. There should be zh-cn and zh-tw articles. MDN shouldn't follow the anti-pattern of assuming they're the same.

I also challenge the notion that traditional character readers can "easily" read simplified. This is not the case. And the expectation that they "just quickly translate it" is impolite. Plopping text into a conversion table causes loss of styling. Plugins that do so automatically can be unreliable regarding styling as well. It's an expectation held for no other user.

In any case, this request is presumptuous and should be closed as won't-do. While I want to give good faith, this request is within a pattern of efforts by PRC backers to force the culture of the PRC on nations they're undergoing an effort to imperialize (Hong Kong) or have plans to do so (Taiwan). The political angle should be considered - if MDN wipes out Traditional Character articles in favor of Simplified ones, it's effectively aiding the PRC in its campaigns of wiping out other cultures in favor of CCP approved homogeneity. Call me overdramatic if you wish but you can see this happening all over the internet: from ostensibly language learning apps like Duolingo and HelloTalk, to social media applications like TikTok, to gaming (Blizzard) and even airlines (forcing them to use "Chinese Taipei," a nonsensical name). At minimum, the issue should be closed simply for being politically motivated. The best thing MDN can do is remain uninvolved - namely, by taking no action, and leaving zh-cn and zh-tw articles as separate.

komali2 commented 2 years ago

> So why Wikipedia can do, Mozilla can't?

Wikipedia can. It shouldn't.

ymcheung commented 2 years ago

Let's take this article translated by opencc for example: https://github.com/chromaui/learnstorybook.com/blob/191b88830fdc4a519cf9180e9d891ce6989c6084/content/intro-to-storybook/react/zh-TW/get-started.md

Do you think it's readable?

komali2 commented 2 years ago

> Let's take this article translated by opencc for example: https://github.com/chromaui/learnstorybook.com/blob/191b88830fdc4a519cf9180e9d891ce6989c6084/content/intro-to-storybook/react/zh-TW/get-started.md
>
> Do you think it's readable?

One article against converting tens of thousands is not a strong argument in favor.

The presumption is that articles will be written "normally" (in simplified, with Beijing-esque dialectal choices), and traditional character readers will be fed the scraps of whatever comes out of the conversion table, with no consideration for localization or entirely different vocabulary. Tell me, what do you call a hotel? As one example out of thousands, this should give you an idea of what's at stake.

Traditional character users write and talk differently. It's a different culture, arguably approaching a different dialect as the language groups (zh-tw and zh-cn) split.

If everyone throws their hands up and just "lets simplified win," we stop that organic process and force tens of millions of people to read in a dialect that they don't use day to day.

kpkonghk01 commented 2 years ago

Words and phrases in zh-cn and zh-tw are not one-to-one, e.g. cn 干 can be 乾/幹/榦/干 in tw. Not to mention the grammar differences and the difficulty of word tokenization in Chinese.
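A small sketch of why a single character is not enough context. The candidate list comes from the example above; the three phrase entries are common words chosen for illustration, and `toTraditional` is a hypothetical helper, not how OpenCC is implemented.

```javascript
// Toy data: one Simplified character with several Traditional candidates.
const candidates = new Map([['干', ['乾', '幹', '榦', '干']]]);

// Toy phrase table: the surrounding word decides which candidate is right.
const phraseTable = new Map([
  ['干燥', '乾燥'], // "dry"
  ['干部', '幹部'], // "cadre"
  ['干涉', '干涉'], // "interfere"
]);

// Hypothetical helper: convert a known phrase, otherwise report the ambiguity.
function toTraditional(word) {
  if (phraseTable.has(word)) {
    return { text: phraseTable.get(word), ambiguous: false };
  }
  const ambiguous = [...word].some((ch) => (candidates.get(ch) || []).length > 1);
  return { text: word, ambiguous };
}

console.log(toTraditional('干燥').text); // 乾燥
console.log(toTraditional('干部').text); // 幹部
console.log(toTraditional('干').ambiguous); // true
```

The lookup only works on words that are already segmented, which is the tokenization difficulty mentioned above.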

Many game/media companies who use OpenCC or other translation tools translate chinese contents only for draft versions and keep both cn and tw versions editable separately for practical reasons.

There is indeed a big gap between zh-cn and zh-tw.

yichung279 commented 2 years ago

If documents show Simplified Chinese in traditional characters, beginners may learn Simplified Chinese terms and think those are Traditional Chinese.

If more and more people use Simplified Chinese in traditional characters, Traditional Chinese will disappear eventually. This is my true concern.

hubertwang commented 2 years ago

Disagree. zh-TW and zh-CN differ not simply in how they are written, but also in culture. People living in different regions have different histories, which turn into different ways of expressing themselves. Besides, traditional Chinese characters preserve the beauty of how they were created; they are definitely worth keeping.

Hsins commented 2 years ago

> So why Wikipedia can do, Mozilla can't?

What you say here is ignorant and offensive. Why is Bill Gates so rich when you couldn't be? Please browse Zhihu and read the answers to the question 如何合理地批驳「存在即合理」? (roughly, "How do you reasonably refute 'whatever exists is reasonable'?"), written in Simplified Chinese that you can understand.

Wikipedia took that solution but Mozilla hasn't. That's why we should discuss this before any pull request is accepted.

AirNoir0605 commented 2 years ago

Some words, for example, 'Winnie the pooh' existing in an article, would become a space/null in Simplified Chinese. It's a big difference.

Jamesits commented 2 years ago

Even within one language, people in different areas develop different dialects. Even if the characters used are the same or alike, a language is a lot more than its character set, and differences exist. Technology could allow more people to access content, but should not in a way that erases the differences between groups or individuals.

A lot of software claims to convert between zh-CN, zh-HK, zh-TW, etc. There are mainly two types. Tools based on traditional methods (e.g. OpenCC) only do a good job of character mapping and dictionary-based term replacement; they can't reorder a sentence to make it more natural. Tools that use machine learning simply treat these variants as separate languages, so the result is no better than machine translation between any two other languages. Neither kind is sufficient to deliver an authentic native reading experience in terms of idioms and tone.

Personally I can only accept redirecting pages that don't exist in one zh-* variant to another zh-* variant, with a tooltip telling the user what has happened, since a significant percentage of people can read both traditional and simplified Chinese characters, and the other Chinese variant is definitely more accessible than a completely different language. But using automated translation to create a fake page? It always makes me feel the cold side of human-agnostic technology.
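The redirect-with-notice idea could look roughly like this. It is a sketch under stated assumptions: the locale codes, the fallback order, the `resolveDocument` helper, the shape of `available`, and the sample slug are all hypothetical and not how Yari resolves documents.

```javascript
// Sketch only: `available` maps a locale to the set of slugs translated in it.
const FALLBACKS = {
  'zh-TW': ['zh-CN', 'en-US'],
  'zh-CN': ['zh-TW', 'en-US'],
};

function resolveDocument(slug, requested, available) {
  const order = [requested, ...(FALLBACKS[requested] || ['en-US'])];
  for (const locale of order) {
    if (available[locale] && available[locale].has(slug)) {
      return {
        locale,
        // A notice is set only when the reader did not get what they asked for.
        notice:
          locale === requested
            ? null
            : `No ${requested} translation yet; showing ${locale} instead.`,
      };
    }
  }
  return null; // no version of the page exists at all
}

const available = {
  'en-US': new Set(['Web/API/fetch']),
  'zh-CN': new Set(['Web/API/fetch']),
  'zh-TW': new Set(),
};
console.log(resolveDocument('Web/API/fetch', 'zh-TW', available).locale); // zh-CN
```

No page is generated in this scheme: the reader is sent to content a human wrote and is told which variant they are looking at.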

PeterWolf-tw commented 2 years ago

You wouldn't propose merging French, Spanish, Portuguese and Italian into one language encoding on the grounds that there are some relatively good (compared to OpenCC) conversion tools among them. Why would you pick Traditional Chinese (mainly used by the people living in Taiwan, the little island the Chinese government wants to destroy) and merge it into Simplified Chinese (mainly used by people living in China, the big strong mighty Asian superpower)?

Software technologies are made by people, serve people, and reflect their cultures and more. That's why we started the ISO encoding systems and the Unicode project: to accept and tolerate the different ideas of characters and languages across different systems.

If we choose to merge them for the reason that we have a good tool (which actually we don't) to translate them from one to another, the original intentions of these projects will be lost.

KevinZonda commented 2 years ago

I think there may have been a misunderstanding in most of the discussion. We do not want to merge the two Chinese languages; we want to expand the documentation of both variants through acceptable and reasonable conversion.

KevinZonda commented 2 years ago

What I'm trying to say is that we may just need to convert the part of the translated content that doesn't exist in the other variant and then calibrate it. There is absolutely no sense in translating two copies of Chinese, just as we would not maintain American English and British English separately.

komali2 commented 2 years ago

> What I'm trying to say is that we may just need to convert the part of the translated content that doesn't exist in the other variant and then calibrate it. There is absolutely no sense in translating two copies of Chinese, just as we would not maintain American English and British English separately.

The difference between American and British English is far smaller than that between traditional and simplified Mandarin.

You're doing it right now. In your comment you're erasing the legitimacy of zh-tw. You said "no reason to translate two copies of Chinese." What "two copies" of Chinese? I see Mandarin, PRC dialect, written with simplified characters, and Mandarin, Taiwan and HK dialects, written in traditional characters. Ironically you erased other "Chinese" languages based in the PRC itself: Shanghainese, Ningbonese, etc.

kecrily commented 2 years ago

Thank you to those who do not participate in the discussion, but will vote. They let us know that there is such a large group of people in the world who are concerned about us. Thank you to the two hundred users who suddenly appeared this afternoon who did good deeds (maybe?) without leaving their names.

kpkonghk01 commented 2 years ago

@KevinZonda

  1. He really does want to merge the 2 Chinese versions, as he said:

    > The final translation came out with a mix of traditional and simplified content, but the translators were able to understand it.

  2. The differences between zh-cn and zh-tw are not comparable to those between American English and British English, as many other people mentioned above.

  3. Redirection is enough for a missing cn or tw version of a translation.

pan93412 commented 2 years ago

> Let's take this article translated by opencc for example: https://github.com/chromaui/learnstorybook.com/blob/191b88830fdc4a519cf9180e9d891ce6989c6084/content/intro-to-storybook/react/zh-TW/get-started.md
>
> Do you think it's readable?

I just asked some Simplified Chinese users: most of them can't read the Simplified Chinese version well either. I think it was translated with Google Translate, so it is not a fair comparison for this case.

@kecrily said that he wants to implement something like what Chinese Wikipedia does. However, we can see that some words don't convert well there, as @irvin has explained at length. OpenCC, Wikipedia, and zhconvert.org still can't handle many such cases.

I think it is better for these Traditional Chinese articles to be translated by people from Taiwan and the other places that use Traditional Chinese. For untranslated articles, we can choose to read them in Simplified Chinese or English, or even translate them ourselves. There is no need to integrate the two variants of Chinese.

minipai commented 2 years ago

> Let's take this article translated by opencc for example: https://github.com/chromaui/learnstorybook.com/blob/191b88830fdc4a519cf9180e9d891ce6989c6084/content/intro-to-storybook/react/zh-TW/get-started.md
>
> Do you think it's readable?

Unreadable.

Maybe start by getting the punctuation marks right.

amliu commented 2 years ago

> There is no big gap between Traditional and Simplified Chinese.
>
> If most "Chinese reading people" can read either, wouldn't they benefit from consuming the most actively maintained one?

These 2 statements totally entertained me. I'm not sure whether every SC user can seamlessly switch wording when talking or texting with SC / TC users so that both sides consider them native, but as a Traditional Chinese user I can't, and most of my friends can't either.

To all non-native Mandarin speakers: please don't be deceived into thinking that zh-cn and zh-tw have no big differences. They differ not just in character appearance but also in wording, phrases and grammar. And many Taiwanese developers prefer the English version over SC because it's more comprehensible to us.

Back to the subject: I'd suggest just letting contributors decide which language they want to contribute to, as now, and for content not translated into TC, letting TC users choose an alternative version of their own free will. Isn't that how the open source world works?

alivedise commented 2 years ago

> So why Wikipedia can do, Mozilla can't?

So why can we read Traditional Chinese but you can't? This is the WORST reason and the WORST attitude for promoting something you really want to others.

> Would it be "offensive" or upsetting if we remove one and focus only on the other? There are 5,213 files in zh-CN and only 1,246 in zh-TW.
>
> If most "Chinese reading people" can read either, wouldn't they benefit from consuming the most actively maintained one?

No. I read both, but this does not mean everyone who is eager to learn web technologies should learn both in order to use MDN. You are an MDN dev/contributor? How could you say that? I would feel ashamed of ever having worked at this company if this (merge) really happens.

Hsins commented 2 years ago

> Thank you to those who do not participate in the discussion, but will vote. They let us know that there is such a large group of people in the world who are concerned about us. Thank you to the two hundred users who suddenly appeared this afternoon who did good deeds (maybe?) without leaving their names.

If you are really eager to know who disagrees with you by leaving such reactions, check more information here.

They actually leave their names on the Internet.

komali2 commented 2 years ago

> Thank you to those who do not participate in the discussion, but will vote. They let us know that there is such a large group of people in the world who are concerned about us. Thank you to the two hundred users who suddenly appeared this afternoon who did good deeds (maybe?) without leaving their names.

1000>100>10>1 rule. For every poster, there are 10 commenters, 100 voters, and 1000 lurkers. This is normal internet behavior. Are you implying you want those extra 1000 people to simply reply "yes I agree" lol?

sherryliao21 commented 2 years ago

While most people commenting here are experienced developers, I’d like to share my thoughts as a very junior one who is self-taught online and relied on a huge amount of traditional Chinese resources.

As a beginner, I can’t imagine how troublesome it would be to read a totally different term in simplified Chinese on MDN compared to the tutorials I watch, which explain things in traditional Chinese terms. That would confuse me a lot and would cause more work for me to search for the definition, ending up with “oh wow, they’re the same thing but the translations make it so confusing.”

I refer to MDN a lot, and sometimes my English is just so bad that I have to read the tw version in order to understand some concepts. It would definitely be harder for me to understand if the only resource in Chinese used totally different terms from what I use daily. Plus, the OpenCC translation isn’t perfect. I guess you can use it for a rough draft and edit it into a more accurate version later, but why waste the editors’ time? To me, checking line by line to swap terms between the two groups of Chinese users requires more time than simply translating from English; the context sometimes varies a lot more than you think.

I’m just sharing my opinion that MDN should also be beginner-friendly, because more and more people are learning web dev and they deserve to have an accessible resource written in their most familiar language/terms. So there’s definitely a need to preserve both versions.

I’d like to contribute to MDN traditional Chinese translations once I have learned enough to be able to help, if the number of documents in both languages is your concern here.

MnHung commented 2 years ago

Simplified characters are far too crude.

t7yang commented 2 years ago

There are now two different branches of MDN zh, zh-cn and zh-tw. The translation results for the same document are roughly the same for both.

A Traditional Chinese user can read documents in Simplified Chinese almost without any problems. And vice versa.

Only for casual conversation. Instead, in technical content like MDN, zh-TW and zh-CN have a huge gap. You can find many comments than support for this view.

The complete segregation of zh-cn and zh-tw translations results in unnecessary duplication of effort. We can avoid the waste of manpower by using a conversion tool. Let a translated content composed of both traditional and simplified Chinese be converted into an entirely zh-cn or zh-tw content by the tool.

@irvin gives some great reasons here for why this is not a good idea and why relying on a conversion tool like OpenCC brings almost no advantage.

A conversion tool cannot handle every situation: the differences between zh-TW and zh-CN include characters, phrases, and slang, and we even use different punctuation.

A conversion tool gives less awkward output going from zh-TW to zh-CN, because that direction has many "many-to-one" cases. This means zh-CN to zh-TW has many "one-to-many" cases, which are almost impossible for a conversion tool to handle (at least OpenCC cannot handle them well). So this is unfair to zh-TW readers.
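To make the "many-to-one" versus "one-to-many" point concrete, here is a minimal Python sketch. The four-entry table is hand-picked for illustration and is not taken from OpenCC's dictionaries; the function names are hypothetical as well. It only shows why the Traditional to Simplified direction is a plain lookup while the reverse direction yields several candidates that a converter has to choose between.

```python
from collections import defaultdict

# Tiny, hand-picked sample of Traditional -> Simplified character mappings.
# Illustrative only; a real converter uses far larger dictionaries.
T2S = {
    "後": "后",  # behind / after
    "后": "后",  # queen / empress
    "髮": "发",  # hair
    "發": "发",  # to emit / to send
}


def to_simplified(text: str) -> str:
    """Traditional -> Simplified: each character has exactly one target."""
    return "".join(T2S.get(ch, ch) for ch in text)


def invert(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Build the reverse table; one Simplified key may get several values."""
    inverse: dict[str, list[str]] = defaultdict(list)
    for trad, simp in mapping.items():
        inverse[simp].append(trad)
    return dict(inverse)


S2T_CANDIDATES = invert(T2S)


def traditional_candidates(ch: str) -> list[str]:
    """Simplified -> Traditional: return every possible target character."""
    return S2T_CANDIDATES.get(ch, [ch])


if __name__ == "__main__":
    print(to_simplified("皇后"), to_simplified("之後"))
    print(traditional_candidates("后"))
```

Both 皇后 (queen) and 之後 (afterwards) end in the same character once simplified, so a character table alone cannot decide which Traditional form to restore; that choice needs context, which is where conversion tools can go wrong.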

Overall, from both a cultural and a technical perspective, I disagree with this proposal.

Hsins commented 2 years ago

Simplified characters are far too crude.

You can disagree with the idea of this issue, but please respect the culture and languages of others.

You can disagree with what the author of this issue says, but there is no need to attack a writing system or a culture.

CindyLinz commented 2 years ago

There is no big gap between Traditional and Simplified Chinese

To me, this sentence is like "all Asians look the same".

shanehsu commented 2 years ago

There are many points of view here; I’d like to address a few in a TL;DR format. (Unquoted text is my opinion.)

  1. If most "Chinese reading people" can read either, wouldn't they benefit from consuming the most actively maintained one?

    I would argue most developers can read English; by that logic we should just maintain the English version and force everyone to read that.

  2. (rephrased) The differences between Traditional and Simplified Chinese are subtle enough, e.g. they can be mapped with a table or a set of rules.

    However, dialects of a language often have drastically different word choices. While I wouldn’t go as far as to say that the grammatical structure differs, Traditional and Simplified Chinese definitely convey an idea with different sentences and structure. For concrete examples, refer to the comments by @irvin.

  3. (rephrased) The purpose is not merging or killing the other, but simply bridging missing articles.

    Not to be too political, but blurring the difference sounds like a precursor to cultural genocide; it also discourages people from making a proper translation for the missing dialect altogether.

  4. (rephrased) A user of MDN has the freedom to choose either, or to switch languages whenever they like.

I believe this is a critical juncture for MDN, and it would set a precedent for other projects involving this or any other pair of languages. Localization is a hassle, and it always will be. But translations are here to help people understand an idea in their own language and to lower the bar for sharing information and knowledge.

If MDN contributors of either Chinese dialect have spent their time maintaining their own version/vision, we should respect them by keeping those separate, as per the status quo. This is, I’m afraid, not a technical issue where we focus on efficiency, but a cultural one where we should be inclusive and respectful.

I’d also like to remind everyone here that debate and criticism of the issue are welcome, as that’s how to have a meaningful, progressive, and productive discussion. However, please refrain from discrimination and name-calling.

kuanyui commented 2 years ago
  1. From the viewpoint of a Taiwanese person, this proposal sounds like (yet another) kind of cultural genocide, which is what the government of the Chinese Communist Party is doing to the ethnic minorities in their country (though they never admit it), and is attempting to do to Taiwan now (they won't admit this, of course). They never miss any chance to do something like this.

  2. To those who don't understand Chinese languages: I've heard that Spanish and Italian are so similar that even if one person speaks Spanish and another Italian, they can roughly understand each other. So "why not 'unify' the two languages and convert terms automatically?" This sounds dumb, YES, BECAUSE I don't understand Spanish or Italian, I would NEVER say something like that. So I don't think non-Traditional Chinese users have the right to decide this for us. You may think: "why do Simplified Chinese users think merging is no big deal?"... OF COURSE, because they are the bigger side, and the Taiwanese are the side who ARE conquered and ARE merged, at least considering the number of users on each side.

  3. No existing automatic converter can convert Simplified Chinese to Traditional Chinese 100% correctly, even for characters alone, regardless of terms. It is like asking someone to convert a JPEG back to a PNG losslessly (e.g. SC: 后 -> TC: 后 | 後). Terms are worse, for example 注销/註銷 (SC: log out; TC: discard / declare something void).

  4. Yes, you can avoid the waste of manpower by using a conversion tool: just ignore the zh-TW pages if they are not translated, instead of discarding all past contributions from Taiwanese contributors. This is (ahh, yet another misfortune that Taiwanese people have to face nearly every day) typical China-style arrogance.

  5. I know what I've said may sound harsh and even emotional, but living in Taiwan, facing such things again and again every day, reading lots of oddly translated IT-related articles on the Internet, seeing information warfare, fake news, and arrogant talk from China, and military harassment from China, I have had enough of it. ...Yes, yes, yes, they would never admit these, just as they don't even admit that the coronavirus is from Wuhan, China.

ccshan commented 2 years ago

I'm one of many people whose language preference order is zh-tw > en-us > en > zh-cn. If some content is only available in (say) zh-cn and sw, then I'll pick zh-cn, but I'm much slower at zh-cn than at en, due to reasons stated already: https://github.com/mdn/yari/issues/4530#issuecomment-903174239 , https://github.com/mdn/yari/issues/4530#issuecomment-903290702 .

Although the proposal has a section labeled "Why", it did not answer a fundamental question: Who would this help? (And how do we know they would be helped?)

tomchentw commented 2 years ago

If this idea is accepted under any circumstances, we should keep the Traditional Chinese (the complex version) variant and treat it as the single source of truth. Then we convert it into the Simplified Chinese variant.

The reason is pretty straightforward: only complex things can be simplified, not vice versa.