mozilla / pontoon

Mozilla's Localization Platform
https://pontoon.mozilla.org
BSD 3-Clause "New" or "Revised" License

Punctuation lacking in pretranslations for zh-CN and zh-TW #3238

Open Olvcpr423 opened 1 month ago

Olvcpr423 commented 1 month ago

Pretranslations for zh-CN and zh-TW often lack necessary punctuation, which makes the sentences overlong and hard to understand. Here are some typical examples:

  1. String 303974
     Source text: The entire world deserves free and open source software, and Thunderbird is currently translated in more than 50 languages! Use your multilingual talents to help make Thunderbird more widely available than ever.
     Pretranslation: 人人都值得拥有自由与开源的软件而 Thunderbird 目前已被翻译成超过 50 种语言发挥您的多语言才能使 Thunderbird 更加普及。
     Correct punctuation: 人人都值得拥有自由与开源的软件 [,] 而 Thunderbird 目前已被翻译成超过 50 种语言 [!] 发挥您的多语言才能 [,] 使 Thunderbird 更加普及。

  2. String 303994
     Source text: Are you an experienced Thunderbird user who loves lending a helping hand? Put your knowledge to great use by joining our Support Crew and helping users around the world with their Thunderbird questions.
     Pretranslation: 您是 Thunderbird 的老用户并且乐于助人吗学以致用加入我们的技术支持团队帮助世界各地的用户解决 Thunderbird 问题。
     Correct punctuation: 您是 Thunderbird 的老用户并且乐于助人吗 [?] 学以致用 [,] 加入我们的技术支持团队 [,] 帮助世界各地的用户解决 Thunderbird 问题。

  3. String 301733
     Source text: { -brand-name-firefox } is powered by the world-class { -brand-name-gecko } engine, with shockingly fast styling and page layout, modern JavaScript features and a never ending drumbeat of new performance improvements to keep our users happy and push the entire web platform forward.
     Pretranslation: { -brand-name-firefox } 使用世界一流的 { -brand-name-gecko } 引擎提供超快的樣式與頁面排版、現代的 JavaScript 功能以及持續不斷的新效能改善功能讓使用者滿意並推動整個網頁平台向前發展。
     Correct punctuation: { -brand-name-firefox } 使用世界一流的 { -brand-name-gecko } 引擎提供超快的樣式與頁面排版、現代化的 JavaScript 功能 [,] 並持續不斷改善效能 [,] 以讓使用者滿意 [,] 並推動整個網路平台向前發展。

Pretranslations from translation memory are unaffected.
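
One rough way to flag affected strings automatically is to compare how many clause- and sentence-level punctuation marks appear in the source and in the pretranslation. The sketch below is only an illustration and is not part of Pontoon: the function names, the sets of marks, and the `tolerance` parameter are all assumptions, and a heuristic like this will also flag some legitimate translations that simply use fewer marks than the English source.

```python
# Hypothetical helper, not Pontoon code: flags a translation that has
# fewer clause/sentence punctuation marks than its English source.

# Marks counted in the English source (assumed set).
SOURCE_MARKS = ",.!?;:"
# Marks counted in the translation: ASCII plus full-width CJK forms (assumed set).
TARGET_MARKS = ",.!?;:,。!?;:、"


def count_marks(text: str, marks: str) -> int:
    """Count how many characters of `text` belong to `marks`."""
    return sum(1 for ch in text if ch in marks)


def lacks_punctuation(source: str, translation: str, tolerance: int = 0) -> bool:
    """Return True if the translation has fewer marks than the source.

    `tolerance` is the number of missing marks to accept before flagging.
    Sources with fewer than two marks are never flagged, because a single
    final period carries too little signal.
    """
    source_count = count_marks(source, SOURCE_MARKS)
    if source_count < 2:
        return False
    target_count = count_marks(translation, TARGET_MARKS)
    return target_count + tolerance < source_count
```

Applied to string 303974 above, the source contains three counted marks and the pretranslation only one, so `lacks_punctuation` returns `True`; the corrected version contains four and is not flagged. Note that this simple count also treats decimal points and similar characters as punctuation.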

mathjazz commented 1 month ago

@Olvcpr423 Thanks for reporting!

We're aware that the quality of zh-CN and zh-TW Google machine translations is sadly among the worst: https://pontoon.mozilla.org/insights/

Do you perhaps use any other machine translation service internally, with better results?

LaoshuBaby commented 1 month ago

> Do you perhaps use any other machine translation service internally, with better results?

Thank you for your response on this issue. Can you tell us which translation services are available as alternatives, for example Bing or DeepL?

Another point: I tried pasting the source string directly into the Google Translate web page, and it gave an acceptable result with correct punctuation.

[Screenshot: correct punctuation on the Google Translate website]

Olvcpr423 commented 1 month ago

> We're aware that the quality of zh-CN and zh-TW Google machine translations is sadly among the worst

@mathjazz Thank you for your response. As @LaoshuBaby mentioned, translations directly from Google Translate don't have this problem.

Putting punctuation aside, pretranslate performs better in terms of terminology and style compared to pure Google Translate, thanks to its machine learning from existing translations in the project. That's really cool!

Another major problem is that if new strings are not reviewed in time, suggestions from pretranslate will be automatically applied to the production environment, which might confuse users. So I really hope this issue can be fixed.

> Do you perhaps use any other machine translation service internally, with better results?

Yes, we use pretranslations when they are appropriate, and other machine translation services when necessary.

petercpg commented 1 month ago

I think I raised some quality issues in the closed-beta test sheets. Taking pretranslated (but not proofread) strings into the project makes things worse, for example in the Mozilla.org website content.

As for alternative MT services, I personally just use Google Translate, as DeepL does not translate content into zh-TW. If we may use LLMs, I think we could use ChatGPT (GPT-3.5 should be good enough) with some prompt engineering to make its translations sound more local.

mathjazz commented 1 month ago

Thanks for the feedback!

I filed https://github.com/mozilla/pontoon/issues/3254, which will allow us to test performance with the generic Google Translate engine.