translate-tools / linguist

Translate web pages, highlighted text, Netflix subtitles, private messages, speak the translated text, and save important translations to your personal dictionary to learn words even offline
https://linguister.io
BSD 3-Clause "New" or "Revised" License
694 stars 23 forks

Enhancement Request: Block-Level HTML Translation #375

Open mosugi opened 1 year ago

mosugi commented 1 year ago

Hello esteemed developers,

First of all, I'd like to express my sincere gratitude for your hard work in maintaining and improving this great project. Your dedication is invaluable to the entire community.

Recently, I've encountered an issue regarding the translation of Japanese language within HTML content. As you might know, the syntax and structure of Japanese significantly differ from English. This leads to inaccurate and often nonsensical translations when translation is done at the inline HTML element level.

To illustrate, let's consider the following example:

<p>She is <em>looking forward</em> to your visit.</p>

Translated results are as follows:

<p>彼女は<em>楽しみ</em>ご来場ありがとうございました。</p>

Retranslate back to English and it will look like this:

<p>She <em>enjoyed</em> the show, thank you for coming.</p>

Translating this on an element-by-element basis would not yield the intended meaning in Japanese.

Therefore, I propose a feature enhancement where translation happens at the block HTML element level, instead of at the inline level, especially when dealing with languages like Japanese.

For example, the translation feature could be enhanced to behave as follows:

<div>彼女はあなたの訪問を<em>心待ちに</em>している。</div>

Retranslate back to English and it will look like this:

<div>She is <em>looking forward</em> to your visit.</div>

The entire div block is translated as a single unit, which will lead to a more accurate translation.

Google Translate and DeepL translations support the translation of strings containing HTML.

It would be very nice to have an option in the settings that translates inline elements together with the block element that contains them.

I hope this suggestion is taken into consideration and am looking forward to seeing how this project continues to evolve.

Thank you again for your efforts and dedication.

Best regards.

vitonsky commented 1 year ago

Hi, thank you for the feedback. Language diversity in feedback is very important for language tools!

The problem is clear to me, thanks for the detailed explanation with examples.

An important principle of Linguist's architecture is that Linguist does not depend on translator implementations. This makes it possible to use any translator implementation with all of Linguist's features. If we bind to features of the Google translator or another service, we can't use some Linguist features with other translators that do not support translating HTML tags or that implement it in a different way.

Thus, Linguist is a platform that implements all features itself and allows them to be used with any trivial translator implementation. All a translator must do is translate one string and translate an array of strings. That is easy to implement.

To solve the problem you mention above, we have to improve Linguist's behavior, not just use the Google translator's features to translate HTML.

Let's think about and discuss how to implement behavior that translates Japanese texts better.

It is a good idea to implement an optional feature that translates text at the block level and to enable it automatically for Japanese. Can you please send me some links about this approach? Maybe it is a popular idea and there are guides on the internet about how to translate HTML with Japanese text. Your opinion is the most important here, because I can't speak Japanese and can't measure the quality of the results.

For now I have a few questions that we must answer to implement this feature:

About the last point:

Translators have two methods: translate, which translates a single string, and translateBatch, which translates several texts.

If we detect a text block with 3 segments (彼女はあなたの訪問を, 心待ちに, している。), how do we translate these segments?

We can join the segments into one string, but then we get one string back as the response, and it is not clear how to split it and insert the proper segments back into their HTML elements.

On the other hand, we can use the translateBatch method. We would call it with 3 texts, and the translator would translate the 3 segments as one context. However, I'm not sure all translators will translate 3 texts as one context!

In fact, some translators implement translateBatch as multiple calls to translate, so the sentence context is not shared between the segments.
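To make the concern concrete, here is a minimal sketch of such a batch implementation. The Translator interface and the withNaiveBatch helper are hypothetical names modeled only on the description in this comment; they are not Linguist's actual types.

```typescript
// Hypothetical minimal translator contract (an assumption for illustration,
// not Linguist's real interface).
interface Translator {
  translate(text: string, from: string, to: string): Promise<string>;
  translateBatch(texts: string[], from: string, to: string): Promise<string[]>;
}

// Builds a translator whose translateBatch is just one translate call per text.
// Each segment is sent on its own, so the service never sees the neighbouring
// segments and cannot use them as sentence context.
function withNaiveBatch(translate: Translator["translate"]): Translator {
  return {
    translate,
    translateBatch: (texts, from, to) =>
      Promise.all(texts.map((text) => translate(text, from, to))),
  };
}
```

With a translator built this way, passing the 3 segments of one sentence to translateBatch is equivalent to translating 3 unrelated strings, which is exactly the inline-level behavior described in this issue.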

We can try to use translateBatch to translate text segments, but it may not work for some translator implementations, even for the Google translator. So, if you have any ideas on how to translate 3 texts together and then split the result back into 3 segments, feel free to share your thoughts!

mosugi commented 1 year ago

Thanks for your immediate and detailed response. I will describe my idea and its background.

Relationship Between Block Elements and Inline Elements in Translation

Proposed Solution

The Need to Replace Sentences by Translating at the Segment Level

vitonsky commented 1 year ago

Could you show an example of how to format a string so that the Google and Yandex translators translate it correctly and return a string in the same format?

My attempt with the format <p>She is <em>looking forward</em> to your visit.</p> for the Yandex translator:

[screenshot of the Yandex translator output]

As you can see, the format is broken and we can't parse the text back.

Keep in mind that the Google and Yandex translator APIs support an HTML mode that translates text with HTML tags properly. That is good, but we can't rely on this behavior in other translators. We have to invent an algorithm on our side: join a few segments into one text, translate it, and then be able to parse the segments back out of the translation.
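One possible shape for such an algorithm, as a rough sketch only: wrap each segment in a numbered marker, send the joined string through translate, and accept the result only if every marker comes back exactly once. The marker format and the function names joinSegments and splitSegments are assumptions made for this example, not existing Linguist code, and nothing here guarantees that a given translator keeps the markers intact. The sketch also assumes the segments are plain text that does not itself contain marker-like substrings.

```typescript
// Hypothetical marker format: <1>...</1><2>...</2>... (not part of Linguist).
function joinSegments(segments: string[]): string {
  return segments.map((text, i) => `<${i + 1}>${text}</${i + 1}>`).join("");
}

// Parses the translated string back into segments, indexed by marker number,
// so segments reordered by the translator still map to their source elements.
// Returns null when any marker is missing, duplicated, or out of range.
// Text that ends up outside all markers is ignored by this sketch.
function splitSegments(translated: string, count: number): string[] | null {
  const result = new Array<string | undefined>(count).fill(undefined);
  const pattern = /<(\d+)>([\s\S]*?)<\/\1>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(translated)) !== null) {
    const index = Number(match[1]) - 1;
    if (index < 0 || index >= count || result[index] !== undefined) {
      return null;
    }
    result[index] = match[2];
  }
  return result.every((s) => s !== undefined) ? (result as string[]) : null;
}
```

When splitSegments returns null, the caller could fall back to the current per-segment translation, so a translator that mangles the markers (as in the Yandex attempt above) would behave no worse than it does today.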