soimort / translate-shell

💬 Command-line translator using Google Translate, Bing Translator, Yandex.Translate, etc.
https://www.soimort.org/translate-shell
The Unlicense
6.83k stars, 387 forks

Garbled html in Korean translations #490

Closed jmgomezpoveda closed 1 year ago

jmgomezpoveda commented 1 year ago

An English file containing some HTML gets garbled when translated into Korean. Input:

<a href="showitem.php">Show</a> some text that goes here

Output:

"\u003ca href\u003dshowitem.php\u003e표시\u003c/a\u003e 여기에 들어가는 일부 텍스트"

This does not occur in the Google Translate web interface:

여기에 들어가는 일부 텍스트 <a href="showitem.php">표시</a>

Translation into other languages works fine; of the languages I have tested, only Korean seems affected.
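The garbled output is a JSON-style string literal whose `\uXXXX` escapes were never decoded: `\u003c` is `<`, `\u003d` is `=` and `\u003e` is `>`. The minimal Python sketch below (an illustration only, not part of translate-shell) shows that parsing the reported string with the standard `json` module recovers the markup.

```python
import json

# The garbled output from the report, kept verbatim as a raw string so that
# Python itself does not interpret the backslash escapes.
garbled = r'"\u003ca href\u003dshowitem.php\u003e표시\u003c/a\u003e 여기에 들어가는 일부 텍스트"'

# json.loads parses the text as a JSON string literal and decodes every
# \uXXXX escape into the character it stands for.
decoded = json.loads(garbled)
print(decoded)  # <a href=showitem.php>표시</a> 여기에 들어가는 일부 텍스트
```

Decoding restores the tags but not the quotes around `showitem.php`, which are already missing from the garbled output itself.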

soimort commented 1 year ago

What version of gawk are you running? (Is it 5.1.x?)

If your trans is already up to date, I suggest updating to gawk 5.2.1 and trying again, because I noticed a bug in gawk 5.1 when pattern-matching Unicode strings (specifically those containing Korean characters), which seems to be fixed in newer versions.
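A minimal Python sketch, not part of translate-shell, for checking whether the installed gawk predates the 5.2.1 release suggested above. It assumes `gawk --version` prints a first line of the form `GNU Awk 5.1.0, ...`; the helper names `parse_gawk_version`, `is_older_than_suggested` and `installed_gawk_version` are hypothetical. The thread does not say whether 5.2.0 already contains the fix, so 5.2.1 is used as the floor.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Version suggested in this thread; the thread does not say whether 5.2.0
# is also fixed, so 5.2.1 is treated as the minimum.
SUGGESTED_MIN = (5, 2, 1)


def parse_gawk_version(first_line: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) from the first line of `gawk --version`,
    e.g. 'GNU Awk 5.1.0, API: 3.0 ...', or None if the line does not match."""
    match = re.search(r"GNU Awk (\d+)\.(\d+)\.(\d+)", first_line)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_older_than_suggested(version: Tuple[int, int, int]) -> bool:
    """Tuples compare element by element, so (5, 1, 0) < (5, 2, 1)."""
    return version < SUGGESTED_MIN


def installed_gawk_version() -> Optional[Tuple[int, int, int]]:
    """Run `gawk --version` if gawk is on PATH and parse its first line."""
    if shutil.which("gawk") is None:
        return None
    result = subprocess.run(["gawk", "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return parse_gawk_version(lines[0]) if lines else None


if __name__ == "__main__":
    version = installed_gawk_version()
    if version is None:
        print("gawk not found, or its version string was not recognized")
    elif is_older_than_suggested(version):
        print("gawk %d.%d.%d: consider upgrading to 5.2.1 or later" % version)
    else:
        print("gawk %d.%d.%d: at or above the suggested 5.2.1" % version)
```

Parsing is kept separate from running the command so the version logic can be checked without gawk installed.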

jmgomezpoveda commented 1 year ago

That is correct! I was using gawk 5.1.0. I've just upgraded to 5.2.1 as per your advice, and with the latest version of trans the translation looks great now.

soimort commented 1 year ago

Very good. I'm closing this issue as fixed.