soimort / translate-shell

Command-line translator using Google Translate, Bing Translator, Yandex.Translate, etc.
https://www.soimort.org/translate-shell
The Unlicense

Usage with Persian language #486

Closed andreav closed 1 year ago

andreav commented 1 year ago

Hello, I'm running a translation from English to Persian, but the result is not correct:

Command:

docker run --rm -it soimort/translate-shell -shell -brief -w 10 en:fa Hello

Result:

ﻡﻼﺳ

But this is not "hello"

This is the output of the same command with the -debug option:

$  docker run --rm -it soimort/translate-shell -shell -brief -debug -w 10 en:fa Hello
>> not found: rlwrap
Translate Shell
(:q to quit)   
  16 bytes > HTTP/1.1 200 OK
  46 bytes > Content-Type: application/json; charset=utf-8
  32 bytes > X-Content-Type-Options: nosniff
  62 bytes > Cache-Control: no-cache, no-store, max-age=0, must-revalidate
  17 bytes > Pragma: no-cache
  39 bytes > Expires: Mon, 01 Jan 1990 00:00:00 GMT
  36 bytes > Date: Fri, 24 Feb 2023 09:56:25 GMT
  80 bytes > Content-Disposition: attachment; filename="json.txt"; filename*=UTF-8''json.txt
  40 bytes > Cross-Origin-Opener-Policy: same-origin
 180 bytes > Accept-CH: Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Model, Sec-CH-UA-WoW64, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version       
 176 bytes > Content-Security-Policy: script-src 'nonce-lCfSAWqyYqbIV-A7oPOghg' 'unsafe-inline';object-src 'none';base-uri 'self';report-uri /_/TranslateApiHttp/cspreport;worker-src 'self'
 101 bytes > Content-Security-Policy: require-trusted-types-for 'script';report-uri /_/TranslateApiHttp/cspreport
 173 bytes > Permissions-Policy: ch-ua-arch=*, ch-ua-bitness=*, ch-ua-full-version=*, ch-ua-full-version-list=*, ch-ua-model=*, ch-ua-wow64=*, ch-ua-platform=*, ch-ua-platform-version=*
  43 bytes > Cross-Origin-Resource-Policy: cross-origin
  31 bytes > Access-Control-Allow-Origin: *
  12 bytes > Server: ESF
  20 bytes > X-XSS-Protection: 0
  28 bytes > X-Frame-Options: SAMEORIGIN
  20 bytes > Accept-Ranges: none
  22 bytes > Vary: Accept-Encoding
  18 bytes > Connection: close
  27 bytes > Transfer-Encoding: chunked
   1 bytes > 
   4 bytes > 223
 499 bytes > [[["سلام","Hello",null,null,1]],[["interjection",["سلام!","الو!","هالو"],[["سلام!",["Hello!","Hi!"],null,0.27768996],["الو!",["Hello!"],null,0.0019304542],["هالو",["Hallo!","Halloo!","Hello!"],null,3.627216E-4]],"Hello!",9]],"en",null,null,[["Hello",null,[["سلام",1000,true,false,[1,3],null,[[3]]],["درود",1000,true,false,[1]],["سلام، درود",1000,true,false,[1]],["با سلام",0,true,false,[8]]],[[0,5]],"Hello",0,0]],1,[],[["en"],null,[1],["en"]],null,null,null,null,null,null,null,null,null,[null,2]]
   2 bytes > 0
   1 bytes > 
content = '''
223
[[["سلام","Hello",null,null,1]],[["interjection",["سلام!","الو!","هالو"],[["سلام!",["Hello!","Hi!"],null,0.27768996],["الو!",["Hello!"],null,0.0019304542],["هالو",["Hallo!","Halloo!","Hello!"],null,3.627216E-4]],"Hello!",9]],"en",null,null,[["Hello",null,[["سلام",1000,true,false,[1,3],null,[[3]]],["درود",1000,true,false,[1]],["سلام، درود",1000,true,false,[1]],["با سلام",0,true,false,[8]]],[[0,5]],"Hello",0,0]],1,[],[["en"],null,[1],["en"]],null,null,null,null,null,null,null,null,null,[null,2]]
0

'''
tokens = ["223", "[", "[", "[", "\"سلام\"", ",", "\"Hello\"", ",", "null", ",", "null", ",", "1", "]", "]", ",", "[", "[", "\"interjection\"", ",", "[", "\"سلام!\"", ",", "\"الو!\"", ",", "\"هالو\"", "]", ",", "[", "[", "\"سلام!\"", ",", "[", "\"Hello!\"", ",", "\"Hi!\"", "]", ",", "null", ",", "0.27768996", "]", ",", "[", "\"الو!\"", ",", "[", "\"Hello!\"", "]", ",", "null", ",", "0.0019304542", "]", ",", "[", "\"هالو\"", ",", "[", "\"Hallo!\"", ",", "\"Halloo!\"", ",", "\"Hello!\"", "]", ",", "null", ",", "3.627216E-4", "]", "]", ",", "\"Hello!\"", ",", "9", "]", "]", ",", "\"en\"", ",", 
"null", ",", "null", ",", "[", "[", "\"Hello\"", ",", "null", ",", "[", "[", "\"سلام\"", ",", "1000", ",", "true", ",", "false", ",", "[", "1", ",", "3", "]", ",", "null", ",", "[", "[", "3", "]", "]", "]", ",", "[", "\"درود\"", ",", "1000", ",", "true", ",", "false", ",", "[", "1", "]", "]", ",", "[", "\"سلام، درود\"", ",", "1000", ",", "true", ",", "false", ",", "[", "1", "]", "]", ",", "[", "\"با سلام\"", ",", "0", ",", "true", ",", "false", ",", "[", "8", "]", "]", "]", ",", "[", "[", "0", ",", "5", "]", "]", ",", "\"Hello\"", ",", "0", ",", "0", "]", "]", ",", "1", ",", "[", "]", ",", 
"[", "[", "\"en\"", "]", ",", "null", ",", "[", "1", "]", ",", "[", "\"en\"", "]", "]", ",", "null", ",", "null", ",", "null", ",", "null", ",", "null", ",", "null", ",", "null", ",", "null", ",", "null", ",", "[", "null", ",", "2", "]", "]", "0"]
ast = {
"0,0,0,0"       "\"سلام\""
"0,0,0,1"       "\"Hello\""
"0,0,0,2"       "null"
"0,0,0,3"       "null"
"0,0,0,4"       "1"
"0,1,0,0"       "\"interjection\""
"0,1,0,1,0"     "\"سلام!\""
"0,1,0,1,1"     "\"الو!\""
"0,1,0,1,2"     "\"هالو\""
"0,1,0,2,0,0"   "\"سلام!\""
"0,1,0,2,0,1,0" "\"Hello!\""
"0,1,0,2,0,1,1" "\"Hi!\""
"0,1,0,2,0,2"   "null"
"0,1,0,2,0,3"   "0.27768996"
"0,1,0,2,1,0"   "\"الو!\""
"0,1,0,2,1,1,0" "\"Hello!\""
"0,1,0,2,1,2"   "null"
"0,1,0,2,1,3"   "0.0019304542"
"0,1,0,2,2,0"   "\"هالو\""
"0,1,0,2,2,1,0" "\"Hallo!\""
"0,1,0,2,2,1,1" "\"Halloo!\""
"0,1,0,2,2,1,2" "\"Hello!\""
"0,1,0,2,2,2"   "null"
"0,1,0,2,2,3"   "3.627216E-4"
"0,1,0,3"       "\"Hello!\""
"0,1,0,4"       "9"
"0,2"   "\"en\""
"0,3"   "null"
"0,4"   "null"
"0,5,0,0"       "\"Hello\""
"0,5,0,1"       "null"
"0,5,0,2,0,0"   "\"سلام\""
"0,5,0,2,0,1"   "1000"
"0,5,0,2,0,2"   "true"
"0,5,0,2,0,3"   "false"
"0,5,0,2,0,4,0" "1"
"0,5,0,2,0,4,1" "3"
"0,5,0,2,0,5"   "null"
"0,5,0,2,0,6,0,0"       "3"
"0,5,0,2,1,0"   "\"درود\""
"0,5,0,2,1,1"   "1000"
"0,5,0,2,1,2"   "true"
"0,5,0,2,1,3"   "false"
"0,5,0,2,1,4,0" "1"
"0,5,0,2,2,0"   "\"سلام، درود\""
"0,5,0,2,2,1"   "1000"
"0,5,0,2,2,2"   "true"
"0,5,0,2,2,3"   "false"
"0,5,0,2,2,4,0" "1"
"0,5,0,2,3,0"   "\"با سلام\""
"0,5,0,2,3,1"   "0"
"0,5,0,2,3,2"   "true"
"0,5,0,2,3,3"   "false"
"0,5,0,2,3,4,0" "8"
"0,5,0,3,0,0"   "0"
"0,5,0,3,0,1"   "5"
"0,5,0,4"       "\"Hello\""
"0,5,0,5"       "0"
"0,5,0,6"       "0"
"0,6"   "1"
"0,8,0,0"       "\"en\""
"0,8,1" "null"
"0,8,2,0"       "1"
"0,8,3,0"       "\"en\""
"0,9"   "null"
"0,10"  "null"
"0,11"  "null"
"0,12"  "null"
"0,13"  "null"
"0,14"  "null"
"0,15"  "null"
"0,16"  "null"
"0,17"  "null"
"0,18,0"        "null"
"0,18,1"        "2"
"0"     "0"
}
       ﻡﻼﺳ

And this is the output of the -V option:

$  docker run --rm -it soimort/translate-shell -shell -V
Translate Shell       0.9.6.11-release

platform              Linux
terminal type         xterm
bi-di emulator        [N/A]
gawk (GNU Awk)        5.0.1
fribidi (GNU FriBidi) 1.0.8
audio player          mplayer
terminal pager        less
web browser           xdg-open
user locale           en_US.UTF-8 (English)
home language         en
source language       auto
target language       en
translation engine    google
proxy                 [NONE]
user-agent            Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) Version/8.0 Safari/602.1 Epiphany/3.18.2
ip version            [DEFAULT]
theme                 default
init file             [NONE]

Report bugs to:       https://github.com/soimort/translate-shell/issues

Am I doing something wrong?

Thank you

soimort commented 1 year ago

This is because translate-shell invokes FriBiDi by default to display right-to-left text. If your terminal emulator already supports FriBiDi, the text is reversed twice, i.e., you see the original logical-order text laid out left to right, which is not displayed correctly.

It is hard to tell whether a VTE was compiled with FriBiDi enabled, which is why reversing the text is still the default behavior. (related: #464)

For now you can avoid this behavior by adding the --no-bidi option.
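
The double reversal can be illustrated with a toy model. The sketch below is not how FriBiDi or translate-shell work internally: it assumes a purely right-to-left string and stands in for bidi reordering with a plain character reversal (the hypothetical `to_visual` helper), ignoring the shaping and mixed-direction handling that a real bidi engine performs.

```python
def to_visual(logical: str) -> str:
    """Toy stand-in for bidi reordering: reverse a purely RTL string.

    Hypothetical helper for illustration only; real engines such as
    FriBiDi implement the full Unicode Bidirectional Algorithm.
    """
    return logical[::-1]


logical = "سلام"  # logical (storage) order, as returned by the API

# Default behaviour: the tool reorders once before printing.
reordered_by_tool = to_visual(logical)

# A bidi-aware terminal then reorders whatever it receives a second time,
# which brings the characters back to logical order on screen.
reordered_again = to_visual(reordered_by_tool)

# With --no-bidi the tool prints the logical string unchanged, so the
# terminal's single reordering pass is the only one applied.
only_terminal = to_visual(logical)

print(reordered_again == logical)          # two passes cancel out
print(only_terminal == reordered_by_tool)  # one pass, whoever performs it
```

In this model exactly one reordering pass gives the intended display, so either the tool or the terminal should do it, not both.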

andreav commented 1 year ago

Fantastic! It is indeed working now with the --no-bidi option, thank you!

I have another question but I can open a separate issue if you prefer.

Sometimes translations contain a literal Unicode escape in the text (the character u200C); I include an example below.

Do you think there is any way to avoid seeing it? Or should I clean up the text after receiving the response?

Thank you!

$ docker run --rm -it soimort/translate-shell --no-bidi -brief -debug en:fa --debug "With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe"

  16 bytes > HTTP/1.1 200 OK
  46 bytes > Content-Type: application/json; charset=utf-8
  32 bytes > X-Content-Type-Options: nosniff
  62 bytes > Cache-Control: no-cache, no-store, max-age=0, must-revalidate
  17 bytes > Pragma: no-cache
  39 bytes > Expires: Mon, 01 Jan 1990 00:00:00 GMT
  36 bytes > Date: Fri, 24 Feb 2023 13:19:26 GMT
  80 bytes > Content-Disposition: attachment; filename="json.txt"; filename*=UTF-8''json.txt
  43 bytes > Cross-Origin-Resource-Policy: cross-origin
 173 bytes > Permissions-Policy: ch-ua-arch=*, ch-ua-bitness=*, ch-ua-full-version=*, ch-ua-full-version-list=*, ch-ua-model=*, ch-ua-wow64=*, ch-ua-platform=*, ch-ua-platform-version=*
 176 bytes > Content-Security-Policy: script-src 'nonce-7DTj6nINJHsRNcfoxykpWQ' 'unsafe-inline';object-src 'none';base-uri 'self';report-uri /_/TranslateApiHttp/cspreport;worker-src 'self'
 101 bytes > Content-Security-Policy: require-trusted-types-for 'script';report-uri /_/TranslateApiHttp/cspreport
 180 bytes > Accept-CH: Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Model, Sec-CH-UA-WoW64, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version       
 149 bytes > Report-To: {"group":"TranslateApiHttp","max_age":2592000,"endpoints":[{"url":"https://csp.withgoogle.com/csp/report-to/TranslateApiHttp/external"}]}
  70 bytes > Cross-Origin-Opener-Policy: same-origin; report-to="TranslateApiHttp"
  31 bytes > Access-Control-Allow-Origin: *
  12 bytes > Server: ESF
  20 bytes > X-XSS-Protection: 0
  28 bytes > X-Frame-Options: SAMEORIGIN
  20 bytes > Accept-Ranges: none
  22 bytes > Vary: Accept-Encoding
  18 bytes > Connection: close
  27 bytes > Transfer-Encoding: chunked
   1 bytes > 
   4 bytes > 535
1099 bytes > [[["با FiftyFifty می\u200cتوانید فرهنگ\u200cهای جدید را کشف و کشف کنید و با مردم از هر گوشه\u200cای از جهان ارتباط بالقوه پیدا کنید.","With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe",null,null,3,null,null,[[]],[[["982c75c78c6c8e6005ec3a4021a7f785","tea_GrecoIndoEuropeA_en2elfahykakumksq_2021q3.md"]]]]],null,"en",null,null,[["With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe",null,[["با FiftyFifty می\u200cتوانید فرهنگ\u200cهای جدید را کشف و کشف کنید و با مردم از هر گوشه\u200cای از جهان ارتباط بالقوه پیدا کنید.",0,true,false,[3],null,[[3]]],["با FiftyFifty می\u200cتوانید فرهنگ\u200cهای جدید را کاوش و کشف کنید و با مردم از هر گوشه\u200cای از دنیا ارتباط بالقوه پیدا کنید.",0,true,false,[8]]],[[0,131]],"With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe",0,0]],1,[],[["en"],null,[1],["en"]],null,null,null,null,null,null,null,null,null,[null,2]]
   2 bytes > 0
   1 bytes > 
content = '''
535
[[["با FiftyFifty می\u200cتوانید فرهنگ\u200cهای جدید را کشف و کشف کنید و با مردم از هر گوشه\u200cای از جهان ارتباط بالقوه پیدا کنید.","With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe",null,null,3,null,null,[[]],[[["982c75c78c6c8e6005ec3a4021a7f785","tea_GrecoIndoEuropeA_en2elfahykakumksq_2021q3.md"]]]]],null,"en",null,null,[["With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe",null,[["با FiftyFifty می\u200cتوانید فرهنگ\u200cهای 
جدید را کشف و کشف کنید و با مردم از هر گوشه\u200cای از جهان ارتباط بالقوه پیدا کنید.",0,true,false,[3],null,[[3]]],["با FiftyFifty می\u200cتوانید فرهنگ\u200cهای جدید را کاوش و کشف کنید و با مردم از هر گوشه\u200cای از دنیا ارتباط بالقوه پیدا کنید.",0,true,false,[8]]],[[0,131]],"With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe",0,0]],1,[],[["en"],null,[1],["en"]],null,null,null,null,null,null,null,null,null,[null,2]]
0

'''
tokens = ["535", "[", "[", "[", "\"با FiftyFifty می\\u200cتوانید فرهنگ\\u200cهای جدید را کشف و کشف کنید و با مردم از هر گوشه\\u200cای از جهان ارتباط بالقوه پیدا کنید.\"", ",", "\"With FiftyFifty you 
can explore and discover new cultures and find potential connections with people from every corner of the globe\"", ",", "null", ",", "null", ",", "3", ",", "null", ",", "null", ",", "[", "[", "]", "]", ",", "[", "[", "[", "\"982c75c78c6c8e6005ec3a4021a7f785\"", ",", "\"tea_GrecoIndoEuropeA_en2elfahykakumksq_2021q3.md\"", "]", "]", "]", "]", "]", ",", "null", ",", "\"en\"", ",", "null", ",", "null", ",", "[", "[", "\"With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe\"", ",", "null", ",", "[", "[", "\"با FiftyFifty می\\u200cتوانید فرهنگ\\u200cهای جدید را کشف و کشف کنید و با مردم از هر گوشه\\u200cای از جهان ارتباط بالقوه پیدا کنید.\"", ",", "0", ",", "true", ",", "false", ",", "[", "3", "]", ",", "null", ",", "[", "[", "3", "]", "]", "]", ",", "[", "\"با FiftyFifty می\\u200cتوانید فرهنگ\\u200cهای جدید را کاوش و کشف کنید و با مردم از هر گوشه\\u200cای از دنیا ارتباط بالقوه پیدا کنید.\"", ",", "0", ",", "true", ",", "false", ",", "[", "8", "]", "]", "]", ",", "[", "[", "0", ",", "131", "]", "]", ",", "\"With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe\"", ",", "0", ",", "0", "]", "]", ",", "1", ",", "[", "]", ",", "[", "[", "\"en\"", "]", ",", "null", ",", "[", "1", "]", ",", "[", "\"en\"", "]", "]", ",", "null", 
",", "null", ",", "null", ",", "null", ",", "null", ",", "null", ",", "null", ",", "null", ",", "null", ",", "[", "null", ",", "2", "]", "]", "0"]
ast = {
"0,0,0,0"       "\"با FiftyFifty می\\u200cتوانید فرهنگ\\u200cهای جدید را کشف و کشف کنید و با مردم از هر گوشه\\u200cای از جهان ارتباط بالقوه پیدا کنید.\""
"0,0,0,1"       "\"With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe\""
"0,0,0,2"       "null"
"0,0,0,3"       "null"
"0,0,0,4"       "3"
"0,0,0,5"       "null"
"0,0,0,6"       "null"
"0,0,0,8,0,0,0" "\"982c75c78c6c8e6005ec3a4021a7f785\""
"0,0,0,8,0,0,1" "\"tea_GrecoIndoEuropeA_en2elfahykakumksq_2021q3.md\""
"0,1"   "null"
"0,2"   "\"en\""
"0,3"   "null"
"0,4"   "null"
"0,5,0,0"       "\"With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe\""
"0,5,0,1"       "null"
"0,5,0,2,0,0"   "\"با FiftyFifty می\\u200cتوانید فرهنگ\\u200cهای جدید را کشف و کشف کنید و با مردم از هر گوشه\\u200cای از جهان ارتباط بالقوه پیدا کنید.\""
"0,5,0,2,0,1"   "0"
"0,5,0,2,0,2"   "true"
"0,5,0,2,0,3"   "false"
"0,5,0,2,0,4,0" "3"
"0,5,0,2,0,5"   "null"
"0,5,0,2,0,6,0,0"       "3"
"0,5,0,2,1,0"   "\"با FiftyFifty می\\u200cتوانید فرهنگ\\u200cهای جدید را کاوش و کشف کنید و با مردم از هر گوشه\\u200cای از دنیا ارتباط بالقوه پیدا کنید.\""
"0,5,0,2,1,1"   "0"
"0,5,0,2,1,2"   "true"
"0,5,0,2,1,3"   "false"
"0,5,0,2,1,4,0" "8"
"0,5,0,3,0,0"   "0"
"0,5,0,3,0,1"   "131"
"0,5,0,4"       "\"With FiftyFifty you can explore and discover new cultures and find potential connections with people from every corner of the globe\""
"0,5,0,5"       "0"
"0,5,0,6"       "0"
"0,6"   "1"
"0,8,0,0"       "\"en\""
"0,8,1" "null"
"0,8,2,0"       "1"
"0,8,3,0"       "\"en\""
"0,9"   "null"
"0,10"  "null"
"0,11"  "null"
"0,12"  "null"
"0,13"  "null"
"0,14"  "null"
"0,15"  "null"
"0,16"  "null"
"0,17"  "null"
"0,18,0"        "null"
"0,18,1"        "2"
"0"     "0"
}
با FiftyFifty میu200cتوانید فرهنگu200cهای جدید را کشف و کشف کنید و با مردم از هر گوشهu200cای از جهان ارتباط بالقوه پیدا کنید.

andreav commented 1 year ago

I just removed all u200c occurrences and the translations are ok!

Thank you for this great project!

soimort commented 1 year ago

Thanks for reporting the u200c occurrences. It's a zero-width non-joiner (U+200C), and its escape sequence should not be printed literally in the final text.

This is fixed in the develop branch now.
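
For output captured from a release without this fix, a small post-processing step can repair it. The sketch below assumes the stray text always appears as the literal characters `u200c`, optionally preceded by a backslash, as in the output shown earlier in this thread. `restore_zwnj` and `strip_zwnj` are hypothetical helper names, not part of translate-shell. Replacing the marker with a real U+200C keeps the Persian spelling intact, whereas deleting it lets the neighbouring letters join.

```python
import re
import unicodedata

ZWNJ = "\u200c"  # the actual zero-width non-joiner character

# Literal "u200c" with an optional leading backslash (assumed leak pattern).
# Caveat: any genuine text containing "u200c" would also be rewritten.
_LEAKED = re.compile(r"\\?u200c", re.IGNORECASE)


def restore_zwnj(text: str) -> str:
    """Turn leaked escape text back into a real zero-width non-joiner."""
    return _LEAKED.sub(ZWNJ, text)


def strip_zwnj(text: str) -> str:
    """Drop both leaked escape text and real zero-width non-joiners."""
    return _LEAKED.sub("", text).replace(ZWNJ, "")


sample = "میu200cتوانید"  # one word from the broken output above
fixed = restore_zwnj(sample)

print(unicodedata.name(ZWNJ))
print("u200c" in fixed)
```

`restore_zwnj` is probably the safer default for text that will be shown to Persian readers; `strip_zwnj` matches the manual clean-up described earlier in the thread.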