soimort / translate-shell

Command-line translator using Google Translate, Bing Translator, Yandex.Translate, etc.
https://www.soimort.org/translate-shell
The Unlicense
6.94k stars, 391 forks

Inconsistent inclusion of nikud in Hebrew results #509

Closed by NeatNit 8 months ago

NeatNit commented 8 months ago

Not sure if this is a bug in this tool or in something further upstream, but I'm seeing inconsistent inclusion of nikud (Hebrew phonetic notation; also spelt niqqud or nikkud, for future searches to find this issue) when translating words into Hebrew:

Screenshot_20240113_160443_Termux

~ $ trans -b -no-bidi en:he hello
שלום
~ $ trans -b -no-bidi en:he more
יותר
~ $ trans -b -no-bidi en:he less
פָּחוֹת
~ $ trans -b -no-bidi en:he lesser
קָטָן יוֹתֵר
~ $ trans -b -no-bidi en:he indeed
אכן
~ $ trans -b -no-bidi en:he element
אֵלֵמֶנט
~ $ trans -b -no-bidi en:he opposite
מול
~ $ trans -b -no-bidi en:he above
מֵעַל

Seemingly at random, some results include nikud and some do not. For example "less" translates with nikud, "more" without.

I noticed this bit in the docs, which I originally thought was relevant and indicative that this is a bug:

In brief mode, phonetic notation (if any) is not shown by default. To enable this, put an at sign “@” in front of the language code

But as I type this and try the listed example with and without the flag, I realise it's something completely different and not related to the target language's superfluous notation.

Either way, though: for consistent output, I think it should always show a translation without nikud (nikud is extremely rare in everyday life, but always appears in dictionaries).

I noticed that the full output does show good options without nikud:

Screenshot_20240113_163318_Termux

NeatNit commented 8 months ago

Version info:

Translate Shell       0.9.7.1

platform              Linux
terminal type         xterm-256color
bi-di emulator        [N/A]
gawk (GNU Awk)        5.3.0
fribidi (GNU FriBidi) 1.0.13
audio player          mpv --no-config
terminal pager        less
web browser           xdg-open
user locale           en_US.UTF-8 (English)
host language         en
source language       auto
target language       en
translation engine    auto
proxy                 [NONE]
user-agent            Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 Edg/104.0.1293.54
ip version            [DEFAULT]
theme                 default
init file             [NONE]

Running in Termux on Android.

soimort commented 8 months ago

As far as I'm aware, the output of trans is consistent with Google Translate (https://translate.google.com/), which does include nikud (most of the time).

Screenshot from 2024-01-15 17-04-10 Screenshot from 2024-01-15 17-05-48

As trans is just a command-line interface which is mostly language-agnostic, we can't fix this on our part, unless Google's API provides both nikud-marked text and regular text (which is not the case so far).

If you want translations without nikud, then I suggest using Bing as the engine (the -e bing option):

Screenshot from 2024-01-15 17-16-57
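
For anyone who wants to stay on the Google engine, one possible workaround is to strip the nikud from the output after the fact. The following is only a minimal sketch, not something trans does itself: `strip_nikud` is a hypothetical helper name, and it assumes that removing every nonspacing mark in the Unicode range U+0591 to U+05C7 (Hebrew points and cantillation marks) is acceptable for your use case. Hebrew letters and punctuation such as maqaf are left untouched.

```python
import unicodedata


def strip_nikud(text: str) -> str:
    """Return text with Hebrew points and cantillation marks removed.

    Hypothetical helper, not part of translate-shell. It drops characters
    that are both inside U+0591..U+05C7 and of Unicode category Mn
    (nonspacing mark), so letters and Hebrew punctuation are preserved.
    """
    return "".join(
        ch
        for ch in text
        if not ("\u0591" <= ch <= "\u05c7" and unicodedata.category(ch) == "Mn")
    )


# "less" as returned with nikud in the report above (pe, qamats, dagesh, ...)
pointed = "\u05e4\u05b8\u05bc\u05d7\u05d5\u05b9\u05ea"
plain = strip_nikud(pointed)
```

The same function could be applied to each line read from standard input, so that the output of trans -b is piped through it. Text that carries no nikud passes through unchanged, which gives the consistent output requested in the original report.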