soimort / translate-shell

Command-line translator using Google Translate, Bing Translator, Yandex.Translate, etc.
https://www.soimort.org/translate-shell
The Unlicense

Double suggestion: Exclude words and/or Markdown support #214

Open ohnonot opened 6 years ago

ohnonot commented 6 years ago

Translating a README for a little project of my own, both Google and Yandex insist on translating words that should not be translated, like commands or configuration variables.

In some cases an unconditional exclusion of all occurrences of a word is the right thing.

Instead of or in addition to simply excluding words, a more elegant solution might be Markdown support: never translate anything that is in code tags or code blocks.

soimort commented 6 years ago

On the 1st suggestion: Exclusion of words is a duplicate of #75.

On the 2nd one: I believe that full Markdown support is a bit beyond the scope of this project. It's not just about parsing code tags / blocks; any special characters need to be preserved for sanity: for example, the English string *test* may translate to Test * * in Latin, which diverges undesirably from its original Markdown.

None of the existing translation services supports any other format than plain text; hence I would have to implement a Markdown parser (Note that there are many dialects of Markdown so it's nontrivial work!) Furthermore, if Markdown is added, one could even request support for org-mode / AsciiDoc / HTML / XML / TeX ...
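
To make the scope concrete, here is a minimal sketch of the naive version of "never translate code": split the text into code and prose segments with a regular expression and only pass the prose to the engine. The names `split_code_and_prose` and `translate_prose_only` are hypothetical, the `translate` argument stands in for whatever engine call is used, and the pattern is an assumption that covers only backtick code spans and triple-backtick fences. It is not a Markdown parser: it ignores emphasis markers such as `*test*`, indented code blocks, and dialect differences, and it splits sentences around inline code.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example embeds cleanly

# Fenced blocks are tried first, then single-backtick inline code spans.
# Deliberately naive: this is a sketch, not a Markdown parser.
CODE_PATTERN = re.compile(FENCE + r".*?" + FENCE + r"|`[^`\n]+`", re.DOTALL)


def split_code_and_prose(markdown):
    """Return a list of (is_code, text) segments covering the whole input."""
    segments = []
    position = 0
    for match in CODE_PATTERN.finditer(markdown):
        if match.start() > position:
            segments.append((False, markdown[position:match.start()]))
        segments.append((True, match.group(0)))
        position = match.end()
    if position < len(markdown):
        segments.append((False, markdown[position:]))
    return segments


def translate_prose_only(markdown, translate):
    """Apply `translate` to prose segments and leave code segments untouched."""
    return "".join(
        text if is_code else translate(text)
        for is_code, text in split_code_and_prose(markdown)
    )
```

With `str.upper` as a stand-in for a real engine, `translate_prose_only("Run `make install` now.", str.upper)` leaves the code span alone and only changes the surrounding prose. A real engine would receive "Run " and " now." as separate fragments, which is usually worse input than a whole sentence.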

I agree that handling specific document formats could be a useful feature. It probably has lower priority for me, though.

ohnonot commented 6 years ago

> On the 1st suggestion: Exclusion of words is a duplicate of #75.

do i understand correctly that this is currently not implemented in trans?
it's good to know that google translate has at least some option for this, but i think it would be more elegant to have it handled locally.

> On the 2nd one: I believe that full Markdown support is a bit beyond the scope of this project.

that's fine with me.
markdown was just my first thought about the first one.

anyhow, thanks for another great piece of FOSS!

soimort commented 6 years ago

> do i understand correctly that this is currently not implemented in trans?

Correct, there is no such feature in trans yet.

I'm afraid Google Translate's special support is a necessity here, as this cannot be handled well on the local side. Given a sentence ABC (where B is intended to be kept untranslated), we could either: (1) ask the engine to translate A and C separately, but then ordering the three parts would require prior knowledge of the target language's grammar, and the sentence as a whole may read less sensibly; or (2) translate ABC at once, then replace the translated B with the original B. This is not always feasible, since translation is neither deterministic nor injective, so the translated B cannot always be located reliably in the output.

Since trans is not NLP software, the only reasonable way of implementing this would be to wrap the text in Google's special HTML tags (as described in #75), and it would be Google-only. It's not going to generalize to other translation engines. (Though I assume most people using trans prefer Google Translate as the default engine, that's not always the case.)
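
As a rough illustration of the tag-wrapping idea, here is a minimal sketch. It assumes the tags meant in #75 are `<span class="notranslate">`, the class Google honors when translating HTML, and that the request is sent as HTML rather than plain text. The helper names `wrap_notranslate` and `unwrap_notranslate` are hypothetical and not part of trans; whether a given endpoint respects the class is something to verify, not something this sketch guarantees.

```python
import html
import re

OPEN_TAG = '<span class="notranslate">'
CLOSE_TAG = "</span>"


def wrap_notranslate(text, protected_words):
    """Escape `text` as HTML and wrap each protected word in a notranslate span."""
    escaped = html.escape(text, quote=False)
    if not protected_words:
        return escaped
    # Longest first, so "make install" is preferred over a shorter "make".
    ordered = sorted(protected_words, key=len, reverse=True)
    alternatives = "|".join(
        re.escape(html.escape(word, quote=False)) for word in ordered
    )
    # Lookarounds keep us from matching inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: OPEN_TAG + m.group(0) + CLOSE_TAG, escaped)


def unwrap_notranslate(translated_html):
    """Strip span tags from the engine's reply and undo the HTML escaping."""
    without_tags = re.sub(r"</?span[^>]*>", "", translated_html)
    return html.unescape(without_tags)
```

For example, `wrap_notranslate("Run trans -b now", ["trans -b"])` produces `Run <span class="notranslate">trans -b</span> now`, and `unwrap_notranslate` turns the engine's HTML reply back into plain text.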

ohnonot commented 6 years ago

I played with this a little, and it seems that all translation engines have a way of ignoring words (couldn't test all though).
Nevertheless, it would be useful to have a global solution.

I can think of many reasonable ways to implement it locally (maybe differently for each engine): putting the words in double, single, or curly quotes, or maybe writing them in ALLCAPS, and so on.

No pressure. I will try to put together some examples.
Unfortunately my awk/gawk prowess is close to nil, but surely it can't be hard to replace a fixed string both ways (before and after sending it to the engine), without splitting the whole text.
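
The "replace a fixed string both ways" idea could look roughly like the sketch below, written in Python rather than awk for readability. The names `protect` and `restore` and the `XQZ…QZX` token format are made up for illustration. The sketch assumes the engine passes such a token through unchanged apart from letter case; an engine may still translate, split, or drop it, which is the feasibility concern raised above.

```python
import re


def protect(text, protected_words):
    """Swap each protected word for an opaque token; return the text and token map."""
    tokens = {}
    # Longest first, so a longer phrase is masked before any shorter word inside it.
    ordered = sorted(protected_words, key=len, reverse=True)
    for index, word in enumerate(ordered):
        token = "XQZ{}QZX".format(index)  # hypothetical token format
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        text, count = re.subn(pattern, token, text)
        if count:
            tokens[token] = word
    return text, tokens


def restore(translated, tokens):
    """Put the original words back, tolerating a change of letter case in the token."""
    for token, word in tokens.items():
        # A function replacement avoids backslash interpretation in `word`.
        translated = re.sub(
            re.escape(token), lambda m, w=word: w, translated, flags=re.IGNORECASE
        )
    return translated
```

Usage would be `masked, tokens = protect(text, words)`, then sending `masked` to the engine and calling `restore(reply, tokens)` on the result. Checking that every token still appears in the reply before restoring would be a sensible safeguard.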

Jieiku commented 1 month ago

DeepL is able to translate Markdown, and it does so without messing up Markdown or HTML tags. DeepL allows a limited amount of free translation per month, after which you have to pay for it, so I have been looking for a way to do this with Google Translate. If you only need a limited amount of translation, here is how I am currently doing it with DeepL:

git clone https://github.com/mgruner/deepl-api-rs
cd deepl-api-rs
# Build the optimized binary; a plain `cargo run` only produces a debug build
cargo build --release
sudo cp target/release/deepl /usr/local/bin/
which deepl
export DEEPL_API_KEY="this-is-my-key-get-your-own:aa"
# Translate a short string from stdin
echo "Recent Posts" | deepl translate --source-language EN --target-language FR | cat -
# Translate a whole Markdown file
cat /home/jieiku/.dev/abridge/content/overview-markdown-and-style.md | deepl translate --source-language EN --target-language ES | cat -