leolivier / auto-po-lyglot
https://github.com/leolivier/auto-po-lyglot
MIT License


Goal of this project

The goal of this project is to use various LLMs to help translate po files using a first already translated file.

For example, you have a .po file with msgids in English and msgstrs in French: using this file, you can ask the tool to translate the .po file into any other language. The first translation helps to disambiguate the very short sentences or parts of sentences that are usually found in .po files.
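For instance (this entry is invented for illustration), a typical entry in the input .po file looks like this:

#: templates/shop.html:12
msgid "Order"
msgstr "Commande"

Here the French msgstr "Commande" tells the LLM that "Order" means a purchase order rather than a sort order, so the target-language translation can be chosen accordingly.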

If you have an API key for the corresponding commercial LLM, auto-po-lyglot can work with OpenAI, Anthropic Claude, Gemini and Grok. Notes:

  1. Grok is implemented but not tested yet, as the Grok API is not yet available in my country.
  2. Claude is implemented in 2 flavors: cached (a beta feature at Anthropic) or non-cached. The cached version uses a longer system prompt because caching only works if the system prompt is more than 1024 tokens long. The big advantage is that the cached version is much cheaper to run than the non-cached one.

auto-po-lyglot also works with Ollama: you can run your Ollama server locally and use any model that Ollama can run (depending on your hardware capabilities, of course) and for free!
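For instance, with a local Ollama server (the model name here is purely illustrative; use any model your hardware can run), a sketch of such a run is:

ollama pull llama3.1
auto_po_lyglot -l ollama -m llama3.1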

Install

Prerequisite

Install from PyPI
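Presumably (the exact distribution name on PyPI is an assumption, taken from the package name used elsewhere in this document):

pip install auto_po_lyglot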

Install from sources

  1. Clone the repo:
    git clone https://github.com/leolivier/transpo.git auto_po_lyglot
  2. cd to the auto_po_lyglot folder and install the package and its dependencies:
    cd auto_po_lyglot && pip install .

Configuration

auto_po_lyglot uses a mix of command-line arguments and variables in a .env file to be as flexible as possible.

Most parameters can be given directly on the command line (if you don't use the UI version), but you can put all the parameters that don't change very often in a .env file and use the command line only to override their values when needed.
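For example (values illustrative), if the .env file contains TEMPERATURE=0.2, a one-off run can still override it on the command line:

auto_po_lyglot -t 0.7

The command-line value takes precedence for that run only; the .env file is left untouched.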

The .env file

The .env file can be created by copying the .env.example file to .env:

cp .env.example .env

Then edit the .env file to suit your needs; the variables it can contain mirror the command-line options described below.
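As a sketch, such a .env file could look like this, using the variable names from the command-line section below (the values are illustrative, and the exact TARGET_LANGUAGES list syntax is an assumption):

LLM_CLIENT=ollama
LLM_MODEL=llama3.1
TEMPERATURE=0.2
ORIGINAL_LANGUAGE=English
CONTEXT_LANGUAGE=French
# an array; the exact list syntax here is an assumption
TARGET_LANGUAGES=["Italian", "Spanish"]
INPUT_PO=locales/fr/LC_MESSAGES/django.po
# API key, only needed for the commercial LLMs, e.g.:
# OPENAI_API_KEY=sk-...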

Only for the UI

Run it:

Running with the UI

From version 1.3.0

Running the UI from the command line

After installing auto_po_lyglot with pip, create a short Python script called auto_po_lyglot_ui.py that contains these 2 lines:

from auto_po_lyglot.po_streamlit import streamlit_main
streamlit_main()

And run:

streamlit run auto_po_lyglot_ui.py

Then, you can go to http://localhost:8501 and provide the necessary parameters. Most of them can be initialized based on:

  • the content of the .env file as described above
  • the command line parameters as described below, after a special '--' parameter that tells streamlit that the following parameters are for auto-po-lyglot, e.g.:
    streamlit run auto_po_lyglot_ui.py -- -l ollama -m phi3 -t 0.5

    Note: The -o and -p parameters are ignored.

In the UI, a help button (with a '?') explains what parameters to provide where.

Running from the Command Line

Usage: auto_po_lyglot [-h] [-p] [-l LLM] [-m MODEL] [-t TEMPERATURE] [--original_language ORIGINAL_LANGUAGE] [--context_language CONTEXT_LANGUAGE] [--target_language TARGET_LANGUAGE] [-i INPUT_PO] [-o OUTPUT_PO] [-v] [-vv]

Each option can be used to supersede the corresponding variable in the .env file; that variable and the default value, where any, are given with each option below.

-h, --help
  show this help message and exit
-v, --verbose
  verbose mode (sets LOG_LEVEL=INFO; default: LOG_LEVEL=WARN)
-vv, --debug
  debug mode (sets LOG_LEVEL=DEBUG; default: LOG_LEVEL=WARN)
-p, --show_prompts
  show the prompts used for translation and exit
-i, --input_po INPUT_PO
  the .po file containing the msgids (phrases to be translated) and msgstrs (context translations) (.env variable: INPUT_PO)
-o, --output_po OUTPUT_PO
  the .po file where the translated results will be written. If not provided, it will be created in the same directory as the input po file, except if the input po file path has the specific format .../locale/<context language>/LC_MESSAGES/<file>.po; in that case, the output po file will be created as .../locale/<target language>/LC_MESSAGES/<file>.po (.env variable: OUTPUT_PO; default: see doc)
-l, --llm LLM
  the type of LLM you want to use. Can be openai, ollama, claude or claude_cached. For openai or claude[_cached], you need to set the proper API key in the environment or in the .env file (.env variable: LLM_CLIENT; default: ollama)
-m, --model MODEL
  the name of the model to use. If not provided, a default model will be used, based on the chosen client (.env variable: LLM_MODEL; default: see doc)
-t, --temperature TEMPERATURE
  the temperature of the model. If not provided at all, a default value of 0.2 will be used (.env variable: TEMPERATURE; default: 0.2)
--original_language ORIGINAL_LANGUAGE
  the language of the original phrase (.env variable: ORIGINAL_LANGUAGE)
--context_language CONTEXT_LANGUAGE
  the language of the context translation (.env variable: CONTEXT_LANGUAGE)
--target_language TARGET_LANGUAGE
  the language into which the original phrase will be translated (.env variable: TARGET_LANGUAGES, which is an array)
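Putting it together, a full run could look like this (the model name and file path are illustrative, not prescribed):

auto_po_lyglot -l openai -m gpt-4o --original_language English --context_language French --target_language Italian -i locales/fr/LC_MESSAGES/django.po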

Using Docker

From version 1.4.0

You can run auto_po_lyglot via Docker. A pre-built, up-to-date image is available at ghcr.io/leolivier/auto_po_lyglot, or you can build your own.

Create Docker image

If you want to create your own Docker image, create a folder and cd into it.
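A plausible sketch of the build, assuming the repository ships its Dockerfile at the root (an assumption, not a documented step):

git clone https://github.com/leolivier/auto-po-lyglot.git .
docker build -t auto_po_lyglot:latest .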

Running the docker image

If you built the image yourself, run:

docker run -p 8501:8501 -v ./.env:/app/.env --name auto_po_lyglot auto_po_lyglot:latest

If you want to use the pre-built image, run:

docker run -p 8501:8501 -v ./.env:/app/.env --name auto_po_lyglot ghcr.io/leolivier/auto_po_lyglot:latest
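In both cases, the streamlit UI is then available at http://localhost:8501, exactly as when running it locally.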