modernice / dragoman

Translator for structured documents.
MIT License

Dragoman - Translator for Structured Documents

Dragoman is an AI-powered tool for translating structured documents such as JSON, XML, and YAML files. Its key feature is that it preserves the document's structure during translation, keeping elements such as JSON keys and placeholders intact.
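Whatever tool produces a translation, it can be worth verifying afterwards that placeholders really survived. The following is a minimal sketch of such a check; it is not part of Dragoman's API, it assumes {name}-style placeholders, and the functions placeholders and samePlaceholders are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// placeholderRE matches {name}-style placeholders. The placeholder syntax is an
// assumption for this sketch; adapt the pattern to your i18n library.
var placeholderRE = regexp.MustCompile(`\{[A-Za-z0-9_.]+\}`)

// placeholders returns the sorted list of placeholders found in s.
func placeholders(s string) []string {
	found := placeholderRE.FindAllString(s, -1)
	sort.Strings(found)
	return found
}

// samePlaceholders reports whether source and translation contain exactly the
// same placeholders (same names, same counts), regardless of their order.
func samePlaceholders(source, translation string) bool {
	a, b := placeholders(source), placeholders(translation)
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	src := "Hello, {name}! You have {count} new messages."
	good := "Hallo, {name}! Du hast {count} neue Nachrichten."
	bad := "Hallo, {Name}! Du hast {count} neue Nachrichten."
	fmt.Println(samePlaceholders(src, good)) // true
	fmt.Println(samePlaceholders(src, bad))  // false
}
```

Comparing sorted lists rather than sets also catches a placeholder that was duplicated or dropped.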

Dragoman is available as both a CLI tool and a Go library. This means you can use it directly from your terminal for one-off tasks, or integrate it into your Go applications for more complex use cases.

If you're looking for a version of Dragoman that leverages conventional translation services like Google Translate or DeepL, check out the freeze branch of this repository. The previous implementation manually extracted texts from the input files, translated them using DeepL or Google Translate, and reinserted the translated pieces back into the original documents.

Installation

The Dragoman CLI can be installed directly with the Go toolchain:

go install github.com/modernice/dragoman/cmd/dragoman@latest

To add Dragoman as a library to your Go project, use go get:

go get github.com/modernice/dragoman

Usage

The basic usage of Dragoman is as follows:

dragoman source.json

This command translates the content of source.json to English and prints the translated document to stdout. By default, the source language is detected automatically; use the --from and --to options to specify the source or target language.

Full list of available options

-f or --from

The source language of the document. It can be specified in any format that a human would understand (like 'English', 'German', 'French', etc.). If not provided, it defaults to 'auto', meaning the language is automatically detected.

dragoman translate source.json --from English

-t or --to

The target language to which the document will be translated. It can be specified in any format that a human would understand (like 'English', 'German', 'French', etc.). If not provided, it defaults to 'English'.

dragoman translate source.json --to French

-o or --out

The path to the output file where the translated content will be saved. If this option is not provided, the translated content will be printed to stdout.

dragoman translate source.json --out target.json

--split-chunks

Split the source document into chunks before translating. This can help large documents fit into the context window of OpenAI's models. Each line that starts with one of the provided prefixes starts a new chunk. Repeat the option to provide multiple prefixes.

Example: Split a Markdown file into chunks when encountering H2 and H3 headings:

dragoman translate source.md --split-chunks "## " --split-chunks "### "

-u or --update

Enable this option to translate only those fields of the source file that are missing from the output file. This option requires both the source and the output file to be JSON!

dragoman translate source.json --out target.json --update

Example

When you add new translations to your JSON source file, you can use the --update option to only translate the newly added fields and merge them into the output file.

// en.json
{
    "hello": "Hello, world!",
    "contact": {
        "email": "hello@example.com",
        "response": "Thank you for your message."
    }
}
// de.json
{
    "hello": "Hallo, Welt!",
    "contact": {
        "email": "hallo@example.com"
    }
}
dragoman translate en.json --out de.json --update

Result:

// de.json
{
    "hello": "Hallo, Welt!",
    "contact": {
        "email": "hallo@example.com",
        "response": "Vielen Dank für deine Nachricht."
    }
}

-p or --preserve

This option lets you specify a comma-separated list of words or phrases that must remain unchanged during translation. It is useful for terms that need to keep their original form, such as code identifiers, trademarks, or names. The specified terms are preserved whether they appear on their own or as part of larger strings, for example inside HTML tags. For instance, --preserve Dragoman ensures that the term Dragoman keeps its original form after translation. Note that the effectiveness of this feature depends on the language model used; it is optimized for OpenAI's GPT models.

dragoman translate source.json --preserve Dragoman

-v or --verbose

A flag that, if provided, makes the CLI print more detailed output about the translation process and its result.

dragoman translate source.json --verbose

-h or --help

A flag that displays a help message detailing how to use the command and its options.

dragoman --help

Use as Library

Besides the CLI tool, Dragoman can also be used as a Go library in your own applications. This allows you to build the Dragoman translation capabilities directly into your own Go programs.

Example: Basic Translation

In this example, we load a JSON file and translate its content using the default source and target languages (automatic detection and English, respectively).

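package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "github.com/modernice/dragoman"
    "github.com/modernice/dragoman/openai"
)

func main() {
    // Read the document to translate.
    content, err := os.ReadFile("source.json")
    if err != nil {
        log.Fatal(err)
    }

    // Use OpenAI as the translation backend.
    service := openai.New()
    translator := dragoman.New(service)

    // Translate with the defaults: auto-detected source language, English target.
    translated, err := translator.Translate(context.TODO(), string(content))
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(translated)
}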

Example: Translation with Preserved Words

In this example, we translate a JSON file, specifying some preserved words that should not be translated.

package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "github.com/modernice/dragoman"
    "github.com/modernice/dragoman/openai"
)

func main() {
    content, err := os.ReadFile("source.json")
    if err != nil {
        log.Fatal(err)
    }

    service := openai.New()
    translator := dragoman.New(service)

    // Keep "Dragoman" and "OpenAI" unchanged in the translated document.
    translated, err := translator.Translate(
        context.TODO(),
        string(content),
        dragoman.Preserve([]string{"Dragoman", "OpenAI"}),
    )
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(translated)
}

Example: Translation with Specific Source and Target Languages

In this example, we translate a JSON file from English to French, specifying the source and target languages.

package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "github.com/modernice/dragoman"
    "github.com/modernice/dragoman/openai"
)

func main() {
    content, err := os.ReadFile("source.json")
    if err != nil {
        log.Fatal(err)
    }

    service := openai.New()
    translator := dragoman.New(service)

    // Translate explicitly from English to French instead of relying on the defaults.
    translated, err := translator.Translate(
        context.TODO(),
        string(content),
        dragoman.Source("English"),
        dragoman.Target("French"),
    )
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(translated)
}

License

MIT