yunsukim86 / wbw-lm


Context-aware Beam Search for Unsupervised Word-by-Word Translation

This code implements a simple beam search that combines cross-lingual word embeddings with a language model. It is compatible with MUSE embeddings and KenLM language models. The output translation can then be fed to a denoising autoencoder for improved reordering.
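To illustrate the idea, here is a minimal sketch of a word-by-word beam search that adds a lexical score to a language model score. It is not the code in translate.py: the function names, the log-linear weights (`lex_weight`, `lm_weight`), the toy candidate table, and the toy bigram model are all assumptions made for this example. In practice the candidates would come from nearest-neighbor search in the cross-lingual embedding space, and the language model score from a KenLM model.

```python
import math

def beam_search(src_words, candidates, lm_logprob,
                beam_size=3, lex_weight=1.0, lm_weight=1.0):
    """Translate word by word, keeping the beam_size best partial hypotheses.

    candidates: dict mapping a source word to a list of (target, probability).
    lm_logprob: function (history tuple, next word) -> log-probability.
    """
    beam = [((), 0.0)]  # (target words so far, accumulated score)
    for src in src_words:
        expanded = []
        for hyp, score in beam:
            # Source words without candidates are copied through unchanged.
            for tgt, lex_p in candidates.get(src, [(src, 1.0)]):
                new_score = (score
                             + lex_weight * math.log(lex_p)
                             + lm_weight * lm_logprob(hyp, tgt))
                expanded.append((hyp + (tgt,), new_score))
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:beam_size]
    return list(beam[0][0])

# Hypothetical toy data: "bank" is ambiguous between "bank" and "bench".
CANDIDATES = {
    "park": [("park", 1.0)],
    "bank": [("bank", 0.55), ("bench", 0.45)],
}
BIGRAMS = {("park", "bench"): 0.5, ("park", "bank"): 0.05}

def toy_lm_logprob(history, word):
    """Toy bigram model with a flat back-off probability of 0.1."""
    prev = history[-1] if history else "<s>"
    return math.log(BIGRAMS.get((prev, word), 0.1))

if __name__ == "__main__":
    # With the language model, context selects "bench".
    print(beam_search(["park", "bank"], CANDIDATES, toy_lm_logprob))
    # Without it (lm_weight=0), the lexically closest word "bank" wins.
    print(beam_search(["park", "bank"], CANDIDATES, toy_lm_logprob, lm_weight=0.0))
```

In this toy setup the lexical score alone prefers "bank", while the combined score prefers "bench" after "park", which is the kind of context-dependent choice the beam search is meant to make.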

If you use this code, please cite:

If you are looking for the denoising autoencoder, please go to sockeye-noise.

Installation

First, please install all dependencies:

Then clone this repository.

Usage

Here is a simple example for translation:

> cat {input_corpus} | python translate.py --src_emb {source_embedding} \
                                           --tgt_emb {target_embedding} \
                                           --emb_dim {embedding_dimension} \
                                           --lm {language_model} > {output_translation}

Please refer to the help message (-h) for other options.
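If you are unsure what to pass as the embedding dimension, it can usually be read from the embedding file itself. The sketch below assumes the word2vec-style text format that MUSE exports (a header line with vocabulary size and dimension, then one word and its vector per line). The helper names are hypothetical and this is not the loader used by translate.py.

```python
import io

def read_embedding_header(handle):
    """Return (vocab_size, dim) from the first line of a text embedding file."""
    first = handle.readline().split()
    if len(first) != 2:
        raise ValueError("expected a 'vocab_size dim' header line")
    return int(first[0]), int(first[1])

def load_text_embeddings(handle, max_vocab=None):
    """Read up to max_vocab vectors; returns (dim, {word: [floats]})."""
    _, dim = read_embedding_header(handle)
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if max_vocab is not None and len(vectors) >= max_vocab:
            break
    return dim, vectors

# Small in-memory stand-in for an embedding file.
SAMPLE = "2 3\nhaus 0.1 0.2 0.3\nbaum 0.4 0.5 0.6\n"

if __name__ == "__main__":
    dim, vectors = load_text_embeddings(io.StringIO(SAMPLE))
    print(dim, sorted(vectors))
```

For a real file, open it with `open(path, encoding="utf-8")` and pass the handle instead of the `io.StringIO` sample.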