This code implements a simple beam-search decoder that combines cross-lingual word embeddings with a language model. It is compatible with MUSE embeddings and kenlm language models. The output translation can be further fed to a denoising autoencoder for improved reordering.
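To make the approach concrete, here is a minimal sketch of LM-guided word-by-word beam search, assuming MUSE-style text embeddings and the kenlm Python module. The function names, the interpolation weight, and the exact scoring are illustrative assumptions, not the implementation in translate.py.

# Sketch only: word-by-word translation with a beam search that scores
# hypotheses by cross-lingual embedding similarity plus a kenlm LM score.
# load_emb, beam_translate, and lex_weight are hypothetical names.
import numpy as np
import kenlm

def load_emb(path, dim):
    # MUSE-style text format: optional header line, then "<word> <dim floats>" per line.
    words, vecs = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skips the header and malformed lines
            words.append(parts[0])
            vecs.append(np.asarray(parts[1:], dtype=np.float32))
    mat = np.vstack(vecs)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)  # unit length, so dot product = cosine
    return words, {w: i for i, w in enumerate(words)}, mat

def beam_translate(src_sent, src_index, src_mat, tgt_words, tgt_mat, lm,
                   beam=10, lex_weight=1.0):
    # Each hypothesis is (target prefix, accumulated lexical similarity).
    beams = [("", 0.0)]
    for src_word in src_sent.split():
        if src_word in src_index:
            sims = tgt_mat @ src_mat[src_index[src_word]]
            top = np.argsort(-sims)[:beam]
            cands = [(tgt_words[i], float(sims[i])) for i in top]
        else:
            cands = [(src_word, 0.0)]  # copy unknown source words through
        scored = []
        for prefix, lex in beams:
            for word, sim in cands:
                hyp = (prefix + " " + word).strip()
                total = lex_weight * (lex + sim) + lm.score(hyp, bos=True, eos=False)
                scored.append((total, hyp, lex + sim))
        scored.sort(key=lambda x: x[0], reverse=True)
        beams = [(hyp, lex) for _, hyp, lex in scored[:beam]]
    return beams[0][0]

# Usage sketch (paths are placeholders):
#   s_words, s_idx, s_mat = load_emb("{source_embedding}", 300)
#   t_words, t_idx, t_mat = load_emb("{target_embedding}", 300)
#   lm = kenlm.Model("{language_model}")
#   print(beam_translate("ein Beispiel", s_idx, s_mat, t_words, t_mat, lm))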
If you use this code, please cite:
If you are looking for the denoising autoencoder, please go to sockeye-noise.
First, please install all dependencies:
Then clone this repository.
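For example, a possible setup looks like the following; the dependency list and repository URL are placeholders and not confirmed by this README:

pip install numpy
pip install https://github.com/kpu/kenlm/archive/master.zip   # kenlm Python module
git clone {repository_url}
cd {repository_name}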
Here is a simple example for translation:
> cat {input_corpus} | python translate.py --src_emb {source_embedding} \
--tgt_emb {target_embedding} \
--emb_dim {embedding_dimension} \
--lm {language_model} > {output_translation}
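For instance, with hypothetical file names (MUSE-style text embeddings of dimension 300 and a kenlm ARPA model; the paths below are illustrative only), the invocation might look like:

> cat data/test.de | python translate.py --src_emb vectors-de.txt \
--tgt_emb vectors-en.txt \
--emb_dim 300 \
--lm en.arpa > test.en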
Please refer to the help message (-h) for other detailed options.