thai2nmt: English-Thai Machine Translation Models

This repository contains the code to reproduce our experiments on English-Thai NMT models, together with scripts to download the datasets (scb-mt-en-th-2020, mt-opus, and scb-mt-en-th-2020+mt-opus) and the train/validation/test splits used in those experiments.

Our experiments are listed below.

Experiment #1 TBASE.SCB-1M -- Transformer BASE models trained on scb-mt-en-th-2020 v1.0
Experiment #2 TBASE.MT-OPUS -- Transformer BASE models trained on the English-Thai datasets listed in Open Parallel Corpus (OPUS)
Experiment #3 TBASE.SCB-1M+MT-OPUS -- Transformer BASE models trained on scb-mt-en-th-2020 v1.0 combined with the English-Thai datasets listed in Open Parallel Corpus (OPUS)
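
The list above pairs each experiment with the datasets it was trained on. A minimal sketch of that pairing as a lookup table follows. The names `EXPERIMENTS` and `training_datasets` are hypothetical and used only for illustration; they are not taken from this repository's code.

```python
from typing import Dict, List

# Hypothetical lookup table (not part of the repository's code):
# experiment name -> dataset names as written in this README.
EXPERIMENTS: Dict[str, List[str]] = {
    "TBASE.SCB-1M": ["scb-mt-en-th-2020"],
    "TBASE.MT-OPUS": ["mt-opus"],
    "TBASE.SCB-1M+MT-OPUS": ["scb-mt-en-th-2020", "mt-opus"],
}


def training_datasets(experiment: str) -> List[str]:
    """Return a copy of the dataset names used to train the given experiment."""
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    return list(EXPERIMENTS[experiment])


if __name__ == "__main__":
    # Print each experiment with the datasets it was trained on.
    for name in EXPERIMENTS:
        print(name, "->", " + ".join(training_datasets(name)))
```

Each experiment name encodes the model size (TBASE) followed by the training data, so the third experiment is simply the union of the first two datasets.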

BibTeX entry and citation info

@Article{Lowphansirikul2021,
    author={Lowphansirikul, Lalita
            and Polpanumas, Charin
            and Rutherford, Attapol T.
            and Nutanong, Sarana},
    title={A large English--Thai parallel corpus from the web and machine-generated text},
    journal={Language Resources and Evaluation},
    year={2021},
    month={Mar},
    day={30},
    issn={1574-0218},
    doi={10.1007/s10579-021-09536-6},
    url={https://doi.org/10.1007/s10579-021-09536-6}
}
