hemingkx / ChineseNMT

ChineseNMT: Translate English to Chinese with PyTorch Implementation of Transformer
neural-machine-translation pytorch transformer


ChineseNMT

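An English-to-Chinese translation model based on the Transformer 🤗.

For a walkthrough of the project, see the Zhihu article 教你用PyTorch玩转Transformer英译中翻译模型! (roughly, "A hands-on guide to a Transformer English-to-Chinese translation model in PyTorch").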

Data

The dataset comes from the WMT 2018 Chinese-English track (news domain only).

Data Process

Tokenization

Model

The model uses transformer-pytorch, the implementation open-sourced by Harvard. A Chinese-language explanation is available at the linked page.

Requirements

This repo was tested on Python 3.6+ and PyTorch 1.5.1. The main requirements are:

To set up the environment quickly, run:

pip install -r requirements.txt

Usage

Model parameters are set in config.py.

To run the model, enter the following on the command line:

python main.py

Experiment results are written to ./experiment/train.log, and the test-set translations are written to ./experiment/output.txt.

On two GeForce GTX 1080 Ti GPUs, each epoch takes about one hour.

Results

Model  NoamOpt  LabelSmoothing  Best Dev BLEU  Test BLEU
1      No       No              24.07          24.03
2      Yes      No              26.08          25.94
3      No       Yes             23.92          23.84
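
For readers unfamiliar with the two options compared in the table, here is a minimal sketch of what they usually mean. It assumes the standard warmup schedule from the original Transformer paper (the one Harvard's annotated implementation calls NoamOpt) and a simple uniform form of label smoothing. The function names and default values (`d_model=512`, `warmup=4000`, `smoothing=0.1`) are illustrative assumptions, not values taken from this repository's config.py.

```python
def noam_rate(step, d_model=512, warmup=4000, factor=1.0):
    """Learning rate at a given step: linear warmup, then inverse-sqrt decay.

    Defaults are illustrative assumptions, not this repo's settings.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


def smoothed_targets(true_index, vocab_size, smoothing=0.1):
    """Target distribution with (1 - smoothing) on the true token and the
    remaining mass spread evenly over the other tokens (a simplified variant)."""
    off_value = smoothing / (vocab_size - 1)
    dist = [off_value] * vocab_size
    dist[true_index] = 1.0 - smoothing
    return dist


if __name__ == "__main__":
    for step in (1, 1000, 4000, 20000):
        print(step, noam_rate(step))
    print(smoothed_targets(true_index=2, vocab_size=5))
```

With this schedule the rate rises linearly until `step == warmup` and decays afterwards, so the peak learning rate is controlled by `factor`, `d_model`, and `warmup` together.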

Pretrained Model

The trained Model 2 (currently the best model) can be downloaded directly from the link below 😊:

Link: https://pan.baidu.com/s/1RKC-HV_UmXHq-sy1-yZd2Q Password: g9wl

Beam Search

Test results of the current best model (Model 2) with beam search:

Beam_size  2      3      4      5
Test BLEU  26.59  26.80  26.84  26.86
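
To illustrate why a wider beam can improve on greedy decoding, here is a minimal, self-contained beam search sketch. It is not this repository's decoder: `beam_search`, `toy_step`, and the toy probability table are hypothetical names and data made up for the example, and the sketch applies no length normalization.

```python
import math


def beam_search(step_fn, bos, eos, beam_size=3, max_len=10):
    """Return the highest-scoring (tokens, log_prob) hypothesis.

    step_fn(tokens) must return a dict mapping each next token to its log-probability.
    """
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for tok, logp in step_fn(tokens).items():
                candidates.append((tokens + [tok], score + logp))
        # Keep only the beam_size best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])


def toy_step(tokens):
    """Hypothetical next-token model: the locally best first token ("a")
    leads to a worse full sequence than the alternative ("b")."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"x": 0.5, "y": 0.5},
        "b": {"z": 0.9, "</s>": 0.1},
    }
    probs = table.get(tokens[-1], {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


if __name__ == "__main__":
    print(beam_search(toy_step, "<s>", "</s>", beam_size=1)[0])  # greedy path via "a"
    print(beam_search(toy_step, "<s>", "</s>", beam_size=2)[0])  # better path via "b"
```

In the toy model, a beam of size 1 commits to "a" and ends with sequence probability 0.30, while a beam of size 2 keeps "b" alive and finds the sequence with probability 0.36. The gains in the table above flatten out as the beam grows, which is the usual pattern.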

One Sentence Translation

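Name the model you trained, or the pretrained model above, model.pth and save it under ./experiment. Then run translate_example in main.py to translate a single sentence.

For example, given the English input sentence: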

The near-term policy remedies are clear: raise the minimum wage to a level that will keep a fully employed worker and his or her family out of poverty, and extend the earned-income tax credit to childless workers.

the ground truth is:

近期的政策对策很明确:把最低工资提升到足以一个全职工人及其家庭免于贫困的水平,扩大对无子女劳动者的工资所得税减免。

and the translation with beam size = 3 is:

短期政策方案很清楚:把最低工资提高到充分就业的水平,并扩大向无薪工人发放所得的税收信用。

Notes

The code released in this repository has only been tested successfully on Linux. If you want to try it on Windows, the steps below, as mentioned in issue 2, may be useful:

  1. Add an explicit utf-8 encoding when opening files:

    in lines 16 and 19 of get_corpus.py:

    with open(ch_path, "w", encoding="utf-8") as fch:
    with open(en_path, "w", encoding="utf-8") as fen:

    in line 165 of train.py:

    with open(config.output_path, "w", encoding="utf-8") as fp:

  2. Use conda to install sacrebleu if Anaconda is used to build your virtual environment:

    conda install -c conda-forge sacrebleu
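
The reason for step 1 is that, on most Python versions, `open()` without an `encoding` argument falls back to the platform's locale encoding, which on Windows is often not UTF-8. The following sketch shows the explicit-encoding pattern on a temporary file. The file name and sample text are placeholders for illustration, not paths from this repository.

```python
import os
import tempfile

# Sample Chinese text standing in for a line of corpus or model output.
text = "近期的政策对策很明确"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.txt")  # placeholder file name

    # Write with an explicit encoding so the result does not depend on the OS locale.
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(text + "\n")

    # Read back with the same encoding.
    with open(path, "r", encoding="utf-8") as fp:
        round_trip = fp.read().strip()

    # The raw bytes on disk are valid UTF-8 regardless of platform.
    with open(path, "rb") as fp:
        raw = fp.read()

print(round_trip == text)
```

The same `encoding="utf-8"` argument should also be used wherever these files are read back.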

If you run into any other problems in your own project, feel free to open an issue or email me 😊~