We present the first attempt at using sequence-to-sequence neural networks to model text simplification (TS). Unlike previously proposed automated methods, our neural text simplification (NTS) systems can perform lexical simplification and content reduction simultaneously. An extensive human evaluation of the output shows that NTS systems achieve good grammaticality and meaning preservation of the output sentences, and a higher level of simplification than state-of-the-art automated TS systems. We train our models on a Wikipedia corpus containing "good" and "good partial" alignments.
@InProceedings{neural-text-simplification,
  author    = {Sergiu Nisioi and Sanja Štajner and Simone Paolo Ponzetto and Liviu P. Dinu},
  title     = {Exploring Neural Text Simplification Models},
  booktitle = {{ACL} {(2)}},
  publisher = {The Association for Computational Linguistics},
  year      = {2017}
}
Install the tds Lua rock (a dependency of the Torch-based OpenNMT):
luarocks install tds
Clone the repository together with its submodules:
git clone --recursive https://github.com/senisioi/NeuralTextSimplification.git
Download the pre-trained models into the models directory:
python src/download_models.py ./models
Run the pre-trained models on the test data (the sketch after this step shows the kind of OpenNMT invocation the script wraps):
cd src/scripts
./translate.sh
The generated simplifications are placed in the results_NTS directory:
cd ../../results_NTS
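For orientation only: translate.sh drives OpenNMT's standard decoder, and an invocation along the following lines is what it wraps. The model path and output name here are placeholders, and the flags are standard OpenNMT (Torch) options rather than the script's exact settings:

th translate.lua -model path/to/model.t7 -src data/test.en \
    -output results_NTS/NTS_default_b5_h1 \
    -beam_size 5 -n_best 2 -replace_unk

The -beam_size and -n_best flags control the decoding settings that the prediction file names below encode (e.g., b5_h1 = beam size 5, first hypothesis).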
To evaluate the predictions, first install the Python dependencies:
pip install -r src/requirements.txt
Then, from the repository root, run the evaluation script on the test sentences, the references, and a directory of prediction files:
python src/evaluate.py ./data/test.en ./data/references/references.tsv ./predictions/
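The script scores every prediction file found in the given directory; in the paper, models are ranked by BLEU and SARI, which is reflected in the prediction file names listed below.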
The configs directory contains the OpenNMT config file. To train, update the config file with the appropriate paths on your local system and run:
th train.lua -config $PATH_TO_THIS_DIR/configs/NTS.cfg
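For orientation, OpenNMT config files use a plain option = value format, one option per line. A minimal sketch with placeholder paths and illustrative (default-like) values follows; it is not the exact configuration used for the published models:

data = path/to/preprocessed/train.t7
save_model = path/to/models/NTS
layers = 2
rnn_size = 500
epochs = 13
gpuid = 1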
The predictions directory contains the predictions of previous systems (Wubben et al., 2012; Glavaš and Štajner, 2015; Xu et al., 2016) and the predictions generated by the NTS models reported in the paper:
NTS_default_b5_h1 - the default model, beam size 5, hypothesis 1
NTS_BLEU_b12_h1 - the model ranked best by BLEU, beam size 12, hypothesis 1
NTS_SARI_b5_h2 - the model ranked best by SARI, beam size 5, hypothesis 2
NTS-w2v_default_b5_h1 - the default model, beam size 5, hypothesis 1
NTS-w2v_BLEU_b12_h1 - the model ranked best by BLEU, beam size 12, hypothesis 1
NTS-w2v_SARI_b12_h2 - the model ranked best by SARI, beam size 12, hypothesis 2
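In these file names, the suffix bN_hM encodes the decoding settings: beam size N, taking the M-th hypothesis from the n-best list.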
The data directory contains the training, testing, and reference sentences used to train and evaluate our models.