Source code for our "TitleStylist" paper at ACL 2020: Jin, Di, Zhijing Jin, Joey Tianyi Zhou, Lisa Orii, and Peter Szolovits. "Hooks in the Headline: Learning to Generate Headlines with Controlled Styles." ACL (2020). If you use the code, please cite the paper:
@inproceedings{jin2020hooks,
author = {Di Jin and Zhijing Jin and Joey Tianyi Zhou and Lisa Orii and Peter Szolovits},
title = {Hooks in the Headline: Learning to Generate Headlines with Controlled
Styles},
booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics, {ACL} 2020, Online, July 5-10, 2020}, pages = {5082--5093},
publisher = {Association for Computational Linguistics}, year = {2020},
url = {https://www.aclweb.org/anthology/2020.acl-main.456/}
}
Here is a talk that introduces our work.
To install the required packages, run this command:
pip install -r requirements.txt
To evaluate the generated headlines with ROUGE scores, you need to install the "files2rouge" package. To do so, run the following commands (provided by the files2rouge repository):
pip install -U git+https://github.com/pltrdy/pyrouge
git clone https://github.com/pltrdy/files2rouge.git
cd files2rouge
python setup_rouge.py
python setup.py install
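Before starting a long run, it can help to confirm that the tool is reachable from your shell. The sketch below assumes the install step places an executable named `files2rouge` on your PATH; the helper name `require_cmd` is ours for illustration and is not part of this repository.

```shell
# Hypothetical helper (not part of this repository):
# succeed only if the named command can be found on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}
```

After defining the function, `require_cmd files2rouge` reports whether the installation is visible in the current environment.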
All data, including the combined CNN and NYT article-headline pairs and the three style-specific corpora (humor, romance, and clickbait) mentioned in the paper, are in the folder "data".
Please download the pretrained model parameters of MASS from this link, unzip the archive, and put the unzipped files into the folder "pretrained_model/MASS".
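A quick way to confirm that the files landed in the expected place is to check that the folder exists and is not empty. This is a minimal sketch: the helper name `check_model_dir` is hypothetical, and it does not verify which files are inside, since the contents of the archive are not listed here.

```shell
# Hypothetical helper (not part of this repository):
# succeed only if the directory exists and contains at least one entry.
# Defaults to the folder named in this README.
check_model_dir() {
    dir="${1:-pretrained_model/MASS}"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "ok: $dir"
    else
        echo "missing or empty: $dir" >&2
        return 1
    fi
}
```

Calling `check_model_dir` with no argument from the repository root checks "pretrained_model/MASS".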
To train a headline generation model that can simultaneously generate a factual and a stylistic headline, you can run the following command:
./train_mix_CNN_NYT_X.sh --style YOUR_TARGET_STYLE
Here the argument YOUR_TARGET_STYLE specifies the style you would like to have. In this paper, we provide three options: humor, romance, and clickbait.
After running this command, the trained model parameters will be saved into the folder "tmp/exp".
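To train one model for each of the three styles, the training command can be wrapped in a loop. The following is a sketch rather than part of the repository: the function name `train_all_styles` and its optional first argument (a stand-in command for dry runs) are illustrative additions. The README does not say whether successive runs share the output folder, so you may want to move or rename the results between runs.

```shell
# Hypothetical wrapper (not part of this repository):
# call the training script once per style listed in this README.
# The optional first argument replaces the script, e.g. `echo` for a dry run.
train_all_styles() {
    train_cmd="${1:-./train_mix_CNN_NYT_X.sh}"
    for style in humor romance clickbait; do
        echo "== style: $style =="
        # Stop at the first failing run instead of continuing silently.
        "$train_cmd" --style "$style" || return 1
    done
}
```

Running `train_all_styles echo` prints the arguments that would be passed for each style without training anything; `train_all_styles` with no argument runs the real script from the repository root.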
./evaluate_mix_CNN_NYT_X.sh --style YOUR_TARGET_STYLE --model_dir MODEL_STORED_DIRCTORY
This command evaluates a trained model; the argument MODEL_STORED_DIRCTORY specifies the directory that stores it.
./train_mix_CNN_NYT_multiX.sh
./evaluate_mix_CNN_NYT_multiX.sh --model_dir MODEL_STORED_DIRCTORY
Although we used humorous novels for the humorous style, you can also try the following datasets:
We suggest the large Short Jokes dataset in particular, since it is likely to produce good headlines.