
🌟 CTRLsum #9001


astariul commented 3 years ago

🌟 New model addition

Model description

Current summarization systems yield generic summaries that are disconnected from users’ preferences and expectations. To address this limitation, we present CTRLsum, a novel framework for controllable summarization.

Our approach enables users to control multiple aspects of generated summaries by interacting with the summarization system through textual input in the form of a set of keywords or descriptive prompts. Using a single unified model, CTRLsum achieves a broad scope of summary manipulation at inference time without requiring additional human annotations or pre-defining a set of control aspects during training. We quantitatively demonstrate the effectiveness of our approach on summarization datasets from three domains and five control aspects:

1. entity-centric summarization
2. length-controllable summarization
3. contribution summarization on scientific papers
4. invention purpose summarization on patent filings
5. question-guided summarization on news articles in a reading comprehension setting

Moreover, when used in a standard, uncontrolled summarization setting, CTRLsum achieves state-of-the-art results on the CNN/DailyMail dataset.

Open source status

hyunwoongko commented 3 years ago

I ported this model for easy use in Hugging Face Transformers. Try using the code below!

1. Create models and tokenizers

>>> from transformers import AutoModelForSeq2SeqLM, PreTrainedTokenizerFast

>>> model = AutoModelForSeq2SeqLM.from_pretrained("hyunwoongko/ctrlsum-cnndm")
>>> # model = AutoModelForSeq2SeqLM.from_pretrained("hyunwoongko/ctrlsum-arxiv")
>>> # model = AutoModelForSeq2SeqLM.from_pretrained("hyunwoongko/ctrlsum-bigpatent")

>>> tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/ctrlsum-cnndm")
>>> # tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/ctrlsum-arxiv")
>>> # tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/ctrlsum-bigpatent")

2. Unconditioned summarization

>>> data = tokenizer("My name is Kevin. I love dogs. I loved dogs from 1996. Today, I'm going to walk on street with my dogs", return_tensors="pt")
>>> input_ids, attention_mask = data["input_ids"], data["attention_mask"]
>>> tokenizer.batch_decode(model.generate(input_ids, attention_mask=attention_mask, num_beams=5))[0]
'</s>My name is Kevin. I loved dogs from 1996.</s>'
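
Note that the decoded text still contains BART's special tokens; passing skip_special_tokens=True to tokenizer.batch_decode strips the </s> markers from the output.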

3. Conditioned summarization
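
CTRLsum conditions the summary on keywords by prepending them to the source document. A minimal sketch, assuming the port follows the keywords => document input convention from the CTRLsum repository (the keyword "dogs" is purely illustrative):

>>> # Prepend the control keyword(s) to the source text, assuming the
>>> # "keywords => document" input format (the keyword "dogs" is illustrative).
>>> text = "My name is Kevin. I love dogs. I loved dogs from 1996. Today, I'm going to walk on street with my dogs"
>>> data = tokenizer("dogs => " + text, return_tensors="pt")
>>> input_ids, attention_mask = data["input_ids"], data["attention_mask"]
>>> tokenizer.batch_decode(model.generate(input_ids, attention_mask=attention_mask, num_beams=5))[0]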

4. Prompt summarization
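
In the paper, a prompt serves both as keywords on the encoder side and as a prefix that the decoder must continue. A minimal sketch under that assumption, passing decoder_input_ids to generate (the question prompt is illustrative, and exact decoder-prefix handling may differ across transformers versions):

>>> # The prompt doubles as the keyword condition and as the beginning of the
>>> # decoder output, so generation continues from it.
>>> text = "My name is Kevin. I love dogs. I loved dogs from 1996. Today, I'm going to walk on street with my dogs"
>>> prompt = "Q: What is my name? A:"
>>> data = tokenizer(prompt + " => " + text, return_tensors="pt")
>>> input_ids, attention_mask = data["input_ids"], data["attention_mask"]
>>> # Drop the trailing </s> from the tokenized prompt so decoding continues from it.
>>> decoder_input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"][:, :-1]
>>> tokenizer.batch_decode(model.generate(input_ids, attention_mask=attention_mask, num_beams=5, decoder_input_ids=decoder_input_ids))[0]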