salesforce / ctrl-sum

Resources for the "CTRLsum: Towards Generic Controllable Text Summarization" paper
https://arxiv.org/abs/2012.04281
BSD 3-Clause "New" or "Revised" License

Use only selected sentences in source document #9

Closed: lifelongeek closed this issue 3 years ago

lifelongeek commented 3 years ago

Thanks for sharing this interesting work and the source code.

In Section 2.2, sentences are greedily selected from the document so that they correlate highly with the reference summary, while the remaining sentences are expected to correlate only weakly with it. As I understand it, selected sentences exist at both training and inference time.

I wonder what the expected pros and cons are of using 'keywords + selected sentences' as the input to the BART encoder instead of 'keywords + all sentences'. Do you have any ablation results on this?
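For readers following along, the greedy selection step mentioned above can be sketched roughly as follows. This is only an illustration, not the repository's code: it assumes whitespace tokenization and uses unigram-overlap F1 as a crude stand-in for ROUGE, and the names `unigram_f1`, `greedy_select`, and `max_sentences` are hypothetical.

```python
from collections import Counter


def unigram_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1; an assumed, simplified stand-in for ROUGE-1."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def greedy_select(sentences, reference, max_sentences=3):
    """Repeatedly add the sentence that most improves the score.

    Stops when no remaining sentence improves the score or when
    max_sentences (a hypothetical cap) is reached. Returns the indices
    of the selected sentences in document order.
    """
    ref_tokens = reference.lower().split()
    selected = []
    best_score = 0.0
    while len(selected) < max_sentences:
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            candidate = " ".join(sentences[j] for j in selected + [i])
            score = unigram_f1(candidate.lower().split(), ref_tokens)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:
            break  # no sentence improves the overlap any further
        selected.append(best_idx)
    return sorted(selected)
```

In this sketch, sentences that share no words with the reference are never picked, which is the sense in which the unselected sentences correlate weakly with the summary.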

jxhe commented 3 years ago

Hi,

This is a good point. To clarify, the input to the BART encoder is 'keywords + source article', where the keywords are extracted from the selected sentences; the selected sentences themselves are not used as direct input to the encoder.
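As a rough illustration of the 'keywords + source article' input described here, a minimal sketch might look like the following. The helper name `build_encoder_input` and both separator strings are assumptions made for this example; the actual input format should be checked against the repository's preprocessing scripts.

```python
def build_encoder_input(keywords, source_article, keyword_sep=" | ", source_sep=" => "):
    """Prepend de-duplicated keywords to the full source article.

    The separators are assumed defaults for illustration only. Note that
    the full article follows the keywords, not just the selected sentences.
    """
    seen = set()
    unique_keywords = []
    for keyword in keywords:
        keyword = keyword.strip()
        # Drop empty strings and case-insensitive duplicates, keeping first occurrences.
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            unique_keywords.append(keyword)
    prefix = keyword_sep.join(unique_keywords)
    return f"{prefix}{source_sep}{source_article.strip()}"


# Example: keywords come from the selected sentences, but the whole article is kept.
example = build_encoder_input(["cat", "mat"], "The cat sat on the mat. Stocks fell today.")
```

The point of the sketch is that sentence selection only affects which keywords are produced; the encoder still sees every sentence of the article.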

Unfortunately, we don't have ablation results without sentence selection. My guess is that removing the sentence selection step is probably fine at training time, while at inference time tagging keywords directly from a long document may be too noisy.