hpzhao / SummaRuNNer

The PyTorch Implementation of SummaRuNNer
https://arxiv.org/pdf/1611.04230.pdf
MIT License

sentences labeling #28

Closed: BayronP closed this issue 6 years ago

BayronP commented 6 years ago

Hello @hpzhao. After reading the paper, I am still confused about the sentence labeling method. Can you explain the details or point me to the relevant code?

hpzhao commented 6 years ago

The key point of this sentence labeling method is that we make decisions sentence by sentence. The strength of this approach is that it can take previous decisions, i.e. the state built so far, into consideration. For example, we can reduce redundancy thanks to the novelty term in the classification layer. You could read my old blog post for more details. I hope this helps. @BayronP
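Concretely, the paper's logistic layer scores each sentence with a content term, a salience term against the document representation, and a subtracted novelty term against the summary representation accumulated from earlier decisions. A simplified sketch of that scoring (positional terms omitted; the shapes and names are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn

hidden = 200  # sentence representation size, chosen here just for illustration

# Hypothetical parameters for the three terms of the classification layer.
W_c = nn.Linear(hidden, 1, bias=False)        # content: how informative is the sentence
W_s = nn.Linear(hidden, hidden, bias=False)   # salience: sentence vs. document representation
W_r = nn.Linear(hidden, hidden, bias=False)   # novelty: sentence vs. summary built so far
bias = nn.Parameter(torch.zeros(1))

def score_sentence(h_j, d, s_j):
    """Probability that sentence j is selected, given the document vector d
    and the running summary vector s_j from previous decisions."""
    content  = W_c(h_j)
    salience = torch.sum(h_j * W_s(d))
    novelty  = torch.sum(h_j * W_r(torch.tanh(s_j)))  # subtracted: penalizes redundancy
    return torch.sigmoid(content + salience - novelty + bias)

# Toy usage: after scoring sentence j, the running summary is updated as
# s_{j+1} = s_j + p_j * h_j, so later sentences "see" earlier decisions.
h_j, d, s_j = torch.randn(hidden), torch.randn(hidden), torch.zeros(hidden)
p_j = score_sentence(h_j, d, s_j)
s_j = s_j + p_j * h_j
```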

BayronP commented 6 years ago

Maybe you misunderstood what I was confused about. In the paper I found the description below about this labeling, but I cannot understand how this subset of related sentences is calculated. Can you help me?

To solve this problem, we use an unsupervised approach to convert the abstractive summaries to extractive labels. Our approach is based on the idea that the selected sentences from the document should be the ones that maximize the Rouge score with respect to gold summaries.

hpzhao commented 6 years ago

You mean how to get the sentence labels for this corpus with an unsupervised approach? Since there is no large extractive summarization corpus, this is just an approach to build one. The procedure is described in the rest of that paragraph:

Since it is computationally expensive to find a globally optimal subset of sentences that maximizes the Rouge score, we employ a greedy approach, where we add one sentence at a time incrementally to the summary, such that the Rouge score of the current set of selected sentences is maximized with respect to the entire gold summary. We stop when none of the remaining candidate sentences improves the Rouge score upon addition to the current summary set. We return this subset of sentences as the extractive ground-truth, which is used to train our RNN based sequence classifier.
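In code, that greedy procedure looks roughly like this. This is only a sketch, with a simple unigram-overlap F1 standing in for the actual ROUGE score; it is not the exact script used to build the labels.

```python
from collections import Counter

def rouge_1_f(candidate_tokens, reference_tokens):
    """Unigram-overlap F1, used here as a simple stand-in for ROUGE."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def greedy_extractive_labels(doc_sentences, gold_summary):
    """Greedily add the sentence that most improves the score against the
    gold abstractive summary; stop when no remaining sentence helps.
    Selected sentences get label 1, the rest get label 0."""
    ref_tokens = gold_summary.split()
    labels = [0] * len(doc_sentences)
    selected = []
    current_score = 0.0
    while True:
        best_gain, best_idx = 0.0, None
        for i, sent in enumerate(doc_sentences):
            if labels[i]:
                continue
            cand_tokens = " ".join(selected + [sent]).split()
            gain = rouge_1_f(cand_tokens, ref_tokens) - current_score
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:  # no remaining sentence improves the score
            break
        labels[best_idx] = 1
        selected.append(doc_sentences[best_idx])
        current_score += best_gain
    return labels
```

These 0/1 labels are then used as the extractive ground truth for training the sequence classifier.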

@BayronP