maszhongming / MatchSum

Code for ACL 2020 paper: "Extractive Summarization as Text Matching"

What is the meaning of "label" in the CNN/DM dataset? #17

Closed Andrei997 closed 4 years ago

Andrei997 commented 4 years ago

Hello, and first of all, thank you for sharing this repository. I was wondering what the "label" field in the dataset you have made available means. To me, it indicates which sentences should be part of the final summary. However, the number of sentences in a given article is often larger than the length of the "label" list, so I was wondering why that is and how to recover the actual labeled sentences.

Thanks a lot :)

Ricardokevins commented 4 years ago

I have the same problem. I don't know what 'labels' means either (if you find out, please tell me). In the meantime, have you tried running the code on the data he provided? I ran LEAD-3 on his data and got a much higher ROUGE-1. Do you see the same problem?

Andrei997 commented 4 years ago

Sorry no, I am just using the provided dataset for a different purpose :(

maszhongming commented 4 years ago

Hi, label is the greedy extractive oracle we use to train our BertExt. Specifically, we use a greedy algorithm to obtain the labels; the details can be found in this paper: SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents, page 3, Extractive Training section. I'm not sure why you got a higher R1 with LEAD-3. Did you use pyrouge 0.1.3 to evaluate ROUGE? Is it the ROUGE-1 F value?
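For readers who want to reproduce the labels, here is a minimal sketch of that greedy procedure (not the repository's preprocessing code): sentences are added one at a time, each time picking the sentence that most improves the ROUGE score of the selected set against the gold summary, and stopping when no sentence improves it. The ROUGE-1 F score below is a simple unigram-overlap approximation, not the official toolkit.

```python
from collections import Counter

def rouge1_f(candidate_tokens, reference_tokens):
    """Approximate ROUGE-1 F1 via unigram overlap (stand-in for the real toolkit)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)

def greedy_oracle_labels(sentences, summary):
    """Greedily pick sentence indices that maximize ROUGE against the reference.

    sentences: list of sentence strings from the article
    summary:   list of sentence strings of the gold summary
    Returns a 0/1 label list, one entry per sentence.
    """
    reference = " ".join(summary).lower().split()
    selected, best_score = [], 0.0
    while True:
        best_gain, best_idx = 0.0, None
        for i, sent in enumerate(sentences):
            if i in selected:
                continue
            candidate = " ".join(sentences[j] for j in sorted(selected + [i]))
            score = rouge1_f(candidate.lower().split(), reference)
            if score - best_score > best_gain:
                best_gain, best_idx = score - best_score, i
        if best_idx is None:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
        best_score += best_gain
    return [1 if i in selected else 0 for i in range(len(sentences))]
```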

Ricardokevins commented 4 years ago

Thanks a lot for answering my question (I'm new to NLP and my English is limited). I tested the LEAD-3 baseline on the test_CNNDM_bert.jsonl you released, to see whether I could reach the LEAD-3 scores reported in the MatchSum paper: for each line of the json file I simply took the first 3 sentences of ['text'] as the candidate, and used gold_summary = " ".join(dic["summary"]) to build the reference summary. With rouge (the Python package) I got roughly ROUGE-1 recall 0.49; testing again with pyrouge I got roughly ROUGE-1 recall 0.52365 and F 0.4095.

So I'd like to ask: did you use rouge (Python) to compute the sentence scores stored in the json, and pyrouge for the results published in MatchSum? Are all the released scores F values? (I also tried to implement the oracle and saw a similar inconsistency in the scores, so I wonder whether it comes from a difference in data processing.) One last question: does the ORACLE in MatchSum mean the best summary an extractive algorithm can get? Is it a global optimum, or only the best within the candidate set? Thanks a lot for your answers.
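For concreteness, a minimal sketch of the LEAD-3 test described above, assuming each line of test_CNNDM_bert.jsonl carries 'text' and 'summary' fields that are lists of sentence strings (as the join call implies):

```python
import json

def lead3_pairs(path="test_CNNDM_bert.jsonl"):
    """Yield (LEAD-3 candidate, gold summary) string pairs from the released jsonl."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            dic = json.loads(line)
            candidate = " ".join(dic["text"][:3])      # first 3 sentences of the article
            gold_summary = " ".join(dic["summary"])    # gold summary joined into one string
            yield candidate, gold_summary
```

Note that joining the sentences with plain spaces, as here, is exactly what the reply below flags as a problem for ROUGE-L.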

Andrei997 commented 4 years ago

This makes sense; however, the label field itself has a different size from the number of sentences in the corresponding article. For example, if an article has 30 sentences, the label field might only have 19 entries. So how do you know which sentences should be part of the summary? Is it correct to assume that the sentences beyond the 19th are automatically not part of the summary?

maszhongming commented 4 years ago

Hi @Ricardokevins, since there may be other people who have similar questions, I will answer your questions in English here.

rouge (in Python) is only an approximate implementation of the ROUGE score; all the results in our paper are evaluated using the standard ROUGE toolkit (i.e., pyrouge 0.1.3), and all are F values. Please note that there is a problem with the way you treat both the text and the summary as a single string without separators. There should be a separator between sentences, which can be \n or <q>, etc.; otherwise the ROUGE-L score will be incorrect.
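To make the separator point concrete, here is a minimal sketch (not the repository's evaluation code) that writes system and reference summaries to files with one sentence per line, which is the form the standard ROUGE toolkit expects; the directory and file names are just placeholders.

```python
import os

def write_for_rouge(candidates, references, sys_dir="rouge_sys", ref_dir="rouge_ref"):
    """Write one file per example, one sentence per line, for standard ROUGE evaluation.

    candidates, references: lists of examples, each a list of sentence strings.
    """
    os.makedirs(sys_dir, exist_ok=True)
    os.makedirs(ref_dir, exist_ok=True)
    for i, (cand, ref) in enumerate(zip(candidates, references)):
        # Sentences are separated by "\n" so ROUGE-L is computed sentence by sentence.
        with open(os.path.join(sys_dir, f"{i}.txt"), "w", encoding="utf-8") as f:
            f.write("\n".join(cand))
        with open(os.path.join(ref_dir, f"{i}.txt"), "w", encoding="utf-8") as f:
            f.write("\n".join(ref))
```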

ORACLE is the extractive ground truth we used to train our BertExt; it is obtained by the greedy algorithm I mentioned above and has nothing to do with the candidate set. Match Oracle is the oracle result within our candidate set.

maszhongming commented 4 years ago

Hi @Andrei997, because BERT has an input limit of 512 tokens, we truncated all text to 512 tokens. Therefore, label here is the label of the truncated text. In your example, the text of 30 sentences may be truncated to 19 sentences, so label has only 19 entries.
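In other words, the label list lines up with the sentences that survive the 512-token cut. A minimal, illustrative sketch of that alignment (the tokenize argument is a stand-in for BERT's WordPiece tokenizer, and this is not the repository's actual preprocessing code):

```python
def truncate_to_budget(sentences, labels, max_tokens=512, tokenize=str.split):
    """Keep whole sentences until the token budget is reached; drop the rest.

    sentences: list of sentence strings for one article
    labels:    0/1 oracle labels, one per original sentence
    tokenize:  placeholder tokenizer; real preprocessing counts BERT WordPiece tokens
    Returns (kept_sentences, kept_labels) with matching lengths.
    """
    kept_sentences, kept_labels, used = [], [], 0
    for sent, lab in zip(sentences, labels):
        n = len(tokenize(sent))
        if used + n > max_tokens:
            break
        kept_sentences.append(sent)
        kept_labels.append(lab)
        used += n
    return kept_sentences, kept_labels
```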

Andrei997 commented 4 years ago

Okay that answers my question :) Thanks a lot!

I will close this now.

Ricardokevins commented 4 years ago

Thanks a lot for your patient reply! Thank you so much.