kedz / nnsum

An extractive neural network text summarization library for the EMNLP 2018 paper "Content Selection in Deep Learning Models of Summarization" (https://arxiv.org/abs/1810.12343).

Results on CNN/DM (non-anonymized) using SummaRunner? #6

Closed xcfcode closed 5 years ago

xcfcode commented 5 years ago

Could you please share the ROUGE-1, ROUGE-2 and ROUGE-L scores on the non-anonymized CNN/DM using SummaRunner?

kedz commented 5 years ago

Hi Xiachong,

Unfortunately I won't be able to do this in the near future, as I never trained models on the anonymized dataset and don't currently have access to free GPUs for this purpose. You could run it yourself, though: you would just have to prepare the anonymized version of the dataset and use the existing code to train/evaluate. If you install the reddit data (it is a small dataset) you will see how the data must be formatted.

Cheers, Chris
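To make the "see how the data must be formatted" step concrete, here is a sketch of what an extractive-summarization example might look like. The field names below are hypothetical, not nnsum's actual schema; install the reddit data and inspect it for the real format.

```python
import json

# Hypothetical example record: an article as a list of sentences, paired
# with a labels record marking which sentences belong in the summary.
# These field names are a guess, NOT nnsum's actual schema.
example = {
    "id": "doc-001",
    "inputs": [
        {"text": "First sentence of the article.",
         "tokens": ["first", "sentence", "of", "the", "article", "."]},
        {"text": "Second sentence.",
         "tokens": ["second", "sentence", "."]},
    ],
}
labels = {"id": "doc-001", "labels": [1, 0]}  # 1 = sentence is extracted

# Datasets of this shape are commonly stored one JSON object per line.
line = json.dumps(example)
round_tripped = json.loads(line)
```

Whatever the real schema is, the inputs file and the labels file must agree on document ids and sentence counts, which is the main thing to verify when preparing a new dataset.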


--
Chris Kedzie
PhD Student, Dept. of Computer Science, Columbia University
email: kedzie@cs.columbia.edu
web: www.cs.columbia.edu/~kedzie

xcfcode commented 5 years ago

Thank you for your reply. You have released an excellent model and data-preprocessing code, and I am building on your work. However, the paper uses ROUGE-2 recall as the main metric; would it be convenient to share the ROUGE-1, ROUGE-2 and ROUGE-L F1 scores on your dataset?

kedz commented 5 years ago

Thanks! This I can do! I'll get the F1 scores for you by Friday.
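For what it's worth, ROUGE F1 is just the harmonic mean of the per-summary precision and recall, so recall-based and F1-based numbers can be related with a small helper (this function is mine, not part of nnsum):

```python
def rouge_f1(precision: float, recall: float) -> float:
    """Harmonic mean of ROUGE precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

For example, a system with ROUGE-2 precision 0.20 and recall 0.16 has an F1 of about 0.178, which is why ranking models by recall and by F1 can give slightly different orderings.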


xcfcode commented 5 years ago

Sincere thanks!

kedz commented 5 years ago

Hi Xiachong,

I'm running the eval script now, but for some reason it is going very slowly. The ROUGE script writes lots of little files and my Azure instance is not happy about it. Just letting you know that I didn't forget about getting you the results, but it will probably not happen until the end of the weekend.

Cheers, Chris
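One common workaround for the many small files ROUGE writes (my assumption here, not something nnsum provides) is to point the evaluation's scratch directory at an in-memory filesystem such as /dev/shm, so the tiny candidate/reference files never touch slow disk:

```python
import os
import tempfile

# Prefer an in-memory filesystem for scratch files when available,
# falling back to the system default temp directory otherwise.
fast_root = "/dev/shm" if os.path.isdir("/dev/shm") else None

with tempfile.TemporaryDirectory(dir=fast_root) as scratch:
    # A real run would hand `scratch` to the ROUGE wrapper as its working
    # directory (or export TMPDIR=scratch before invoking the script).
    path = os.path.join(scratch, "system.0.txt")
    with open(path, "w") as f:
        f.write("candidate summary sentence\n")
    wrote_ok = os.path.exists(path)
# The directory and all its little files are discarded on exit.
```

Whether this helps depends on whether the slowdown is really file I/O; on a cloud instance with network-attached disks it often is.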


xcfcode commented 5 years ago

Thank you!!!

kedz commented 5 years ago

Here are the F1 results. I also included the validation F1 scores since those might be helpful for you as well.

validation F1 scores
                                    rouge-1   rouge-2   rouge-L
encoder=avg.extractor=summarunner  0.397494  0.183064  0.374984
encoder=cnn.extractor=summarunner  0.401990  0.184302  0.378438
encoder=rnn.extractor=summarunner  0.396806  0.182432  0.373832
encoder=avg.extractor=avg          0.398780  0.183542  0.376188
encoder=cnn.extractor=avg          0.394794  0.181842  0.371880
encoder=rnn.extractor=avg          0.397194  0.182972  0.374454
encoder=avg.extractor=s2s          0.398872  0.183708  0.376264
encoder=cnn.extractor=s2s          0.395500  0.182008  0.372392
encoder=rnn.extractor=s2s          0.397400  0.183130  0.374448
encoder=avg.extractor=c&l          0.396136  0.182110  0.373704
encoder=cnn.extractor=c&l          0.406722  0.189006  0.382866
encoder=rnn.extractor=c&l          0.399654  0.183158  0.376912

test F1 scores
                                    rouge-1   rouge-2   rouge-L
encoder=avg.extractor=summarunner  0.390720  0.177668  0.367822
encoder=cnn.extractor=summarunner  0.389396  0.175366  0.365940
encoder=rnn.extractor=summarunner  0.389102  0.176370  0.365638
encoder=avg.extractor=avg          0.391940  0.177990  0.368896
encoder=cnn.extractor=avg          0.387062  0.175614  0.363714
encoder=rnn.extractor=avg          0.389958  0.177236  0.366708
encoder=avg.extractor=s2s          0.392616  0.178640  0.369590
encoder=cnn.extractor=s2s          0.387646  0.175702  0.364260
encoder=rnn.extractor=s2s          0.390314  0.177294  0.366958
encoder=avg.extractor=c&l          0.389154  0.176684  0.366272
encoder=cnn.extractor=c&l          0.394502  0.173300  0.370758
encoder=rnn.extractor=c&l          0.392294  0.177158  0.369192

These are the results averaged over 5 different random seeds (as was done in the paper).
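The seed-averaging protocol can be sketched as follows (the per-seed numbers here are made-up placeholders, not the reported results):

```python
# Average per-seed ROUGE results, as in the paper's 5-seed protocol.
# The scores below are illustrative placeholders only.
per_seed = {
    12345678: {"rouge-1": 0.391, "rouge-2": 0.177, "rouge-L": 0.368},
    23456789: {"rouge-1": 0.389, "rouge-2": 0.176, "rouge-L": 0.366},
    34567890: {"rouge-1": 0.392, "rouge-2": 0.178, "rouge-L": 0.369},
}

metrics = sorted(next(iter(per_seed.values())))
averaged = {m: sum(scores[m] for scores in per_seed.values()) / len(per_seed)
            for m in metrics}
```

A single seed can land noticeably above or below these means, which matters when comparing configurations that differ by only a few tenths of a ROUGE point.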

xcfcode commented 5 years ago

So detailed! It helps a lot!

xcfcode commented 5 years ago

Could you please share your hyper-params for encoder=rnn.extractor=summarunner?

xcfcode commented 5 years ago

These are all my params:

{'train_inputs': PosixPath(''), 'train_labels': PosixPath('/'), 'valid_inputs': PosixPath(''), 'valid_labels': PosixPath(''), 'valid_refs': PosixPath(''), 'seed': 12345678, 'epochs': 50, 'batch_size': 32, 'gpu': 0, 'teacher_forcing': 25, 'sentence_limit': 50, 'weighted': True, 'loader_workers': 8, 'raml_samples': 25, 'raml_temp': 0.05, 'summary_length': 100, 'remove_stopwords': True, 'shuffle_sents': False, 'model': PosixPath('checkpoints/rnn-sr'), 'results': PosixPath('results/rnn-sr.txt'), 'trainedmodel': None}

{'embedding_size': 200, 'pretrained_embeddings': './glove.6B.200d.txt', 'top_k': None, 'at_least': 1, 'word_dropout': 0.0, 'embedding_dropout': 0.25, 'update_rule': 'fix_all', 'filter_pretrained': False}

{'hidden_size': 300, 'bidirectional': True, 'dropout': 0.25, 'num_layers': 1, 'cell': 'gru', 'OPT': 'rnn'}

{'hidden_size': 300, 'rnn_dropout': 0.25, 'num_layers': 1, 'cell': 'gru', 'sentence_size': 100, 'document_size': 100, 'segments': 4, 'max_position_weights': 50, 'segment_size': 16, 'position_size': 16, 'OPT': 'sr'}

I select the model by ROUGE-2 recall; however, I can only get about 0.37+ ROUGE-1 F1 on the validation set. Maybe I have done something wrong?

kedz commented 5 years ago

These look like the default parameters used in the paper. The results I sent are averaged over 5 random seeds, so I would try averaging the results of a few random seeds.


xcfcode commented 5 years ago

Thanks a lot!