ucfnlp / multidoc_summarization

Code for the EMNLP 2018 paper "Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization"

for training #2

Closed SusannaWull closed 5 years ago

loganlebanoff commented 5 years ago

Thank you for your interest in my work! The benefit of PG-MMR is that it can take a model pre-trained on a single-document dataset like CNN/DM and run it on a multi-document dataset at test time without any additional training.

Do you want to train on a different single-document dataset than CNN/DM, or do you want to train on a multi-document dataset? Training on a different single-document dataset will be somewhat difficult. Abigail See's GitHub repository (https://github.com/abisee/pointer-generator) explains how to train the original PG model, but you will have to do a great deal of pre-processing to get it to work with a new dataset. As for multi-document datasets, PG-MMR is not meant to be trained on them at all, which is the beauty of the method.

If you want to test on a new multi-document dataset, there are instructions on how to do so in the README.
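For readers unfamiliar with the MMR half of PG-MMR, here is a minimal sketch of generic maximal marginal relevance (MMR) sentence selection. It is not the repository's code: the function names `cosine` and `mmr_select`, the bag-of-words similarity, the centroid-based importance score, and the parameters `k` and `lam` are all assumptions chosen for illustration. It only shows the general idea of favoring sentences that are important but not redundant with those already selected, which needs no training on multi-document data.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, k=2, lam=0.6):
    """Greedily pick up to k sentence indices with generic MMR.

    Assumption: importance is similarity to the centroid of all
    sentences; redundancy is the maximum similarity to any sentence
    that has already been selected.
    """
    bows = [Counter(s.lower().split()) for s in sentences]
    centroid = sum(bows, Counter())  # illustrative importance signal
    selected = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = None, -math.inf
        for i, bow in enumerate(bows):
            if i in selected:
                continue
            importance = cosine(bow, centroid)
            redundancy = max(
                (cosine(bow, bows[j]) for j in selected), default=0.0
            )
            score = lam * importance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on the mat",
        "dogs bark loudly at night",
    ]
    print(mmr_select(docs, k=2))
```

With a duplicated sentence in the input, the redundancy term penalizes the second copy, so the selection moves on to a sentence that adds new content.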

SusannaWull commented 5 years ago

Thanks! I tested your method on other English news articles, but the results were terrible: every output consists only of ', ' or '.'. Can you explain why?

loganlebanoff commented 5 years ago

I have just made a fix to the README: you must include the --pg_mmr flag when testing. Can you try again with that flag?

If that doesn't work, can you try running on the example_custom_dataset and let me know what the output is for that?

yashkumaratri commented 5 years ago

I tried running with the example_custom_dataset and the inference is stuck at

[screenshot not preserved]

Can you tell me what's going on?

I just ran it with the default settings provided in the README. I left it running for 12 hours and still got no output.

SusannaWull commented 5 years ago

Did you run the "convert_data.py" file? I think it would work if you modify the code to fit your data format or prepare a dataset in the expected format. Another suggestion is to read the "Pointer-generator network" paper and its code: https://github.com/abisee/pointer-generator.

yashkumaratri commented 5 years ago

It's working fine now. I just had some issues with my environment.

Eoin-McMahon commented 4 years ago

@yashkumaratri How did you fix the error? I am having the same problem.

khoaiha12 commented 4 years ago

@yashkumaratri I have the same problem.