salesforce / decaNLP

The Natural Language Decathlon: A Multitask Challenge for NLP
BSD 3-Clause "New" or "Revised" License

What is the format for the input of text summarization? #32

Closed by code-cse 5 years ago

code-cse commented 5 years ago

I am running the pretrained decaNLP model, and the inference file that is mentioned takes three things: "Context", "Question", and the real "Answer". What should the input in the inference file be if I want to use the text summarization task for CNN/DailyMail?

bmccann commented 5 years ago

If you want to summarize your own dataset, the context should be the document that you want to summarize, the question should be "What is the summary?", and the answer should be the expected summary. If you don't have an answer in mind, you can just use a dummy sequence for the answer for now. Running inference will then both print out all the predictions and write them to a file, as described in the readme.

If you want to run the model on the validation or test splits of cnn/dailymail, though, you don't need to worry about any of that. Just look at the example in the readme for running inference with predict.py and add the --tasks cnn_dailymail argument.
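For the custom-dataset case described above, here is a minimal sketch of how the three fields could be assembled. It assumes the examples are stored one JSON object per line with the keys "context", "question", and "answer"; that layout, the helper names, and the "dummy" placeholder are illustrative assumptions, so check the readme for the exact file format and location the code expects.

```python
import json

# The question string suggested in the reply above.
SUMMARY_QUESTION = "What is the summary?"


def make_summarization_example(document, summary=None, dummy_answer="dummy"):
    """Build one example (hypothetical helper, not part of decaNLP).

    If no reference summary is available, a dummy sequence is used as the
    answer, as suggested in the thread.
    """
    return {
        "context": document,
        "question": SUMMARY_QUESTION,
        "answer": summary if summary is not None else dummy_answer,
    }


def to_jsonl(documents, summaries=None):
    """Serialize documents (and optional summaries) as JSON lines.

    Assumption: one JSON object per line; verify against the readme.
    """
    if summaries is None:
        summaries = [None] * len(documents)
    lines = [
        json.dumps(make_summarization_example(doc, summ))
        for doc, summ in zip(documents, summaries)
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = ["First article text goes here.", "Second article text goes here."]
    # No reference summaries: every answer falls back to the dummy sequence.
    print(to_jsonl(docs), end="")
```

The dummy answer only affects any scores computed against it; the predictions themselves do not depend on it.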

code-cse commented 5 years ago

Thanks a lot for the quick response.