Question about content plan

ratishsp / data2text-plan-py

Code for AAAI 2019 paper on Data-to-Text Generation with Content Selection and Planning

Question about content plan #18

Closed by BlackFeetMouse 4 years ago

BlackFeetMouse commented 4 years ago

Hello, thank you so much for sharing such a nice project.

I have read the paper and I want to ask one question about the content plan.

Could you tell me whether the data used for training the model included a content plan? Or did it only use the original ROTOWIRE (Wiseman et al. 2017) game summaries and box-score data?

Thank you so much.

ratishsp commented 4 years ago

Thanks for your feedback! For training the model, we require the content plan, the box-score table, and the game summary.

BlackFeetMouse commented 4 years ago

Thanks for your reply!

The paper mentions that the content plan was extracted using an information extraction approach (Wiseman et al. 2017), with the type of each data record predicted. I am wondering whether this causes the content plan to contain erroneous data too.

Could you tell me whether the content plan used for generating the text was also extracted by the IE approach, and whether it might contain erroneous data items?

Thank you so much for your time and consideration.

ratishsp commented 4 years ago

In an ideal scenario, we would have a gold content plan annotated by humans. As this is not feasible, we use a (silver) content plan instead. Yes, this content plan is obtained by running the IE approach on the gold training summaries, and it does sometimes contain erroneous data items. We report the accuracy of the IE system on held-out data in our paper: "On held-out data it achieved 94% accuracy, and recalled approximately 80% of the relations licensed by the records." During inference, we predict the content plan and generate the summary based on the predicted content plan.
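For readers who want a concrete picture of how a silver content plan can pick up wrong records, here is a minimal Python sketch. It is not the repository's code and not the IE system of Wiseman et al. (2017): `Record` and `silver_content_plan` are hypothetical names, and the matching rule (keep any record whose entity and value both appear in a sentence) is a deliberately naive assumption standing in for the real extractor, which also predicts the record type.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One box-score entry (hypothetical structure for illustration)."""
    entity: str  # player or team name
    rtype: str   # record type, e.g. "PTS"
    value: str   # value as it would appear in text


def silver_content_plan(summary, records):
    """Toy stand-in for the IE step (an assumption, not the paper's model).

    Keeps, in order of mention, every record whose entity and value both
    occur in the same sentence. It never looks at the record type, so
    records that share an entity and a value cannot be told apart.
    """
    plan = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary):
        tokens = set(re.findall(r"\w+", sentence))
        for rec in records:
            if rec.entity in sentence and rec.value in tokens and rec not in plan:
                plan.append(rec)
    return plan


records = [
    Record("LeBron", "PTS", "30"),
    Record("LeBron", "REB", "8"),
    Record("LeBron", "AST", "8"),
]
summary = "LeBron scored 30 points and grabbed 8 rebounds."

# A training example then bundles the three inputs named in the thread:
# the box-score records, the (silver) content plan, and the gold summary.
plan = silver_content_plan(summary, records)
print([(r.entity, r.rtype, r.value) for r in plan])
```

In this toy run the assists record ends up in the plan even though the summary never mentions assists, because it shares the value 8 with the rebounds record. That is the kind of erroneous item a silver plan can contain; a real extractor reduces such errors by predicting the record type from context, but does not eliminate them.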

BlackFeetMouse commented 4 years ago

I greatly appreciate your assistance with my question!