JaniceXiong closed this issue 3 years ago.
Hi JaniceXiong,
We use the first section if the Intro section is missing, and the last section if the Conclu section is missing.
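A minimal sketch of that fallback rule, assuming each paper is available as an ordered list of `(title, text)` sections plus optionally pre-extracted Intro and Conclu texts. The function name `pick_intro_conclu` and this data layout are hypothetical, not taken from the released notebook.

```python
from typing import List, Optional, Tuple

# Assumed representation: an ordered list of (section_title, section_text).
Section = Tuple[str, str]


def pick_intro_conclu(sections: List[Section],
                      intro: Optional[str],
                      conclu: Optional[str]) -> Tuple[str, str]:
    """Return (intro_text, conclu_text) with the first/last-section fallback.

    If no Intro was identified, the first section is used instead;
    if no Conclu was identified, the last section is used instead.
    """
    if not sections:
        # Nothing to fall back on; return whatever was identified (or empty).
        return intro or "", conclu or ""
    intro_text = intro if intro else sections[0][1]
    conclu_text = conclu if conclu else sections[-1][1]
    return intro_text, conclu_text
```

With this rule an IC input is never empty as long as the paper has at least one section; one could, for example, join the two returned strings with a space to form the IC source text.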
I have uploaded the preprocessing code to `/notebooks/preprocess_jsonl2pairs.ipynb`; the notebook generates `source`/`target` pairs from `train|dev|test.jsonl`. I hope that helps.
In the paper "Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents", I see that the models take different content as input, such as Full, IC, Intro, Method, Result, Conclu, ...
As described in Appendix A, keywords can be used to identify paper sections. However, after preprocessing the crawled data, I found that not every item has complete sections. For example, this item in the test set has neither an Introduction nor a Conclusion section, so its IC input is also empty.
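To make the keyword-matching step concrete, a small sketch that returns the text of the first section whose title contains one of the given keywords. The keyword tuples below are illustrative stand-ins only; the actual keywords are those listed in Appendix A of the paper. A `None` result is exactly the "missing section" case asked about here.

```python
from typing import List, Optional, Sequence, Tuple

# Illustrative keywords only; see Appendix A of the paper for the real lists.
INTRO_KEYWORDS = ("introduction",)
CONCLU_KEYWORDS = ("conclusion", "concluding remarks")


def find_section(sections: List[Tuple[str, str]],
                 keywords: Sequence[str]) -> Optional[str]:
    """Return the text of the first section whose title matches a keyword.

    Matching is a case-insensitive substring test on the section title,
    so "5 Conclusions" matches "conclusion". Returns None if nothing matches.
    """
    for title, text in sections:
        lowered = title.lower()
        if any(keyword in lowered for keyword in keywords):
            return text
    return None
```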
So I wonder whether the experiments were only run on the data that has the relevant field. For example, if 5000 of the 6000 test items have an Intro section, were the results computed only on those 5000 items?
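One way to check how much of a split would be affected is to partition the records by whether a given field is non-empty and report the coverage. This is only a diagnostic sketch; the field name `"intro"` in the usage comment is hypothetical, and the thread does not say which evaluation subset the authors used.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def split_by_field(records: List[Record],
                   field: str) -> Tuple[List[Record], List[Record]]:
    """Partition records into (field non-empty, field missing or blank)."""
    with_field: List[Record] = []
    without_field: List[Record] = []
    for record in records:
        value = record.get(field) or ""  # treat None like a missing field
        (with_field if value.strip() else without_field).append(record)
    return with_field, without_field

# Usage (hypothetical field name):
# have, lack = split_by_field(test_records, "intro")
# print(f"{len(have)} of {len(have) + len(lack)} test items have an Intro section")
```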
Could you also release some preprocessing code, so that we can make sure the data fed to the models is the same?