wyzhhhh opened this issue 2 weeks ago
Hi @wyzhhhh,
The full S2ORC dataset releases include an "annotations" field along with the paper data. This field records the character indices that mark the various parts of the paper's plaintext (e.g. title, abstract, author names, individual paragraphs).
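As a minimal sketch, assume each annotation is a list of spans with character offsets `start` and `end`, sometimes serialized as a JSON string. Under that assumption, recovering a part of the paper is just a slice of the plaintext. The record layout, the `title`/`paragraph` keys and the helpers `get_spans` and `extract_parts` are illustrative assumptions, not the exact S2ORC schema or tooling:

```python
from __future__ import annotations

import json
from typing import Any


def get_spans(annotations: dict[str, Any], name: str) -> list[dict[str, Any]]:
    """Return the spans stored under `name`, or [] if the annotation is absent.

    Assumption: each annotation is a list of {"start": int, "end": int, ...}
    dicts, possibly serialized as a JSON string.
    """
    raw = annotations.get(name)
    if raw is None:
        return []
    if isinstance(raw, str):
        raw = json.loads(raw)
    return list(raw)


def extract_parts(text: str, annotations: dict[str, Any], name: str) -> list[str]:
    """Slice the plaintext with each span's [start, end) character offsets."""
    return [text[int(s["start"]):int(s["end"])] for s in get_spans(annotations, name)]


# Hypothetical toy record for illustration, not real S2ORC data.
record = {
    "text": "Deep Nets\nWe study deep nets.\nPrior work [1] did X.",
    "annotations": {
        "title": [{"start": 0, "end": 9}],
        "paragraph": json.dumps([{"start": 10, "end": 29}, {"start": 30, "end": 51}]),
    },
}

print(extract_parts(record["text"], record["annotations"], "title"))
print(extract_parts(record["text"], record["annotations"], "paragraph"))
```

Keeping only offsets in the annotations, rather than copies of the text, means every part of the paper is recovered from the single plaintext string.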
Here's an illustration of the S2ORC schema:
We used the indices listed under the "bibref" annotations to locate the inline citations in the text. These annotations also usually include a "matched_paper_id" field, which we used to link an inline citation in a source paper to the cited target paper within the S2ORC dataset.
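A minimal Python sketch of this step, under stated assumptions: the record has hypothetical `paper_id`, `text` and `annotations` keys; each "bibref" span has `start`/`end` offsets and an optional `attributes` dict; and a resolved citation stores its `matched_paper_id` there. These field names and the helpers `inline_citations` and `citation_edges` are illustrative, so check them against the release you download:

```python
from __future__ import annotations

import json
from typing import Any


def inline_citations(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect inline citations from a paper's "bibref" annotations.

    Assumed (hypothetical) layout:
    - record["text"] is the plaintext;
    - record["annotations"]["bibref"] is a list (or JSON string) of
      {"start", "end", "attributes"} spans;
    - a resolved citation has attributes["matched_paper_id"].
    """
    raw = record["annotations"].get("bibref") or []
    spans = json.loads(raw) if isinstance(raw, str) else raw
    text = record["text"]
    citations = []
    for span in spans:
        start, end = int(span["start"]), int(span["end"])
        attrs = span.get("attributes") or {}
        citations.append({
            "source_id": record["paper_id"],
            "start": start,
            "end": end,
            "marker": text[start:end],  # e.g. "[1]" as it appears in the text
            # None when the citation was not matched to an S2ORC paper.
            "target_id": attrs.get("matched_paper_id"),
        })
    return citations


def citation_edges(record: dict[str, Any]) -> list[tuple[Any, Any]]:
    """Return (source, target) pairs for citations with a matched target."""
    return [
        (c["source_id"], c["target_id"])
        for c in inline_citations(record)
        if c["target_id"] is not None
    ]


# Hypothetical toy record for illustration, not real S2ORC data.
record = {
    "paper_id": "p1",
    "text": "Prior work [1] did X, unlike [2].",
    "annotations": {
        "bibref": [
            {"start": 11, "end": 14, "attributes": {"matched_paper_id": "p7"}},
            {"start": 29, "end": 32, "attributes": {}},
        ]
    },
}

for c in inline_citations(record):
    print(c["marker"], c["target_id"])
print(citation_edges(record))
```

Citations without a `matched_paper_id` are kept by `inline_citations` but dropped from the edge list, since their target is not known to be in S2ORC.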
I hope this answers your question. Let us know if you have any more!
How did the authors extract the inline citation information from the S2ORC dataset?