Open dksanyal opened 6 years ago
Hi,
I think you can get it from `LayoutToken` or `Page`, depending on your needs.
This is correct, @yaojl2006, thanks.
Pages are deliberately absent from the TEI, because the TEI aims to capture the logical structure of a document, and pagination is only one possible presentation of that document. A single XML document (a single hierarchy) cannot represent both the logical structure of a document and a presentation rendering at the same time.
As @yaojl2006 mentioned, however, each token in GROBID is synchronized with the source PDF document, so you can access its original pagination information (see the coordinates that can be output in the resulting TEI as attributes on some fields; not all fields are supported yet in the TEI).
Coordinates include page number, see https://github.com/kermitt2/grobid/issues/397
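For illustration, here is a minimal Python sketch of reading the page number back out of such coordinates. It assumes the TEI was produced with coordinates enabled for `head` elements, and that each `coords` attribute holds one or more boxes separated by `;`, each box written as `page,x,y,width,height`. The sample TEI and the helper names `pages_from_coords` and `head_pages` are made up for this example; they are not part of GROBID.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical TEI fragment; real output depends on which elements
# were requested with coordinates.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div><head coords="1,53.80,194.57,292.06,8.97">Introduction</head></div>
<div><head coords="3,300.10,700.00,200.00,8.97;4,53.80,80.10,100.00,8.97">Methods</head></div>
</body></text></TEI>"""


def pages_from_coords(coords):
    """Return the sorted page numbers found in a coords attribute value.

    Assumes boxes are separated by ';' and each box starts with the page number.
    """
    pages = set()
    for box in coords.split(";"):
        box = box.strip()
        if box:
            pages.add(int(box.split(",")[0]))
    return sorted(pages)


def head_pages(tei_xml):
    """Return (heading text, first page) for every <head> that carries coords."""
    root = ET.fromstring(tei_xml)
    result = []
    for head in root.iter(f"{{{TEI_NS}}}head"):
        coords = head.get("coords")
        if coords:
            title = "".join(head.itertext()).strip()
            result.append((title, pages_from_coords(coords)[0]))
    return result


if __name__ == "__main__":
    for title, page in head_pages(SAMPLE_TEI):
        print(page, title)
```

The page number obtained this way is the position of the page in the PDF, not the number printed on the page. A heading that spans a page break yields several boxes; the sketch keeps the first page only.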
Just to make it clear:)
Hi, we are extracting the table of contents from a paper by reading the text between
and . But this does not give any page number information. I would be thankful if you could suggest how we can extract the page numbers (absolute if possible, relative if page numbers are absent in the PDF). Any pointers regarding which source files to look at would be great! Thanks in advance!