grobid skip pages

kermitt2 / grobid

A machine learning software for extracting information from scholarly documents

Apache License 2.0

Hi @peilin54 !

Sorry for the slow answer. This is certainly not a common problem. One possibility is that the content of the skipped page is not classified as part of the reference section but as an annex, so the reference entries on it are overlooked. The misclassification might be caused by the watermark, which could be confused with a figure element. One solution is to add a few examples of this kind of article to the training data of the segmentation model. If you can share the document with me and it is CC-BY, I can have a look!
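To check whether this is what happened, you can inspect the TEI XML returned by GROBID's full-text processing. If the references on the skipped page were routed into an annex, they should appear as plain paragraphs under `<back>` in a `<div type="annex">` instead of as parsed `<biblStruct>` entries. The minimal sketch below assumes that TEI layout. The `diagnose` function and its regular-expression heuristic for spotting "reference-like" text are hypothetical illustrations and are not part of GROBID.

```python
import re
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}

# Hypothetical heuristic (not from GROBID): a paragraph "looks like" a
# reference entry if it starts with "[n]" or "n." and contains a 19xx/20xx year.
REF_LIKE = re.compile(r"^\s*(\[\d+\]|\d+\.)\s+.*\b(19|20)\d{2}\b")


def diagnose(tei_xml: str) -> dict:
    """Count parsed references and list annex paragraphs that resemble references."""
    root = ET.fromstring(tei_xml)
    # Reference entries GROBID actually parsed: <biblStruct> inside <listBibl>.
    n_refs = len(root.findall(".//tei:listBibl/tei:biblStruct", NS))
    suspicious = []
    # Paragraphs that ended up in annex divisions of the back matter.
    for div in root.findall(".//tei:back/tei:div[@type='annex']", NS):
        for p in div.iter(f"{{{TEI}}}p"):
            text = "".join(p.itertext()).strip()
            if REF_LIKE.match(text):
                suspicious.append(text)
    return {"parsed_references": n_refs, "reference_like_annex_paragraphs": suspicious}


# Tiny hand-written TEI fragment for illustration only (not real GROBID output).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><back>
<div type="annex"><p>[12] Smith J. A study. Journal 2019.</p><p>Supplementary results.</p></div>
<div type="references"><listBibl><biblStruct/><biblStruct/></listBibl></div>
</back></text></TEI>"""

if __name__ == "__main__":
    print(diagnose(SAMPLE))
```

A non-empty `reference_like_annex_paragraphs` list is only a hint that the segmentation model labelled a reference page as annex; the heuristic will miss author-year styles without numbering and can flag unrelated numbered annex text.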

grobid skip pages #1053