kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0

Missing Introduction section #700

Status: Open. AaronNGray opened this issue 3 years ago.

AaronNGray commented 3 years ago

For some strange reason, the first paragraphs of the Introduction section are missing from the output for this paper:

http://lampwww.epfl.ch/~amin/dot/fool.pdf

The extracted text only starts in the second column, at "well as its mixture of nominal ..."

qhreul commented 3 years ago

@AaronNGray I have observed the same issue on medical articles.

In EHF2-6-1105.pdf, the whole "Epidemiology" section on page one (which spans two columns) is missing from the TEI XML after conversion. I observed this behaviour with the Docker image (lfoppiano/grobid:0.7.0).
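To check a given PDF for this problem, a minimal sketch like the one below could help. It assumes a GROBID server (for example the Docker image above) listening on `http://localhost:8070` and uses the `/api/processFulltextDocument` service with the PDF sent in the `input` form field. It then lists the section headings in the TEI body and checks whether a known phrase (such as the first words of the Introduction) survived. The helper names `process_fulltext`, `section_heads` and `body_contains` are hypothetical, and the base URL is an assumption to adjust for your deployment.

```python
import os
import urllib.request
import uuid
import xml.etree.ElementTree as ET

# TEI namespace used in GROBID's XML output
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def process_fulltext(pdf_path: str, base_url: str = "http://localhost:8070") -> str:
    """POST a PDF to GROBID's processFulltextDocument service and return TEI XML.

    Assumption: a GROBID server is reachable at base_url (hypothetical default).
    """
    boundary = uuid.uuid4().hex
    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()
    # Build a multipart/form-data body with a single "input" file field.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="input"; '
        f'filename="{os.path.basename(pdf_path)}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(
        f"{base_url}/api/processFulltextDocument",
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def section_heads(tei_xml: str) -> list[str]:
    """Return the text of each <head> of a top-level <div> in the TEI body."""
    root = ET.fromstring(tei_xml)
    heads = root.findall(".//tei:text/tei:body/tei:div/tei:head", TEI_NS)
    return [(h.text or "").strip() for h in heads]


def body_contains(tei_xml: str, snippet: str) -> bool:
    """Check whether a text snippet appears anywhere in the TEI body."""
    root = ET.fromstring(tei_xml)
    body = root.find(".//tei:text/tei:body", TEI_NS)
    if body is None:
        return False
    # Join all text nodes and collapse whitespace so line breaks don't matter.
    text = " ".join(" ".join(body.itertext()).split())
    return snippet in text
```

For example, `section_heads(process_fulltext("fool.pdf"))` shows which headings were extracted, and `body_contains(...)` with a phrase copied from the start of the Introduction (or from the "Epidemiology" section) indicates whether that text was dropped.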