kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0

Supplementary materials #1199

Open lfoppiano opened 2 weeks ago

lfoppiano commented 2 weeks ago

I'm wondering how to deal with the supplementary material section.

image

Those are captions of externally accessible figures. However, if they are classified as figures, they end up as figures in the body. For this document, I'm wondering whether we should find a way to distinguish figures coming from the annex/supplementary information from figures coming from the body.
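To make the idea concrete, here is a minimal sketch of a post-processing heuristic that splits figures by their caption label. It is only an illustration, not how GROBID works: it assumes the TEI output exposes `<figure>` elements with `<head>` and `<figDesc>` children, and the `SUPP_LABEL` pattern and the `split_figures` helper are hypothetical names introduced here. The sample XML is made up for the example.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical pattern for supplementary labels such as "S1 Fig", "Fig S2",
# or "Supplementary Figure 3". This is an assumption, not a GROBID rule.
SUPP_LABEL = re.compile(
    r"^\s*(S\d+\s+(Fig|Figure|Table)\b"
    r"|(Fig|Figure)\.?\s+S\d+"
    r"|Supplementa(ry|l)\s+(Fig|Figure))",
    re.IGNORECASE,
)


def split_figures(tei_xml: str):
    """Return (body_captions, supplementary_captions) from a TEI string."""
    root = ET.fromstring(tei_xml)
    body, supplementary = [], []
    for fig in root.iterfind(".//tei:figure", TEI_NS):
        head = fig.find("tei:head", TEI_NS)
        desc = fig.find("tei:figDesc", TEI_NS)
        # Join the label and the description into one caption string.
        text = " ".join(
            (el.text or "").strip() for el in (head, desc) if el is not None
        )
        # Captions whose label looks supplementary go to the second list.
        (supplementary if SUPP_LABEL.match(text) else body).append(text)
    return body, supplementary


# Made-up sample input, assumed to resemble the TEI structure.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<figure><head>Fig 1.</head><figDesc>Overview of the method.</figDesc></figure>
<figure><head>S1 Fig.</head><figDesc>Additional controls.</figDesc></figure>
</body></text></TEI>"""

body_figs, supp_figs = split_figures(SAMPLE)
print(body_figs)
print(supp_figs)
```

A label-based rule like this would be brittle across publishers, so it probably only makes sense as a fallback next to a proper distinction made at segmentation time.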

What do you think @kermitt2?

Attaching the PDF (CC-BY): 6_10.1371_journal.pbio.3002069.pdf