Open Kiwifed0r opened 4 years ago
Hello @Kiwifed0r
Grobid supports footnotes. It serializes them after the figures and before the bibliography, e.g.:
```xml
...
</figure>
<note place="foot" n="2">This condition means that we are correctly identifying
the coordinate location of the horizon to first order. <ref type="bibr" target="#b2">3</ref>
We use units where G = c =h = 1. <ref type="bibr" target="#b3">4</ref> For coupled
Einstein-scalar field theory, there would be a contribution to the flux F from the matter
fields. <ref type="bibr" target="#b4">5</ref> Our convention for the Fourier transform is
F(k) = e iku F(u)du.</note>
</body>
<back>
...
```
Normally the content of footnotes never disappears: either it is extracted correctly as a footnote, or extraction fails and it usually appears as "normal" text or (worst case) as a figure caption. If you see footnote content disappearing, please file an issue with a test case so that we can reproduce the problem.
However, numbered footnotes are detected, I think, in around 50% of cases, depending a lot on the documents (detection can be perfect or miss everything), so it's not a structure that we can currently consider reliable. The reason is that there is very little training data for this right now. If you feel inspired to help with training data, it's the segmentation model that covers this structure.
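To check how many footnotes were recognized in a given result, the `<note place="foot">` elements can be pulled out of the TEI. The following is only a minimal sketch using the Python standard library. The helper names `local_name` and `extract_footnotes` and the shortened sample document are made up for illustration, and the sketch assumes the output may or may not carry the TEI namespace, so it matches on the local tag name.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_footnotes(tei_xml):
    """Return (n, text) pairs for every <note place="foot"> in a TEI string.

    Hypothetical helper, not part of Grobid. Nested elements such as <ref>
    are flattened into the note text and whitespace is normalized.
    """
    root = ET.fromstring(tei_xml)
    notes = []
    for elem in root.iter():
        if local_name(elem.tag) == "note" and elem.get("place") == "foot":
            text = " ".join("".join(elem.itertext()).split())
            notes.append((elem.get("n"), text))
    return notes


# Shortened, made-up sample modeled on the excerpt above.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p>Body text.</p>
<note place="foot" n="2">We use units where G = c = 1.
<ref type="bibr" target="#b3">4</ref> More text.</note>
</body></text></TEI>"""

for number, text in extract_footnotes(sample):
    print(number, text)
```

An empty result on a document that clearly has footnotes would suggest they were missed by the segmentation step and ended up as body text or figure captions instead.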
Thank you for the quick reply! I will look into creating training data and also run some more tests on the disappearing footnotes.
Hi!
I'm trying to extract footnotes from PDFs and I'm running into issues. The resulting TEI looks fine with regard to the abstract, sections, references, etc., but the footnotes don't work at all. The footnote anchor ends up as normal text, and the footnote text either disappears completely or also ends up as normal text.
Is Grobid just not trained for footnote detection, meaning I have to train my own model, or is there anything else I could try?