Closed mariadelmarq closed 2 weeks ago
@mariadelmarq thanks again for reporting this issue. Feel free to send me the source via email.
@mariadelmarq which grobid version/environment/OS are you using?
Linux (GNOME Classic desktop), running GROBID via Docker with:
docker run --rm --init --ulimit core=0 -p 8070:8070 grobid/grobid:0.8.0
I installed the Python client, and my script has:
from grobid_client.grobid_client import GrobidClient
# fulltext_dir (input PDFs) and grobid_path (TEI output) are defined elsewhere in the script
client = GrobidClient(config_path="./config.json")
client.process("processFulltextDocument", fulltext_dir, output=grobid_path)
Thanks. I've checked, and this bug is going to be fixed in the upcoming version 0.8.1. We can leave the issue open, and after the release I will double-check.
Brilliant, thanks heaps!!!
@mariadelmarq I wanted to double-check this. With version 0.8.1 the result seems correct:
<div type="funding">
<div>
<p>Funding was provided by the
<rs type="funder">Children's Trust, Massachusetts</rs>, Grant
<rs type="grantNumber">5014</rs>. We are grateful for the support of colleagues at the
<rs type="affiliation">Tufts Interdisciplinary Evaluation Research Group</rs> and for the participation of the research participants.
</p>
</div>
</div>
<listOrg type="funding">
<org type="funding" xml:id="_sNGGdEJ">
<idno type="grant-number">5014</idno>
</org>
</listOrg>
Under titleStmt we also have the funder's name:
<funder ref="#_sNGGdEJ">
<orgName type="full">Children's Trust, Massachusetts</orgName>
</funder>
Can we consider this the correct output?
@lfoppiano looks perfect, thanks so much!
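For anyone consuming this output downstream, here is a minimal sketch of how the funder name under titleStmt could be joined to its grant number through the ref / xml:id link shown above. It assumes the elements live in the standard TEI namespace (http://www.tei-c.org/ns/1.0), which the snippets above do not show, and the function name funders_with_grants is hypothetical, not part of GROBID or its Python client.

```python
import xml.etree.ElementTree as ET

# Assumption: the output uses the standard TEI namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def funders_with_grants(tei_xml):
    """Return one dict per <funder>, with the grant numbers of the <org> it references."""
    root = ET.fromstring(tei_xml)

    # Map each funding <org> xml:id to the grant numbers it carries.
    grants_by_id = {}
    for org in root.iterfind(".//tei:listOrg[@type='funding']/tei:org", NS):
        numbers = [
            idno.text.strip()
            for idno in org.iterfind("tei:idno[@type='grant-number']", NS)
            if idno.text
        ]
        grants_by_id[org.get(XML_ID)] = numbers

    # Resolve each <funder ref="#id"> against that map.
    result = []
    for funder in root.iterfind(".//tei:funder", NS):
        name = funder.findtext("tei:orgName", default="", namespaces=NS).strip()
        ref = (funder.get("ref") or "").lstrip("#")
        result.append({"name": name, "grants": grants_by_id.get(ref, [])})
    return result
```

A funder whose ref does not resolve simply gets an empty grants list, so missing links show up in the result instead of raising an error.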
Hi,
Me again, sorry! I have another potential error case where the funder is correctly identified, but the name of the funder is left out of the TEI/XML output in a Cambridge University Press article. See screenshots below:
From the PDF:
Resulting TEI XML:
Because we're using NLP techniques to find funding statements in the article text (I acknowledge we may be duplicating what GROBID is attempting to do), this makes it really hard to identify both the funder's name and the fact that a funding statement exists. Grateful for any ideas/advice!
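One possible way to cross-check, sketched under assumptions: read the funding div directly and collect any inline funder mentions, so the statement can still be detected when the funder element under titleStmt is missing. This relies on the element names from the 0.8.1 output earlier in the thread (div type="funding", rs type="funder") and assumes the standard TEI namespace. Whether the Cambridge University Press article produces these elements is not known here, and funding_summary is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: the output uses the standard TEI namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _flat_text(element):
    """Concatenate all nested text of an element and collapse whitespace."""
    return " ".join("".join(element.itertext()).split())


def funding_summary(tei_xml):
    """Report whether a funding statement exists and which funders it mentions inline."""
    root = ET.fromstring(tei_xml)
    statements = []
    mentions = []
    for div in root.iterfind(".//tei:div[@type='funding']", NS):
        text = _flat_text(div)
        if text:
            statements.append(text)
        # Inline entity mentions tagged as funders inside the statement.
        for rs in div.iterfind(".//tei:rs[@type='funder']", NS):
            mention = _flat_text(rs)
            if mention:
                mentions.append(mention)
    return {
        "has_statement": bool(statements),
        "statements": statements,
        "funder_mentions": mentions,
    }
```

If has_statement is true but funder_mentions is empty, the raw statement text is still available to pass to your own NLP step.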