Open mariadelmarq opened 3 months ago
Hi @mariadelmarq, thanks for reporting this problem.
Could you please send me the PDFs for this issue and for #1143 at luca AT sciencialab.com?
I'm not able to access them via the pubmed / publisher portal 😅
Sent, thanks heaps for looking into it!
Thanks for sending the files, and sorry that I did not have time to check them until now.
For the file discussed in this issue, there are two issues:
Lee M. Ritterband
tagged as <other>
is somehow lost). For this I'm not sure it's a bug, because the funding information is correctly covered. As far as I understand, the conflict of interest statement should not be part of the funding statement in the Grobid approach, or at least not in this version of the funding-acknowledgment extraction. I leave this to @kermitt2 for confirmation.
Hello!
Indeed, the Conflict of Interest section is not part of the funding section and is considered a section of its own. However, it's not yet identified explicitly as such by Grobid. This is something to do in the future: extend the segmentation and header models to explicitly recognize COI sections, which I don't think is complicated. I have already received this request, as COI statements are more and more common.
About the text lost in the header: what is labeled <other> is normally "noise" that we don't want to add to the output (even under a note element). Unfortunately that does not work in this example, but if we extend the model(s) to cover COI, we can expect a good fix.
Thank you both so much for looking into this. For the other articles I'm looking at, Conflict of Interest statements tend to end up in the back matter tag, either one or two divs down, or sometimes within a note tag. Sometimes they do end up in the body, though, which is ok for me, as long as they're somewhere.
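In case it helps anyone else hunting for these statements, here is a rough sketch of how the search across those locations could look in Python, using only the standard library. It assumes the output is TEI XML in the usual TEI namespace; the keyword pattern, the function name `find_coi_statements`, and the sample document are made up for illustration and are not taken from Grobid itself.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical keyword pattern; adjust it for the publishers in your corpus.
COI_PATTERN = re.compile(
    r"conflicts? of interests?|competing interests?|declaration of interests?",
    re.IGNORECASE,
)


def find_coi_statements(tei_xml):
    """Return (region, text) pairs for <p> and <note> elements that look like COI statements."""
    root = ET.fromstring(tei_xml)
    hits = []
    seen = set()
    # Look in the header, the body, and the back matter, in that order.
    for region in ("teiHeader", "body", "back"):
        for container in root.iter(TEI_NS + region):
            for elem in container.iter():
                if elem.tag not in (TEI_NS + "p", TEI_NS + "note"):
                    continue
                # Flatten nested markup and normalise whitespace.
                text = " ".join("".join(elem.itertext()).split())
                key = (region, text)
                # Skip duplicates, e.g. a <note> that only wraps a matching <p>.
                if COI_PATTERN.search(text) and key not in seen:
                    seen.add(key)
                    hits.append(key)
    return hits


# Made-up sample, only meant to mimic the general shape of a TEI document.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <text>
    <body><div><p>We ran a trial.</p></div></body>
    <back>
      <div type="annex"><div><p>Conflict of Interest: none declared.</p></div></div>
    </back>
  </text>
</TEI>"""

for region, text in find_coi_statements(SAMPLE):
    print(region, "->", text)
```

This only finds statements that made it into the XML somewhere; it cannot recover text that was dropped during extraction, as in the article discussed in this issue.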
Hi,
We are looking into using Grobid for a project on conflict of interest, funding, and other transparency statements in published articles. These statements appear in different locations depending on the publisher: sometimes in footnotes, sometimes after the abstract, sometimes in the back matter, etc.
For the published pdf for this particular article (not the author manuscript, which is open access, but the actual published pdf by the APA): https://pubmed.ncbi.nlm.nih.gov/27819460/, Grobid does well to extract the funding information from paragraph 4 of the footnote on page 1, but the conflict of interest, contained in paragraph 5 of the same footnote, is missing from the xml output. I suspect perhaps Grobid does not know where to put it in the xml... Is there any chance this has an easy fix?