Open kermitt2 opened 2 years ago
With version 0.8.1 this does not happen. However, with the CRF models, the string e0263302 is prepended to the DOI.
Header only
With CRF only, we've got:
</monogr>
<idno type="MD5">C1A860C14E064D1C9E586BCFC5463C92</idno>
<idno type="DOI">e0263302.10.1371/journal.pone.0263302</idno>
<note type="submission">Received: July 27, 2021 Accepted: January 16, 2022</note>
</biblStruct>
With DL:
<monogr>
<imprint>
<date type="published" when="2022-01-28">January 28, 2022</date>
</imprint>
</monogr>
<idno type="MD5">C1A860C14E064D1C9E586BCFC5463C92</idno>
<idno type="DOI">10.1371/journal.pone.0263302</idno>
<note type="submission">Received: July 27, 2021 Accepted: January 16, 2022</note>
</biblStruct>
Fulltext:
With CRF:
<imprint>
<date type="published" when="2022-01-28">January 28, 2022</date>
</imprint>
</monogr>
<idno type="MD5">C1A860C14E064D1C9E586BCFC5463C92</idno>
<idno type="DOI">e0263302.10.1371/journal.pone.0263302</idno>
<note type="submission">Received: July 27, 2021 Accepted: January 16, 2022</note>
</biblStruct>
With DL:
<imprint>
<date type="published" when="2022-01-28">January 28, 2022</date>
</imprint>
</monogr>
<idno type="MD5">C1A860C14E064D1C9E586BCFC5463C92</idno>
<idno type="DOI">10.1371/journal.pone.0263302</idno>
<note type="submission">Received: July 27, 2021 Accepted: January 16, 2022</note>
</biblStruct>
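A quick way to spot this kind of regression across many outputs is to check that every `<idno type="DOI">` value starts with a DOI prefix. The following is a minimal sketch, not part of GROBID: the helper names are hypothetical, and the DOI pattern is a common heuristic rather than an official validation rule. It uses a regular expression instead of an XML parser so that it also works on TEI fragments like the ones above.

```python
import re

# Hypothetical helpers for illustration; not part of GROBID.
# Captures the text content of <idno type="DOI"> elements.
IDNO_DOI = re.compile(r'<idno type="DOI">([^<]*)</idno>')
# Heuristic: a DOI starts with "10.", a 4-9 digit registrant code, then "/".
DOI_SHAPE = re.compile(r'^10\.\d{4,9}/\S+$')


def doi_values(tei):
    """Return all DOI idno values found in a TEI string or fragment."""
    return IDNO_DOI.findall(tei)


def malformed_dois(tei):
    """Return the DOI idno values that do not look like a bare DOI."""
    return [d for d in doi_values(tei) if not DOI_SHAPE.match(d)]


# The two DOI lines reported above for the CRF and DL header models.
crf_tei = '<idno type="DOI">e0263302.10.1371/journal.pone.0263302</idno>'
dl_tei = '<idno type="DOI">10.1371/journal.pone.0263302</idno>'

print(malformed_dois(crf_tei))  # the prefixed value is flagged
print(malformed_dois(dl_tei))   # nothing is flagged
```

Running such a check over both the header-only and fulltext outputs would flag the CRF results and pass the DL ones.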
The reason is that the CRF header model considers the prefix e0263302. to be part of the publication number, as the labeled tokens show:
PLoS plos P PL PLo PLoS S oS LoS PLoS BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 INITCAP NODIGIT 0 0 0 0 0 0 0 0 NOPUNCT 0 0 1 0 <reference>
ONE one O ON ONE ONE E NE ONE ONE BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 0 0 1 0 0 0 0 0 NOPUNCT 0 0 1 0 <reference>
17 17 1 17 17 17 7 17 17 17 BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS ALLDIGIT 0 0 0 0 0 0 0 0 NOPUNCT 0 0 1 0 <reference>
( ( ( ( ( ( ( ( ( ( BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 0 OPENBRACKET 0 0 1 0 <reference>
1 1 1 1 1 1 1 1 1 1 BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS ALLDIGIT 1 0 0 0 0 1 0 0 NOPUNCT 0 0 1 0 <reference>
) ) ) ) ) ) ) ) ) ) BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 0 ENDBRACKET 0 0 1 0 <reference>
: : : : : : : : : : BLOCKIN LINEEND ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 0 PUNCT 0 0 1 0 <reference>
e0263302 e0263302 e e0 e02 e026 2 02 302 3302 BLOCKIN LINESTART ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS CONTAINSDIGITS 0 0 0 1 0 0 0 0 NOPUNCT 0 0 1 0 I-<pubnum>
. . . . . . . . . . BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 0 DOT 0 0 1 0 <pubnum>
https https h ht htt http s ps tps ttps BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS NODIGIT 0 0 0 0 0 0 0 1 NOPUNCT 0 0 1 0 <pubnum>
: : : : : : : : : : BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 1 PUNCT 0 0 1 0 <pubnum>
/ / / / / / / / / / BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 1 NOPUNCT 0 0 1 0 <pubnum>
/ / / / / / / / / / BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 1 NOPUNCT 0 0 1 0 <pubnum>
doi doi d do doi doi i oi doi doi BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS NODIGIT 0 1 0 0 0 0 0 1 NOPUNCT 0 0 1 0 <pubnum>
. . . . . . . . . . BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 1 DOT 0 0 1 0 <pubnum>
org org o or org org g rg org org BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS NODIGIT 0 0 0 0 0 1 0 1 NOPUNCT 0 0 1 0 <pubnum>
/ / / / / / / / / / BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 1 NOPUNCT 0 0 1 0 <pubnum>
10 10 1 10 10 10 0 10 10 10 BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS ALLDIGIT 0 0 0 0 0 1 0 1 NOPUNCT 0 0 1 0 <pubnum>
. . . . . . . . . . BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 1 DOT 0 0 1 0 <pubnum>
1371 1371 1 13 137 1371 1 71 371 1371 BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS ALLDIGIT 0 0 0 1 0 0 0 1 NOPUNCT 0 0 1 0 <pubnum>
/ / / / / / / / / / BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 1 NOPUNCT 0 0 1 0 <pubnum>
journal journal j jo jou jour l al nal rnal BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS NODIGIT 0 0 1 0 0 0 0 1 NOPUNCT 0 0 1 0 <pubnum>
. . . . . . . . . . BLOCKIN LINEEND ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 0 DOT 0 0 1 0 <pubnum>
pone pone p po pon pone e ne one pone BLOCKIN LINESTART ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS NODIGIT 0 0 0 0 0 0 0 0 NOPUNCT 0 0 1 0 <pubnum>
. . . . . . . . . . BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 0 DOT 0 0 1 0 <pubnum>
0263302 0263302 0 02 026 0263 2 02 302 3302 BLOCKEND LINEEND ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 0 0 NOCAPS ALLDIGIT 0 0 0 1 0 0 0 0 NOPUNCT 0 0 1 0 <pubnum>
Editor editor E Ed Edi Edit r or tor itor BLOCKSTART LINESTART ALIGNEDLEFT NEWFONT SAMEFONTSIZE 1 0 INITCAP NODIGIT 0 0 1 0 0 0 0 0 NOPUNCT 0 0 1 0 I-<other>
: : : : : : : : : : BLOCKIN LINEIN ALIGNEDLEFT SAMEFONT SAMEFONTSIZE 1 0 ALLCAP NODIGIT 1 0 0 0 0 0 0 0 PUNCT 0 0 1 0 <other>
Primo primo P Pr Pri Prim o mo imo rimo BLOCKIN LINEIN ALIGNEDLEFT NEWFONT SAMEFONTSIZE 0 0 INITCAP NODIGIT 0 0 0 0 0 0 0 0 NOPUNCT 0 0 1 0 I-<editor>
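To make the effect of this labeling concrete, the sketch below rebuilds the `<pubnum>` field from the (token, label) pairs in the trace above and then pulls out the DOI with a regular expression. This is only an illustration of one possible clean-up step, under the assumption that a pattern search on the field text is acceptable. It is not how GROBID post-processes the field, and the function names are hypothetical.

```python
import re

# Heuristic DOI pattern (an assumption, not GROBID's own rule).
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def field_text(rows, field):
    """Concatenate the tokens whose label is I-<field> or <field>."""
    wanted = ("I-<%s>" % field, "<%s>" % field)
    return "".join(token for token, label in rows if label in wanted)


def extract_doi(text):
    """Return the first DOI-like substring in text, or None."""
    match = DOI_PATTERN.search(text)
    return match.group(0) if match else None


# Tokens labeled as pubnum in the trace above, in order.
pubnum_tokens = ["e0263302", ".", "https", ":", "/", "/", "doi", ".", "org",
                 "/", "10", ".", "1371", "/", "journal", ".", "pone", ".",
                 "0263302"]
rows = [(tok, "I-<pubnum>" if i == 0 else "<pubnum>")
        for i, tok in enumerate(pubnum_tokens)]
rows.append(("Editor", "I-<other>"))  # first token after the pubnum field

raw = field_text(rows, "pubnum")
print(raw)               # field text as labeled by the CRF model
print(extract_doi(raw))  # DOI-like substring only
```

Because the pattern search starts at the first `10.` followed by a registrant code, the leading e0263302. and the resolver URL are both skipped.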
Regarding header/metadata, the following PLOS article is correctly processed with the processFulltextDocument service (correct DOI, journal, etc.).
However, with processHeaderDocument, the wrong DOI is selected (the one for the data at Zenodo), despite a correct title and first author.
journal.pone.0263302.pdf
Thanks @Aazhar for the error case.