Open puggimer opened 6 years ago
Just to add another similar error case, from the bioRxiv training dataset: 099754v1 (DOI 10.1101/099754).
[1] Babak Alipanahi, Andrew Delong, Matthew T Weirauch, and Brendan J Frey. Predicting the sequence specificities of dna-and rna-binding proteins by deep learning. Nature biotechnology, 2015.
<biblStruct xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="b0">
<analytic>
<title level="a" type="main">Predicting the sequence specificities of dna-and rna-binding proteins by deep learning</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Babak</forename><surname>Alipanahi</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Andrew</forename><surname>Delong</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">T</forename><surname>Matthew</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Brendan</forename><forename type="middle">J</forename><surname>Weirauch</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><surname>Frey</surname></persName>
</author>
</analytic>
<monogr>
<title level="j">Nature biotechnology</title>
<imprint>
<date type="published" when="2015"/>
</imprint>
</monogr>
<note type="raw_reference">Babak Alipanahi, Andrew Delong, Matthew T Weirauch, and Brendan J Frey. Predicting the sequence specificities of dna-and rna-binding proteins by deep learning. Nature biotechnology, 2015.</note>
</biblStruct>
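To spot this kind of mix-up automatically, one rough option is to compare each parsed author name against the `raw_reference` note that is included in the same `biblStruct`. The sketch below is only a heuristic under stated assumptions: the helper names `parsed_authors` and `suspicious_authors` are hypothetical, it uses Python's standard `xml.etree.ElementTree` and not any GROBID API, and the sample is a trimmed copy of the TEI above (unused namespace declarations dropped).

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Trimmed copy of the TEI output above (assumption: unused xmlns declarations removed).
SAMPLE = """<biblStruct xmlns="http://www.tei-c.org/ns/1.0" xml:id="b0">
<analytic>
<author><persName><forename type="first">Babak</forename><surname>Alipanahi</surname></persName></author>
<author><persName><forename type="first">Andrew</forename><surname>Delong</surname></persName></author>
<author><persName><forename type="first">T</forename><surname>Matthew</surname></persName></author>
<author><persName><forename type="first">Brendan</forename><forename type="middle">J</forename><surname>Weirauch</surname></persName></author>
<author><persName><surname>Frey</surname></persName></author>
</analytic>
<note type="raw_reference">Babak Alipanahi, Andrew Delong, Matthew T Weirauch, and Brendan J Frey. Predicting the sequence specificities of dna-and rna-binding proteins by deep learning. Nature biotechnology, 2015.</note>
</biblStruct>"""


def parsed_authors(bibl_xml):
    """Return each author as 'forename(s) surname', in document order."""
    root = ET.fromstring(bibl_xml)
    names = []
    for pers in root.findall(".//tei:author/tei:persName", TEI_NS):
        # Children of persName are forename/surname elements; join their text.
        parts = [(child.text or "").strip() for child in pers]
        names.append(" ".join(p for p in parts if p))
    return names


def suspicious_authors(bibl_xml):
    """Return parsed names that do not occur verbatim in the raw reference."""
    root = ET.fromstring(bibl_xml)
    raw = root.findtext("tei:note[@type='raw_reference']", default="", namespaces=TEI_NS)
    raw = " ".join(raw.split())  # collapse whitespace and line breaks
    return [name for name in parsed_authors(bibl_xml) if name not in raw]


if __name__ == "__main__":
    print(suspicious_authors(SAMPLE))
```

On this sample the check flags `T Matthew` and `Brendan J Weirauch`. The surname-only entry `Frey` is not flagged because it still occurs in the raw string, so this catches swapped names but not dropped forenames.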
MeltdownPrime.pdf

Specifically, see references 13, 14 and 15 in the attached document. Yatin A. Manerkar is parsed as "A Yatin", and the parser then incorrectly combines the first name of the next author with the surname of this one, so Daniel Lustig shows up as "Daniel Manerkar", and so on.
This appears to happen only when the first author in the list has a middle initial. The same author appears again in reference 17, but as the second author in the list, and there the name is correctly parsed into first name, middle initial and surname.
To show the full example, here is the reference citation with 4 authors:
The generated TEI for it is: