jakelever closed this issue 1 year ago
Ya that seems odd. I'll take a look!
Sorry for the long delay, this completely fell off my radar. I've been debugging this, and it looks like it happens when the citation is the first thing in a passage. Since we remove the in-text citation and attribute it to the text that precedes it, the annotation ends up in the wrong passage. Should be an easy fix, but first I'd like to see why those passages are being split. Seems like maybe they shouldn't be.
Ok, this one is really weird... The citation is actually in a strange position in the original text. Would it make sense to adjust the position so the reference is at the start of the table header passage, or should we just append it to the previous passage, after the table description?
Wow, what a weird one. Some bug in the publisher's code for converting to PMC XML. It's probably better to work with the data we've got instead of trying fixes that may only sometimes work. So I guess insert it into the table header? What do you think?
ya, that's probably the simplest solution
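For illustration, the "insert it into the table header" option amounts to moving the annotation so it falls inside the passage it is attached to. This is only a minimal sketch with plain integers, not the converter's actual code; `clamp_to_passage` and its parameters are hypothetical names, and it assumes offsets count characters of the passage text.

```python
from typing import Tuple


def clamp_to_passage(
    ann_offset: int, ann_length: int, passage_offset: int, passage_length: int
) -> Tuple[int, int]:
    """Move an annotation span so it lies inside [passage start, passage end].

    Hypothetical helper: a citation that lands just before the passage
    is snapped to the first position of that passage.
    """
    start = passage_offset
    end = passage_offset + passage_length
    # Snap the offset into the passage bounds.
    new_offset = min(max(ann_offset, start), end)
    # Trim the length so the span does not run past the passage end.
    new_length = min(ann_length, end - new_offset)
    return new_offset, new_length


# Offsets reported in this issue: passage at 56733, zero-length citation at 56732.
# The passage length (12) is made up for the example.
print(clamp_to_passage(56732, 0, 56733, 12))
```

With the offsets from this issue, the citation would move from 56732 to 56733, the start of the table header passage.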
Hey @creisle, I've come across a citation annotation that is outside its associated passage. One of my scripts checks some things on BioC files and this got flagged. It doesn't seem right to me. What do you think?
Below is an example where the passage offset is 56733 but the zero-length citation is at offset 56732, which is just before the passage starts.
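The kind of check involved can be sketched roughly as follows. This is not the actual script; `Passage`, `Annotation`, and `find_out_of_bounds` are hypothetical stand-ins for the BioC structures, and the sketch assumes offsets count characters of the passage text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Annotation:
    id: str
    offset: int
    length: int  # 0 for a citation whose in-text marker was removed


@dataclass
class Passage:
    offset: int
    text: str
    annotations: List[Annotation] = field(default_factory=list)


def find_out_of_bounds(passages: List[Passage]) -> List[Tuple[int, str]]:
    """Return (passage offset, annotation id) for annotations outside their passage."""
    flagged = []
    for passage in passages:
        start = passage.offset
        end = start + len(passage.text)
        for ann in passage.annotations:
            # The annotation span must sit entirely within [start, end].
            if ann.offset < start or ann.offset + ann.length > end:
                flagged.append((start, ann.id))
    return flagged


# Placeholder passage text and annotation id; the offsets are the ones from this issue.
passage = Passage(56733, "Table header", [Annotation("c1", 56732, 0)])
print(find_out_of_bounds([passage]))
```

A zero-length annotation exactly at the passage start or end passes this check; only positions strictly outside the passage are flagged.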
To reproduce, I've included the source PMC XML file: PMC8466798.xml.gz and I converted it with the line below.