Open jameshowison opened 4 years ago
It just occurred to me that we could potentially recycle the very small portion of annotations under mention_type != "software" and validate whether they are actually software (given that these categories were often labeled with lower certainty).
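A minimal sketch of how those candidates could be pulled out for re-validation, assuming the annotations are available as a list of records with a mention_type field. The record layout, the id and text fields, and the example category names ("algorithm", "database") are hypothetical and only for illustration.

```python
from collections import Counter

def non_software_candidates(annotations):
    """Return the annotations whose mention_type is anything other than 'software'."""
    return [a for a in annotations if a.get("mention_type") != "software"]

def summarize_by_type(annotations):
    """Count annotations per mention_type, to see how small the pool really is."""
    return Counter(a.get("mention_type") for a in annotations)

# Hypothetical sample records; the real export format may differ.
sample = [
    {"id": 1, "mention_type": "software", "text": "SPSS"},
    {"id": 2, "mention_type": "algorithm", "text": "BLAST"},
    {"id": 3, "mention_type": "database", "text": "GenBank"},
]

candidates = non_software_candidates(sample)
counts = summarize_by_type(candidates)
```

The per-type counts give a quick way to confirm that the numbers are low before anyone spends time on manual validation.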
On multiply annotated files, when we did curation we started with the annotations from the annotator with the highest count of annotations ("the top annotator"). In some cases the other annotator may have found mentions that the top annotator missed (even though they found fewer overall). It should be possible to check this as a source of additional mentions. Some of these may be included as elements in the TEI/XML file.
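The check could be sketched roughly as follows, assuming each annotation record carries a file identifier, an annotator name, and character offsets. The field names (file, annotator, start, end) are hypothetical, and matching on exact offsets is a simplifying assumption; overlapping but non-identical spans would need a looser comparison.

```python
from collections import defaultdict

def missed_by_top_annotator(annotations):
    """For each multiply annotated file, list spans found by other annotators
    but not by the annotator with the most annotations in that file."""
    # file -> annotator -> set of (start, end) spans
    by_file = defaultdict(lambda: defaultdict(set))
    for a in annotations:
        by_file[a["file"]][a["annotator"]].add((a["start"], a["end"]))

    missed = {}
    for file_id, per_annotator in by_file.items():
        if len(per_annotator) < 2:
            continue  # only multiply annotated files are of interest
        # Top annotator = highest annotation count; ties go to the first one seen.
        top = max(per_annotator, key=lambda name: len(per_annotator[name]))
        others = set().union(
            *(spans for name, spans in per_annotator.items() if name != top)
        )
        extra = others - per_annotator[top]
        if extra:
            missed[file_id] = sorted(extra)
    return missed

# Hypothetical sample: ann_a is the top annotator for f1, but ann_b found one extra span.
sample = [
    {"file": "f1", "annotator": "ann_a", "start": 0, "end": 4},
    {"file": "f1", "annotator": "ann_a", "start": 10, "end": 15},
    {"file": "f1", "annotator": "ann_a", "start": 30, "end": 36},
    {"file": "f1", "annotator": "ann_b", "start": 10, "end": 15},
    {"file": "f1", "annotator": "ann_b", "start": 50, "end": 58},
    {"file": "f2", "annotator": "ann_a", "start": 5, "end": 9},
]

extra_mentions = missed_by_top_annotator(sample)
```

Spans returned this way are only candidates; each would still need manual review before being added as a mention.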
Putting this on the back burner for now as the numbers are likely to be low.