howisonlab / softcite-dataset

A gold-standard dataset of software mentions in research publications.

Possible source of additional annotations in the RDF file #671

Open jameshowison opened 4 years ago

jameshowison commented 4 years ago

On multiply annotated files, we started curation with the annotations from the annotator with the highest count of annotations ("the top annotator"). In some cases the other annotator may have found mentions that the top annotator missed (even though they found fewer overall). It should be possible to check this as a source of additional mentions. Some of these may be included as elements in the TEI/XML file.
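
A minimal sketch of how that check could look, assuming the annotations can be exported as `(article_id, annotator, mention_text)` tuples. That layout, the function name `additional_mentions`, and the sample values are hypothetical, not the dataset's actual schema. Matching here is by normalized mention text only, so a real check would probably also need to compare positions in the article.

```python
from collections import defaultdict


def additional_mentions(annotations):
    """Return, per article, mentions found by other annotators but not by the top annotator.

    `annotations` is an iterable of (article_id, annotator, mention_text)
    tuples. This layout is an assumption made for illustration.
    """
    # article_id -> annotator -> set of normalized mention strings
    by_article = defaultdict(lambda: defaultdict(set))
    for article_id, annotator, mention in annotations:
        by_article[article_id][annotator].add(mention.strip().lower())

    candidates = {}
    for article_id, per_annotator in by_article.items():
        if len(per_annotator) < 2:
            continue  # only multiply annotated articles are relevant
        # "Top annotator" = most distinct mentions; ties go to the
        # alphabetically first annotator so the result is deterministic.
        top = max(sorted(per_annotator), key=lambda a: len(per_annotator[a]))
        others = set().union(
            *(found for a, found in per_annotator.items() if a != top)
        )
        missed = others - per_annotator[top]
        if missed:
            candidates[article_id] = sorted(missed)
    return candidates


# Hypothetical sample: ann_x is the top annotator for a1 but missed ImageJ.
sample = [
    ("a1", "ann_x", "SPSS"),
    ("a1", "ann_x", "R"),
    ("a1", "ann_x", "Stata"),
    ("a1", "ann_y", "R"),
    ("a1", "ann_y", "ImageJ"),
    ("a2", "ann_x", "Excel"),  # singly annotated, so skipped
]
result = additional_mentions(sample)
```

Anything returned this way would only be a candidate for manual review, not a confirmed mention.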

Putting this on the back burner for now as the numbers are likely to be low.

caifand commented 4 years ago

It just occurred to me that we could potentially recycle the very small portion of annotations with mention_type != "software" and validate whether they are actually software (these categories were often labeled with lower certainty).
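
A rough sketch of pulling out those candidates, assuming each annotation is available as a dict with a `mention_type` key. The other field names, the non-software type values, and the function name `revalidation_candidates` are hypothetical and only there for illustration.

```python
from collections import Counter


def revalidation_candidates(records):
    """Select annotations whose mention_type is set but is not "software".

    Records with no mention_type at all are left out, since there is no
    lower-certainty label to revisit. Record layout is an assumption.
    """
    return [
        r for r in records
        if r.get("mention_type") not in (None, "software")
    ]


# Hypothetical records; the type values other than "software" are made up.
records = [
    {"id": 1, "mention_type": "software", "text": "SPSS"},
    {"id": 2, "mention_type": "algorithm", "text": "BLAST"},
    {"id": 3, "mention_type": "database", "text": "GenBank"},
    {"id": 4, "text": "untyped"},
]
to_review = revalidation_candidates(records)
# Count per type to see how small the portion really is.
counts = Counter(r["mention_type"] for r in to_review)
```

The selected records would still need a human pass to decide whether each one is actually software.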