gsautter / goldengate-imagine

Automatically exported from code.google.com/p/goldengate-imagine

collectionCode false positive #431

Open myrmoteras opened 6 years ago

myrmoteras commented 6 years ago

41313769FFC96D15061DFFAFFFACFF87

This is in the treatment of Microstomum laurae.

There are two collection codes that cause problems here. COI is not a collection code, but a DNA sequence name. It is widely used and might be excluded as a collection code.

SMNH is not detected, although it appears here as SMNH-Type-8904 in the MC of Microstomum laurae.

gsautter commented 6 years ago

I realize I have to do something about false positives in collection codes.

However, GrBio lists "COI" as the collection code for "University of Coimbra Botany Department", so it's actually validly tagged, and right now I cannot think of a way of dealing with such homonymous acronyms. (Paragraph) context might be one viable approach, e.g. checking for coordinate pairs or specimen counts, and checking for specimen codes derived from the collection code might be another. But this won't be an easy one to resolve.
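To make the context idea a bit more concrete, here is a minimal Python sketch of such a check. It is not taken from GoldenGATE Imagine. The function names, the regular expressions, and the rule that any single piece of evidence is enough are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration only; real material citations
# vary far more than these expressions cover.
# A latitude/longitude pair such as "12°30′N, 45°15′E" or "12.5°N, 45.25°E".
COORDINATE_PAIR = re.compile(
    r"\d{1,3}(?:\.\d+)?°[^NSEW]{0,12}[NS]\W{1,4}\d{1,3}(?:\.\d+)?°[^NSEW]{0,12}[EW]"
)
# A specimen count such as "2 specimens" or "1 holotype".
SPECIMEN_COUNT = re.compile(
    r"\b\d+\s+(?:specimens?|individuals?|holotypes?|paratypes?)\b"
)


def context_evidence(code, paragraph):
    """Return the kinds of supporting evidence found for `code` in `paragraph`."""
    # A specimen code derived from the collection code, e.g. "SMNH-Type-8904".
    derived_code = re.compile(
        rf"\b{re.escape(code)}[\s\-:/]?(?:[A-Za-z]+[\-:/])?\d+"
    )
    evidence = set()
    if derived_code.search(paragraph):
        evidence.add("specimen_code")
    if COORDINATE_PAIR.search(paragraph):
        evidence.add("coordinates")
    if SPECIMEN_COUNT.search(paragraph):
        evidence.add("specimen_count")
    return evidence


def is_plausible_collection_code(code, paragraph):
    """Accept the tag only if the paragraph offers at least one piece of evidence."""
    return bool(context_evidence(code, paragraph))
```

Under these assumptions, "COI" in a methods paragraph about gene amplification would be rejected, because the paragraph has no coordinates, specimen counts, or derived specimen codes. "SMNH" next to "SMNH-Type-8904" would be accepted. The thresholds and patterns would need tuning against real treatments.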

"SMNH", on the other hand, is ambiguous in GrBio, mapping to "Saskatchewan Museum of Natural History", "Schmidt Museum of Natural History, Emporia State University", and "Swedish Museum of Natural History". Hard to tell what to do about this. If thesaurus lookup comes back ambiguous, what to do? Prompting won't work in batch mode, I'm afraid ...