gsautter / goldengate-imagine

Automatically exported from code.google.com/p/goldengate-imagine

taxonomicName recognition does not work well: zootaxa FFC7FF9EFFAAFFE5C215EA26C95D3C78 #419

Open myrmoteras opened 6 years ago

myrmoteras commented 6 years ago

image

how can I fix this?

see also https://github.com/plazi/Plazi-Communications/issues/754 https://github.com/plazi/Plazi-Communications/issues/751

gsautter commented 6 years ago

This looks like a hard one ... I take it the species epithets never ever occur in immediate combination with their associated genus or genera anywhere in the article? As full "Aus bus" style binomials, that is, or at least as "A. bus" with an abbreviated genus ...

If not, the current FAT will have a hard time figuring this one out. For the sake of precision, standalone species epithets (found in catalogs or not) are only ever tagged if they have been found in an "Aus bus" or at least "A. bus" style binomial somewhere further up the article. For 25,000+ PDFs this has looked like a pretty much sure-fire filter ... Frankly, I don't like the idea of diluting FAT's thus far pretty reliable accuracy for the sake of catching this kind of species occurrence list by genus; it just doesn't make sense in the bigger picture.
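To illustrate why this article slips through, here is a rough sketch of the filter described above. It is not FAT's actual code: the function name, the regular expressions, and the purely pattern-based notion of a "binomial" are all assumptions made for illustration (the real tagger also consults catalogs, which this sketch omits).

```python
import re

# Hypothetical patterns, not taken from FAT itself.
# A binomial is approximated as "Aus bus" (full genus) or "A. bus" (abbreviated genus).
# This naive pattern would also accept ordinary phrases such as "The species".
BINOMIAL = re.compile(r"\b([A-Z][a-z]+|[A-Z]\.)\s+([a-z]{2,})\b")
WORD = re.compile(r"\b[a-z]{2,}\b")


def tag_standalone_epithets(text):
    """Return (offset, epithet) pairs for standalone epithets that already
    occurred inside a binomial further up the text."""
    # Offsets at which an epithet is part of a binomial.
    binomial_epithet_starts = {m.start(2) for m in BINOMIAL.finditer(text)}

    seen = set()   # epithets confirmed by a binomial so far
    tagged = []
    for m in WORD.finditer(text):
        word = m.group()
        if m.start() in binomial_epithet_starts:
            seen.add(word)                    # part of "Aus bus" / "A. bus"
        elif word in seen:
            tagged.append((m.start(), word))  # standalone, but confirmed earlier
    return tagged
```

In a checklist where the genus appears only in a heading and every entry starts with a bare epithet, `seen` stays empty, so nothing gets tagged.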

One possible way of catching such cases might be to create a kind of FAT supplement that would first extract all the genera found in an article by FAT proper, then get all the catalog-listed species for these genera, and finally find and mark the occurrences of the latter throughout the article (where not already marked as part of a full name). This would be a dedicated function in the Tools menu, however, not one run by default as part of the batch. That way it would not interfere with precision in the general case, but would still provide you with an automated means of correction in cases where you identify the specific need.
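The three steps of that supplement could be sketched roughly as follows. Nothing like this exists yet: the function name, the `catalog` dictionary (genus to set of epithets), and the `marked_spans` argument are hypothetical stand-ins for whatever FAT and the catalogs would actually provide.

```python
import re

WORD = re.compile(r"\b[a-z]{2,}\b")


def mark_catalog_epithets(text, genera, catalog, marked_spans=()):
    """Find standalone occurrences of catalog-listed epithets.

    text         -- the article text
    genera       -- genera already found in the article (step 1, assumed given)
    catalog      -- hypothetical mapping: genus -> set of species epithets
    marked_spans -- (start, end) offsets already marked as full names
    """
    # Step 2: collect all catalog-listed epithets for the genera found.
    candidates = set()
    for genus in genera:
        candidates.update(catalog.get(genus, ()))

    # Step 3: mark occurrences that are not already part of a full name.
    hits = []
    for m in WORD.finditer(text):
        if m.group() not in candidates:
            continue
        if any(start <= m.start() and m.end() <= end for start, end in marked_spans):
            continue
        hits.append((m.start(), m.end(), m.group()))
    return hits
```

Note that common words which happen to be valid epithets of the genera in question would be marked as well, which is exactly the precision risk that argues for keeping this out of the default batch.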

gsautter commented 6 years ago

From a different angle, wouldn't each and every paragraph in this example be a separate treatment, consisting of a taxon name (if one cut down to the species epithet), a bibliographic citation (doubling as the authority), and some occurrence data and references to collection material?

That would leave us with pretty slim treatments if marked like that, more like individual occurrence records ... maybe better to handle it as a single treatment with a lot of occurrences of the heading taxon (subgenus in this case)? Otherwise, we might even end up violating the "one treatment per taxon and article" principle we've been upholding for some ten years now.

gsautter commented 6 years ago

From yet another angle, this simply looks like a pretty odd and peculiar way of organizing a checklist ... how frequent is this kind of oddjob?