Open myrmoteras opened 5 years ago
A CoL lookup for family "Reduviidae" gives a pretty good idea why we're getting this result: http://www.catalogueoflife.org/col/webservice?response=full&name=Reduviidae
Now if you go through the child genera (under <child_taxa>), you see that most of the missed genera are not listed there.
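The comparison described above can be sketched roughly as follows. This is only an illustration: it assumes the response nests `<taxon>` elements with `<name>` and `<rank>` children under `<child_taxa>`, and the sample payload, the genus names in it, and the function names `child_genera` and `missing_genera` are made up for the example, not taken from FAT or from an actual CoL response.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed response; a real payload carries many more fields
# and its child list will differ from this made-up one.
SAMPLE_RESPONSE = """<results>
  <result>
    <name>Reduviidae</name>
    <rank>Family</rank>
    <child_taxa>
      <taxon><name>Apiomerus</name><rank>Genus</rank></taxon>
      <taxon><name>Zelus</name><rank>Genus</rank></taxon>
    </child_taxa>
  </result>
</results>"""


def child_genera(xml_text):
    """Collect the names of all genus-rank taxa listed under <child_taxa>."""
    root = ET.fromstring(xml_text)
    genera = set()
    for taxon in root.findall(".//child_taxa/taxon"):
        rank = taxon.findtext("rank", default="")
        name = taxon.findtext("name", default="")
        if name and rank.strip().lower() == "genus":
            genera.add(name.strip())
    return genera


def missing_genera(article_genera, xml_text):
    """Return the article's genera that the catalog response does not list."""
    return sorted(set(article_genera) - child_genera(xml_text))


if __name__ == "__main__":
    # Genera as they might be collected from the checklist (illustrative only).
    print(missing_genera(["Zelus", "Rasahus"], SAMPLE_RESPONSE))
```

Running this against the saved response for a family would give a quick list of the genera a catalog-backed lookup cannot confirm.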
This is a hard one with plain binomials only (i.e., no subspecies or variety whose most significant epithet would bear an explicit label, "subsp." or "var."), and no original descriptions or recombinations whose taxon status labels would provide any sure-fire hints. In brief, there is precious little to go on in terms of recovering taxon names not vetted by the catalogs.
Using the italics alone, maybe in combination with the presence of fairly regular authorities, might work in this case, but would likely incur an enormous number of false positives in many other articles, especially ones richer in data per taxon. I have to think about how we could attack this.
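To make the "regular authorities" idea a bit more concrete, here is a minimal sketch of a pattern that picks up binomials followed by an authority and a four-digit year. It works on plain text only, so it ignores italics entirely, and the pattern and the name `find_candidates` are assumptions for illustration, not how FAT does it.

```python
import re

# Genus, epithet, then an authority such as "(Linnaeus, 1758)" or
# "Smith & Jones, 1901". Deliberately strict: a year is required.
BINOMIAL_WITH_AUTHORITY = re.compile(
    r"\b(?P<genus>[A-Z][a-z]+) (?P<epithet>[a-z]{2,})"
    r" (?P<authority>\(?[A-Z][A-Za-z.'\-]+"
    r"(?: (?:&|and|et) [A-Z][A-Za-z.'\-]+)?,? \d{4}\)?)"
)


def find_candidates(text):
    """Return (genus, epithet, authority) tuples for every match in text."""
    return [
        (m["genus"], m["epithet"], m["authority"])
        for m in BINOMIAL_WITH_AUTHORITY.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Aus bus (Linnaeus, 1758) was found near the river. Cus dus Smith & Jones, 1901."
    for candidate in find_candidates(sample):
        print(candidate)
```

Requiring the year keeps the pattern from firing on ordinary capitalized phrases, but it would also miss checklist entries that cite the authority without a year, so this is a trade-off rather than a fix.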
I've been running FAT with the fixes from #541 on this checklist several times now, and it looks like it's working just fine. All the taxa come up, all the families as well, and also most of the subfamilies and tribes. Either CoL added a major update in the past couple of days, or it just works now.
Do you have more of these CheckList articles to run a few more tests with? And anyway, since this is a Pensoft journal, shouldn't we be harvesting it as TaxPub?
Checklist is PDF only. No taxpub. You can get any of the checklist articles from the checklist journal site at Pensoft. It makes more sense though right now to use EJT articles with botanical content.
OK, fair enough. However, as stated in #541, it looks as though FAT handles botanical names pretty well now.
The specific challenge with checklists (in general) is that they barely contain anything that helps FAT detect formerly unknown (to CoL, IPNI, and GBIF that is) names: No "new
I'm aware this is not one of our primary concerns right now, so I won't bother for the time being. However, in the long run checklists are a treasure trove of non-cataloged names and occurrence data alike, so we might want to keep this in mind for later ... there is a lot of data (and references to original descriptions) to be harvested from checklists.
One more question: How common, if at all, are checklists outside zoology?
I see floras are kind of checklists as well (in their summarizing and subsuming nature), but way more detailed, and less restricted to "Aus bus" binomials only (i.e., richer in the explicit taxon name clues listed in my previous post). But this question is more in terms of potentially applicable zoology-specific filters, e.g., ones removing false positives like "Parana Forest" (with "Forest" as the authority and "Parana" being a valid Hymenoptera genus whose parent family Braconidae is even mentioned explicitly in the article, see http://www.catalogueoflife.org/col/webservice?response=full&name=Parana) from the example checklist of this issue.
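A filter of the kind mentioned above could look something like the sketch below. The stoplist, the pattern, and the name `is_probable_false_positive` are all hypothetical; the idea is simply to reject a "genus plus authority" candidate when the supposed authority is a common habitat or geography word.

```python
import re

# Assumed stoplist of habitat/geography words that can masquerade as authorities.
# A real list would need to be curated per journal or region.
NON_AUTHORITY_WORDS = {"forest", "river", "lake", "island", "park", "valley", "basin"}

# Capitalized word followed by a second capitalized word, e.g. "Parana Forest".
GENUS_PLUS_AUTHORITY = re.compile(r"^([A-Z][a-z]+)\s+([A-Z][A-Za-z.\-]+)$")


def is_probable_false_positive(candidate):
    """True if the 'authority' part of a genus + authority candidate is a stoplist word."""
    match = GENUS_PLUS_AUTHORITY.match(candidate.strip())
    if match is None:
        # Not a genus + authority shape at all; leave it to other checks.
        return False
    authority = match.group(2).lower().rstrip(".")
    return authority in NON_AUTHORITY_WORDS


if __name__ == "__main__":
    for name in ("Parana Forest", "Zelus Fabricius", "Aus bus"):
        print(name, is_probable_false_positive(name))
```

A stoplist like this cannot tell a habitat word from an author who happens to share the same surname, which is one reason to treat it as a flag for review rather than an automatic removal.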
CheckList_article_21049
I fixed all of them manually
In this article, many taxonomic names have not been properly detected,
and when extending the names and running parse taxon name, the result is not complete.