hurlbertlab / dietdatabase

name correction suggestion for [Myctera americana] #44

Closed jhpoelen closed 7 years ago

jhpoelen commented 7 years ago

Hi! I found the taxon name Myctera americana at a globi page. After reviewing the data sources and relevant taxonomic tools like the globalnames resolver, I think the suggestion below might be helpful.

Thanks!

name suggested name name url reference notes
Myctera americana Mycteria americana http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=174897 some notes

@ahhurlbert I've made a little widget that creates an issue template (see above) after following status page > click on name match percentage > click on "suggest correction". Useful?

ahhurlbert commented 7 years ago

If I understand you correctly, the widget will automatically post a new issue for each problematic name that is identified?

But I don't understand what you're referring to with "status page", and hence the subsequent link trail (the 'globi page' link above does not work; is that what you mean?).

So perhaps this could be quite useful but I need some clarification.

Our current workflow is to get the file of unmatched names from GloBI, in which the suggested names are also listed. See instructions here. We're just a little behind, but all names will get cleaned eventually!

jhpoelen commented 7 years ago

I think I was a little too quick to share this idea with you. First, I corrected the link to the "globi page" (also here).

Second, the semi-automated workflow would be something like:

  1. user goes to the status page at http://globalbioticinteractions.org/status
  2. user locates dataset of interest (e.g. screenshot from 2017-02-10 15-21-05)
  3. user now clicks on the name match percentage (e.g. screenshot from 2017-02-10 15-14-38)
  4. now, a page is loaded that shows records with unmatched names for the specific dataset (see screenshot from 2017-02-10 15-17-34)
  5. for each unmatched name a "suggest correction" link is visible
  6. after clicking on the "suggest correction" link, a pre-populated github issue is shown (see screenshot from 2017-02-10 15-32-28)
  7. user now fills in the generated template and creates the issue.

So, this workflow is semi-automatic and is designed to help folks make suggestions to the curators of a source dataset (in this case, the dataset at https://github.com/hurlbertlab/dietdatabase).
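To make step 6 concrete, here is a minimal sketch of how a "suggest correction" link could pre-populate an issue. It relies on GitHub's `issues/new` page accepting `title` and `body` query parameters. The function name `suggestion_issue_url` and the exact body layout are assumptions for illustration, not the actual widget code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def suggestion_issue_url(repo, name, suggested_name="", name_url=""):
    """Build a GitHub 'new issue' URL with title and body pre-filled.

    Hypothetical helper: the real widget may build its link differently.
    """
    title = f"name correction suggestion for [{name}]"
    # Template table with the same columns as the issue shown above;
    # reference and notes are left blank for the user to fill in.
    body = "\n".join([
        "name | suggested name | name url | reference | notes",
        "--- | --- | --- | --- | ---",
        f"{name} | {suggested_name} | {name_url} |  | ",
    ])
    query = urlencode({"title": title, "body": body})
    return f"https://github.com/{repo}/issues/new?{query}"


url = suggestion_issue_url("hurlbertlab/dietdatabase", "Myctera americana")
print(url)
# Decode the query string again to check what the issue form would receive.
print(parse_qs(urlsplit(url).query)["title"][0])
```

Opening the printed URL in a browser shows the issue form with the template filled in. Nothing is posted until the user submits it, which keeps the workflow semi-automatic.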

Please let me know if this description is clear and whether it would help you with name curation.

Thanks for being patient!

ahhurlbert commented 7 years ago

This seems useful. However, it seems you have already written code to suggest a specific name (as listed in the taxonUnmatched.tsv file), so perhaps you could autofill this suggestion in the issue text when one is available?

Of course, it is incumbent on the reviewer to verify this suggestion, but it saves them several steps by not having to start from scratch.
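As a rough sketch of what I mean, assuming the unmatched-names file is tab-separated with one column for the unmatched name and one for the suggestion: the column headers `unmatched name` and `suggested name`, the sample rows, and both function names below are made up for illustration, and the real taxonUnmatched.tsv layout may differ.

```python
import csv
import io

# Hypothetical sample data; "Unknownus exampleus" is a placeholder name
# standing in for a record that has no suggestion.
SAMPLE_TSV = (
    "unmatched name\tsuggested name\n"
    "Myctera americana\tMycteria americana\n"
    "Unknownus exampleus\t\n"
)


def load_suggestions(tsv_file):
    """Map each unmatched name to its suggested name, skipping blank suggestions."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {
        row["unmatched name"]: row["suggested name"]
        for row in reader
        if row["suggested name"]
    }


def prefill_row(name, suggestions):
    """Return the issue template row, with the suggestion filled in if one exists."""
    suggested = suggestions.get(name, "")
    return f"{name} | {suggested} |  |  | "


suggestions = load_suggestions(io.StringIO(SAMPLE_TSV))
print(prefill_row("Myctera americana", suggestions))
print(prefill_row("Unknownus exampleus", suggestions))
```

When no suggestion exists, the column stays empty and the reviewer fills it in by hand, as they do now.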

ahhurlbert commented 7 years ago

I finally embraced programmatic name cleaning in our own way using R's taxize package. Waiting until the database gets a little more stable before running it, but we will soon be primarily ITIS-conforming (see comments under #48 for more details).