traitecoevo / taxonlookup

A versioned and dynamically updating taxonomic lookup table for land plants
http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12517/abstract

Treatment of unresolved TPL names #19

Closed rossmounce closed 8 years ago

rossmounce commented 8 years ago

I just noticed many TPL unresolved names don't seem to get looked up.

e.g. Carpoceras tatianae (Bordz.) Grossh. http://www.theplantlist.org/tpl1.1/record/kew-2701287

Is there any way to force it to return the higher-level groups? I have a lot of unresolved names. I would expect this to be returned: "Martyniaceae" "Lamiales" "Angiosperms"

> plant_lookup_version_current()
[1] "1.0.1"
> lookup_table("Carpoceras tatianae",by_species=TRUE)
[1] genus  family order  group 
<0 rows> (or 0-length row.names)

I could give you another 200 names like that if you want, but the basic commonality seems to be that they are all TPL "Unresolved"

wcornwell commented 8 years ago

Yes, right now it only does accepted names, but shouldn't be too hard to add this. Will have a look...

wcornwell commented 8 years ago

Added the unresolved names

> plant_lookup_version_current()
[1] "1.1.0"
> lookup_table("Carpoceras tatianae",by_species=TRUE)
                    number.of.accepted.species
Carpoceras tatianae                          0
                    number.of.accepted.and.unresolved.species
Carpoceras tatianae                                         5
                         genus       family    order       group
Carpoceras tatianae Carpoceras Martyniaceae Lamiales Angiosperms

I think this broke some things though, as apparently some "unresolved" genera are listed in more than one family. Might take a little while to figure out how to fix that one.

wcornwell commented 8 years ago

For example, Acanthodium is an accepted name in the Sematophyllaceae and an unresolved name in the Acanthaceae. hmmm

rossmounce commented 8 years ago

Perhaps this should be implemented as an optional parameter for the user to toggle on/off, e.g. include_unresolved=TRUE

FWIW here's my use case: I've used taxonlookup to check all >20,000 IUCN-assessed plant species (I ran them through Taxonstand first; heck of a lot of misspelt/synonym/invalid/'too new' names!). I want to know what the 'coverage' is like, i.e. % of species assessed per genus. Taxonlookup numbers seem very conservative.

For instance Taraxacum, TPL lists 2336 accepted names & 551 unresolved http://www.theplantlist.org/tpl1.1/search?q=taraxacum

Yet taxonlookup version 1.0.1 returns a total species number of 2332 for this same genus. Presumably taxonlookup is deferring to APweb here? [Just want to clarify the origin of the numbers I'm getting!]

wcornwell commented 8 years ago

sounds like a cool project.

For the species-in-genus counts, we only use TPL. If you look here: http://www.theplantlist.org/1.1/browse/A/Compositae/Taraxacum/

There are 2332 "accepted species" and 2336 "accepted names". So there are 4 "accepted" subspecies or varieties or something. Taxonlookup drops the subspecific names (as the use of these is so variable across plant groups), so I think that's the discrepancy you're getting.
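
To make the "accepted names" versus "accepted species" distinction concrete, here is a minimal base-R sketch. The `tpl` data frame and its column names (`infraspecific_rank`, `status`) are made up for illustration and are not the actual TPL export format or taxonlookup's internal code; the point is only that dropping infraspecific records lowers the per-genus count.

```r
# Toy stand-in for TPL records. Column names are assumptions, not TPL's own.
tpl <- data.frame(
  genus              = c("Taraxacum", "Taraxacum", "Taraxacum", "Acorus"),
  species            = c("officinale", "officinale", "albidum", "calamus"),
  infraspecific_rank = c("", "subsp.", "", ""),
  status             = c("Accepted", "Accepted", "Accepted", "Accepted"),
  stringsAsFactors   = FALSE
)

accepted <- tpl[tpl$status == "Accepted", ]

# "Accepted names": every accepted record, including subspecies and varieties.
accepted_names <- table(accepted$genus)

# "Accepted species": only records without an infraspecific rank.
species_only <- accepted[accepted$infraspecific_rank == "", ]
accepted_species <- table(species_only$genus)

print(accepted_names)
print(accepted_species)
```

In this toy data Taraxacum has one more accepted name than accepted species, the same kind of gap as the 2336 versus 2332 above.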

rossmounce commented 8 years ago

aaah... my bad. I was just ctrl-F'ing the page. That makes a lot more sense now

wcornwell commented 8 years ago

I think it's working now

> pl <- plant_lookup("1.1.1")
> plant_lookup_version_current()
[1] "1.1.1"
> lookup_table("Carpoceras tatianae",by_species=TRUE)
                         genus       family    order       group
Carpoceras tatianae Carpoceras Martyniaceae Lamiales Angiosperms

As of now, unresolved genera that are listed in more than one family won't be returned. Hopefully The Plant List v1.2 fixes that bug. Let me know if that works for you.
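
A rough sketch of that rule in base R, for anyone who wants to apply the same filter to their own table. This is not the code taxonlookup uses; the `genera` data frame, its `status` column, and the choice to keep the accepted Acanthodium row are all assumptions made for illustration.

```r
# Toy genus-to-family table. Layout and status column are assumed.
genera <- data.frame(
  genus  = c("Acanthodium", "Acanthodium", "Carpoceras", "Acorus"),
  family = c("Sematophyllaceae", "Acanthaceae", "Martyniaceae", "Acoraceae"),
  status = c("Accepted", "Unresolved", "Unresolved", "Accepted"),
  stringsAsFactors = FALSE
)

# Number of distinct families each genus is listed under.
families_per_genus <- tapply(genera$family, genera$genus,
                             function(f) length(unique(f)))
ambiguous <- names(families_per_genus)[families_per_genus > 1]

# Always keep accepted rows; keep unresolved rows only when the genus
# maps to a single family.
keep <- genera$status == "Accepted" | !(genera$genus %in% ambiguous)
lookup <- genera[keep, ]

print(ambiguous)
print(lookup)
```

With this toy input the unresolved Acanthaceae row is dropped, while Carpoceras (unresolved, but in only one family) is kept.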

wcornwell commented 8 years ago

Also if you want species in genus counts, this is the best way to get them:

> pl<-plant_lookup(include_counts = TRUE)
> head(pl)
  number.of.accepted.species number.of.accepted.and.unresolved.species     genus
1                          2                                         5    Acorus
2                          1                                         1 Albidella
3                          8                                        16    Alisma
4                          1                                         1   Astonia
5                          3                                         5 Baldellia
6                          1                                         1  Burnatia
        family       order       group
1    Acoraceae    Acorales Angiosperms
2 Alismataceae Alismatales Angiosperms
3 Alismataceae Alismatales Angiosperms
4 Alismataceae Alismatales Angiosperms
5 Alismataceae Alismatales Angiosperms
6 Alismataceae Alismatales Angiosperms
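
For the per-genus 'coverage' use case mentioned earlier in the thread, something like the following base-R sketch might work. The `counts` rows are retyped by hand from the output above so the example runs without the package; the `assessed` vector is a hypothetical stand-in for a cleaned species list, not real IUCN data, and the `pct_*` column names are made up here.

```r
# A few rows of the counts table shown above, retyped by hand.
counts <- data.frame(
  genus = c("Acorus", "Alisma", "Baldellia"),
  number.of.accepted.species = c(2, 8, 3),
  number.of.accepted.and.unresolved.species = c(5, 16, 5),
  stringsAsFactors = FALSE
)

# Hypothetical list of assessed binomials (illustrative only).
assessed <- c("Acorus calamus", "Alisma plantago-aquatica", "Alisma gramineum")

# The genus is the first word of each binomial.
assessed_genus <- sub(" .*$", "", assessed)
n_assessed <- as.data.frame(table(genus = assessed_genus),
                            responseName = "n_assessed",
                            stringsAsFactors = FALSE)

# Left join so genera with no assessed species are kept with a count of 0.
coverage <- merge(counts, n_assessed, by = "genus", all.x = TRUE)
coverage$n_assessed[is.na(coverage$n_assessed)] <- 0

# Coverage against both denominators, as percentages.
coverage$pct_accepted <-
  100 * coverage$n_assessed / coverage$number.of.accepted.species
coverage$pct_with_unresolved <-
  100 * coverage$n_assessed / coverage$number.of.accepted.and.unresolved.species

print(coverage)
```

Reporting both percentages side by side shows how much the choice of denominator (accepted only, or accepted plus unresolved) changes the apparent coverage.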