ropensci / rfishbase

R interface to the fishbase.org database
https://docs.ropensci.org/rfishbase

Have findSpecies() match subspecies #26

Closed CottonRockwood closed 9 years ago

CottonRockwood commented 10 years ago

Hi Carl,

First, thanks for your work on this package. It is VERY useful, and I think it will have some far-reaching impacts on fisheries research once it gains some traction, and even more so once the API is in place.

This is probably more of a request than an issue. I have a data set that includes a few species which return no records for findSpecies(). However, these species DO have subspecies entries in the FishBase database. This could be confusing for anyone who expected all species that exist in FishBase to be returned. I'm wondering if you can have findSpecies() return all subspecies as TRUE or (even better) add an argument that lets you choose whether to include them. As an example:

sum(findSpecies("Clupea pallasii pallasii"))
[1] 1
sum(findSpecies("Clupea pallasii"))
[1] 0

If the matching is done internally using a regular expression, it might be as simple as adding a wildcard to the search terms you give findSpecies(), but there would need to be documentation for that. Again, thanks for making this package a reality! Also, I'm not an R guru by any means, but let me know if there is anything that I can do to help with the package! -Cotton
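The wildcard idea above could look roughly like the following sketch in base R. Everything here is an assumption made for illustration: `fb_names` is a made-up name list, and `match_species()` and its `include_subspecies` argument are hypothetical, not part of rfishbase. It uses `startsWith()` rather than a regular expression so the query never needs escaping.

```r
# Hypothetical name list; the real package matches against FishBase data.
fb_names <- c("Clupea pallasii pallasii", "Clupea harengus", "Gadus morhua")

# match_species() is an illustrative helper, not an rfishbase function.
# It returns a logical vector marking which entries of `names` match `query`.
match_species <- function(query, names, include_subspecies = TRUE) {
  exact <- names == query
  if (!include_subspecies) return(exact)
  # A subspecies entry is the species name followed by a space and an epithet.
  exact | startsWith(names, paste0(query, " "))
}

sum(match_species("Clupea pallasii", fb_names))
sum(match_species("Clupea pallasii", fb_names, include_subspecies = FALSE))
```

With the default, the species-level query also counts the subspecies entry; with `include_subspecies = FALSE` only exact matches count.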

cboettig commented 9 years ago

The FishBase database appears to list species and subspecies together as if the subspecies were part of the species name, rather than handling subspecies as a separate field.

rfishbase2.0 (in development) implements the synonyms table (e.g. see: http://www.fishbase.us/Nomenclature/SynonymsList.php?ID=1520&SynCode=153539&GenusName=Clupea&SpeciesName=pallasii+pallasii)

e.g.

synonyms(1520)
Source: local data frame [10 x 11]

        SynGenus        SynSpecies Valid Misspelling        Synonymy          Combination SpecCode SynCode CoL_ID    TSN WoRMS_ID
1         Clupea          harengus FALSE       FALSE misapplied name           misapplied     1520   54334     NA     NA       NA
2         Clupea  harengus pallasi FALSE       FALSE  senior synonym       change in rank     1520   55595     NA     NA   322672
3         Clupea harengus pallasii FALSE       FALSE  senior synonym       change in rank     1520     364     NA 161723   322673
4         Clupea           inermis FALSE       FALSE  junior synonym original combination     1520     370     NA     NA   300422
5         Clupea         lineolata FALSE       FALSE    questionable original combination     1520      12     NA     NA   300436
6         Clupea         mirabilis FALSE       FALSE  junior synonym original combination     1520     390     NA     NA   300459
7         Clupea           pallasi FALSE        TRUE  senior synonym original combination     1520   53906     NA 646471   400014
8         Clupea          pallasii FALSE       FALSE  senior synonym original combination     1520   29848     NA 551209   151159
9         Clupea pallasii pallasii  TRUE       FALSE  senior synonym       change in rank     1520  153539     NA     NA   293568
10 Spratelloides         bryoporus FALSE       FALSE  junior synonym original combination     1520    1272     NA     NA   300573

and you'll see only Clupea pallasii pallasii is listed as "valid". To make this user-friendly, I've also added the function validate_names() to rfishbase2.0, so a user can just do:

x <- validate_names("Clupea pallasii")

and get back the name that the FishBase API recognizes as the valid name:

> x
           Clupea pallasii 
"Clupea pallasii pallasii" 

(Most functions in rfishbase2.0 use these validated names to return information, e.g. ecology(x), species_info(x). All functions can take a vector of many species names instead of just the one shown here.)
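To make the lookup concrete, here is a minimal sketch of how a valid name can be resolved from a synonyms table like the one printed above. It is not how validate_names() is implemented. The data frame `syn` copies three rows and four columns from that output, and `resolve_valid_name()` is a hypothetical helper written only for this illustration.

```r
# Miniature of the synonyms table above (three of its ten rows).
syn <- data.frame(
  SynGenus   = c("Clupea", "Clupea", "Clupea"),
  SynSpecies = c("pallasi", "pallasii", "pallasii pallasii"),
  Valid      = c(FALSE, FALSE, TRUE),
  SpecCode   = c(1520, 1520, 1520),
  stringsAsFactors = FALSE
)

# resolve_valid_name() is an illustrative helper, not an rfishbase function.
resolve_valid_name <- function(name, table) {
  full <- paste(table$SynGenus, table$SynSpecies)
  # Find the species code that the supplied name (valid or synonym) points to.
  code <- unique(table$SpecCode[full == name])
  if (length(code) != 1) return(NA_character_)
  # Return the single name flagged as valid for that species code.
  valid <- full[table$SpecCode == code & table$Valid]
  if (length(valid) != 1) return(NA_character_)
  valid
}

resolve_valid_name("Clupea pallasii", syn)
```

A name that is missing from the table, or that maps to more than one species code, comes back as `NA` in this sketch rather than raising an error.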

rfishbase2.0 isn't out yet but feel free to try the development version from the branch of that name.