ropensci / traits

R package for accessing species trait data from multiple databases

create is.not.native() function based on native plant lists. #24

Closed ibartomeus closed 9 years ago

ibartomeus commented 9 years ago

Lists of "invasive" species are not only hard to get, but also too dynamic to maintain. Hence the is.invasive() approach fails for most research questions because of data availability, and this is not going to be fixed anytime soon.

For most research questions the reverse question is equally suitable. is.not.native(x, region = "Spain"), which compares your species pool with the list of native plants of the area of interest, has several advantages.

1) Lists of native species are more or less fixed and need no maintenance.

2) It can be applied to several taxa and regions, and those can be added gradually as lists come in.

3) It conveys the message that any non-native species is potentially harmful, rather than restricting attention to the "worst invaders" as GISD does.

The function is straightforward and should 1) check name validity with taxize, 2) check the names against the appropriate list and 3) return a TRUE/FALSE vector indicating each species' presence on the list.
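The three steps above can be sketched roughly as follows. This is only an illustration, written in Python rather than R for brevity. `NATIVE_LISTS`, `normalize_name` and `is_not_native` are hypothetical names, the two-species list is a placeholder and not a real checklist, and the name-tidying step merely stands in for a proper taxize lookup. The sketch also assumes that TRUE means "not on the native list", to match the function name.

```python
from typing import Dict, FrozenSet, List

# Hypothetical native checklists keyed by region; placeholder entries only.
NATIVE_LISTS: Dict[str, FrozenSet[str]] = {
    "Spain": frozenset({"Lavandula stoechas", "Quercus ilex"}),
}


def normalize_name(name: str) -> str:
    """Stand-in for taxonomic name validation (step 1).

    Only tidies whitespace and capitalisation; a real implementation
    would resolve synonyms and misspellings through a name service.
    """
    parts = name.split()
    if not parts:
        return ""
    genus = parts[0].capitalize()
    rest = [p.lower() for p in parts[1:]]
    return " ".join([genus] + rest)


def is_not_native(species: List[str], region: str) -> List[bool]:
    """Return True for each species absent from the region's native list."""
    if region not in NATIVE_LISTS:
        raise KeyError(f"no native list stored for region {region!r}")
    natives = NATIVE_LISTS[region]  # step 2: pick the appropriate list
    # step 3: one boolean per input name, in input order
    return [normalize_name(s) not in natives for s in species]
```

Raising an error for an unknown region, instead of returning TRUE for everything, keeps a missing list from being mistaken for a region with no native species.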

The problems of implementing it are: 1) finding the native species lists (e.g. Fauna Europaea (http://www.faunaeur.org/distribution.php), national lists, BONAP (http://www.bonap.org/)). It is important to have the list of native plants only, and not occurrences.

2) storing the lists (in a data R package, or accessing each list owner directly?)

If we can sort out how to deal with these 2 questions for, let's say, plants and the US and EU regions (I think that is a good place to start), I can fork-code-pull a first attempt.

sckott commented 9 years ago

this sounds great @ibartomeus

1) finding the native species lists

I know some of the sources in taxize have nativity http://www.itis.gov/ws_tsnApiDescription.html#getJurisdictionalOrigin and related function in taxize: https://github.com/ropensci/taxize/blob/master/R/itis_native.R

2) storing the lists (in a data R package, or accessing each list owner directly?)

Storing them in this or another package sounds good if the data is small enough and the data doesn't change that often - it's hard to get frequent changes of a pkg to CRAN b/c of Brian Ripley

That'd be great if you want to start this

karthik commented 9 years ago

storing the lists (in a data R package, or accessing each list owner directly?)

I'll start a new repo with the notes we discussed on this. And then we can decide how to take that challenge on separately.

ibartomeus commented 9 years ago

Data available:

ITIS only covers North America and lumps together all of the continental US. I'm not sure it covers all taxa, but the plants and animals I tried have it. The itis_native() function already more or less does the job for that.

USDA PLANTS has nice state checklists for the US (including invasive status). Those can be exported and stored somewhere else. The pain is updating them whenever USDA PLANTS updates its database; I don't know whether that happens often. Thoughts?

EU plant distribution per country is more or less accessible via Flora Europaea, BUT they also list exotics. Plus, you can't access the raw data, although the query results can be scraped. URLs are of the form: http://rbg-web2.rbge.org.uk/cgi-bin/nph-readbtree.pl/feout?FAMILY_XREF=&GENUS_XREF=Lavandula&SPECIES_XREF=stoechas&TAXON_NAME_XREF=&RANK= where 'Lavandula' and 'stoechas' are the genus and species, which can be replaced.

We have nothing for EU yet. Going country by country can be too much work.
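As a rough sketch of how that URL pattern could be filled in for any genus/species pair (in Python for illustration; `flora_europaea_url` is a hypothetical helper name, while the endpoint and parameter names are copied from the URL quoted above):

```python
from urllib.parse import urlencode

# Base endpoint copied from the URL quoted in the comment above.
BASE_URL = "http://rbg-web2.rbge.org.uk/cgi-bin/nph-readbtree.pl/feout"


def flora_europaea_url(genus: str, species: str) -> str:
    """Build the query URL for one genus/species pair."""
    # Parameters are kept in the same order as in the quoted URL;
    # the ones not needed for a species query stay empty.
    params = {
        "FAMILY_XREF": "",
        "GENUS_XREF": genus,
        "SPECIES_XREF": species,
        "TAXON_NAME_XREF": "",
        "RANK": "",
    }
    return f"{BASE_URL}?{urlencode(params)}"
```

This only builds the URL. Fetching the page and separating native from exotic records is left out, because the response format is not described in this thread.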

On Tue, Mar 24, 2015 at 5:54 PM, Karthik Ram notifications@github.com wrote:

storing the lists (in a data R package, or accessing each list owner directly?)

I'll start a new repo with the notes we discussed on this. And then we can decide how to take that challenge on separately.


Ignasi Bartomeus PhD www.bartomeuslab.com @ibartomeus Skype: nachobartomeus

Dpto. Ecología Integrativa Estación Biológica de Doñana (EBD-CSIC) Avda. Américo Vespucio s/n Isla de la Cartuja 41092, Sevilla (Spain)

sckott commented 9 years ago

USDA said they were going to put out an API, but it's been so long now that I doubt it will ever happen. So I imagine you want to simply scrape their website each time it's needed.

The Flora Europaea solution sounds good. If I think over other data sources

ibartomeus commented 9 years ago

After playing a bit I learnt:

In summary: the code is probably not beautiful, but it is functional. It does the job of telling you a species' status for the EU and America.

sckott commented 9 years ago

hi @ibartomeus - thanks for the summary. So you will send a PR?

ibartomeus commented 9 years ago

Done! Happy to chat about anything that needs clarification.

On Thu, May 7, 2015 at 11:53 PM, Scott Chamberlain <notifications@github.com> wrote:

hi @ibartomeus https://github.com/ibartomeus - thanks for the summary. So you will send a PR?


sckott commented 9 years ago

thanks!

dlebauer commented 9 years ago

@ibartomeus @sckott We have the usda plants database available at betydb.org/species.json if you want to give it a try. Note that we have added a few columns and rows.

Also note that the usda plants database has lots of columns that can be dropped / excluded from the API to limit size (and most columns could be converted from text to Boolean or enum data types to save substantial space)
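A minimal sketch of that slimming step, assuming each row arrives as a dictionary of text values (in Python for illustration). The column names in `KEEP` and `BOOLEAN_COLUMNS` and the set of strings treated as true are hypothetical, since the actual columns are not listed in this thread:

```python
from typing import Dict, Union

Value = Union[str, bool]

# Hypothetical column names; the real table's columns are not listed here.
KEEP = ("scientific_name", "native_status")
BOOLEAN_COLUMNS = ("is_invasive",)
# Assumed spellings of "true" in the text columns.
TRUE_STRINGS = {"yes", "true", "y", "1"}


def slim_record(record: Dict[str, str]) -> Dict[str, Value]:
    """Keep only the wanted columns and store flag columns as booleans."""
    slim: Dict[str, Value] = {k: record[k] for k in KEEP if k in record}
    for col in BOOLEAN_COLUMNS:
        if col in record:
            # Any value outside TRUE_STRINGS is treated as False.
            slim[col] = record[col].strip().lower() in TRUE_STRINGS
    return slim
```

Columns not named in either tuple are dropped, which is where most of the size saving would come from.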

sckott commented 9 years ago

nice, thanks @dlebauer - USDA said they'd come out with some APIs, but no dice yet

dlebauer commented 9 years ago

@sckott usda plants is pretty much a flat table. We can import updates as necessary, and voila, an API