ropensci-archive / scrubr

ARCHIVED: Clean species occurrence records

use case: real-world example of data cleaning with GBIF data #22

Closed: sckott closed this 2 years ago

sckott commented 7 years ago

from https://doi.org/10.1093/jxb/erw451

Occurrence data for each taxon were downloaded from the Global Biodiversity Information Facility (GBIF, http://www.gbif.org) using the RGBIF package in R (Chamberlain et al., 2016; data accessed 1 and 2 July 2016). Occurrence data for the Zambezian C3–C4 within Alloteropsis semialata were taken from Lundgren et al. (2015, 2016). All occurrence data were cleaned by removing any anomalous latitude or longitude points, points falling outside of a landmass, and any points close to GBIF headquarters in Copenhagen, Denmark, which may result from erroneous geolocation. To avoid repeated occurrences, latitude and longitude decimal degree values were rounded to two decimal places, and any duplicates at this resolution were removed. These filters are commonly applied to data extracted from GBIF (Zanne et al., 2014).
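The four filters quoted above can be sketched as a single pass over occurrence records. This is a minimal Python sketch under stated assumptions, not the paper's code and not scrubr's API. The following are all assumptions, since the paper does not specify them: the approximate GBIF headquarters coordinates, the 1 km exclusion radius, and the definition of "anomalous" (missing, out of range, or exactly 0,0). The `is_on_land` test is a hypothetical caller-supplied function; in practice it would be a point-in-polygon check against a land layer.

```python
import math
from typing import Callable, Iterable

# Approximate location of the GBIF Secretariat in Copenhagen (assumption;
# the paper does not state the coordinates or radius it used).
GBIF_HQ = (55.7023, 12.5588)  # (lat, lon)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres, assuming a spherical Earth."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def clean_occurrences(
    records: Iterable[dict],
    is_on_land: Callable[[float, float], bool],  # hypothetical land test
    hq_radius_km: float = 1.0,  # assumed radius around GBIF HQ
    ndigits: int = 2,  # rounding used in the paper
) -> list:
    """Apply the four filters to one taxon's records (dicts with 'lat'/'lon')."""
    cleaned = []
    seen = set()
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        # 1. Anomalous coordinates: missing, out of range (NaN also fails the
        #    range check), or exactly (0, 0).
        if lat is None or lon is None:
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        if lat == 0 and lon == 0:
            continue
        # 2. Points that do not fall on a landmass.
        if not is_on_land(lat, lon):
            continue
        # 3. Points close to GBIF headquarters.
        if haversine_km(lat, lon, *GBIF_HQ) <= hq_radius_km:
            continue
        # 4. Round to `ndigits` decimal places; keep only the first record
        #    at each rounded coordinate pair.
        key = (round(lat, ndigits), round(lon, ndigits))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**rec, "lat": key[0], "lon": key[1]})
    return cleaned
```

Deduplicating on coordinates alone assumes the function is called once per taxon, as in the paper; for a mixed dataset the species name would need to be part of the deduplication key.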

SriramRamesh commented 7 years ago

Hi, I am interested in biodiversity data cleaning for GSoC 2017. Can I create a function for this use case?

sckott commented 7 years ago

hi @SriramRamesh, I wasn't thinking of a separate function for this, but of an example to put in a vignette and/or the README.

we may want to add additional functions to this pkg if warranted

SriramRamesh commented 7 years ago

Did you mean that we can mention this kind of usage in the README so that users can adapt it to their own datasets?

sckott commented 7 years ago

no.

the idea is to make a vignette like https://github.com/ropensci/scrubr/blob/master/inst/vign/scrubr_vignette.Rmd that has one or more use cases like the one described above, i.e., a set of code replicating as closely as possible what they did in the paper.

a second point: in the process of doing that, we may find we need additional functions, which we can talk about if that comes up