ropensci / CoordinateCleaner

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for use in conservation, ecology and palaeontology.
https://docs.ropensci.org/CoordinateCleaner/

Running cc_outl on multiple species #16

Closed: HMB3 closed this issue 5 years ago

HMB3 commented 5 years ago

Hi,

My name is Hugh and I've got a large dataset of occurrences from GBIF and ALA (I accidentally closed the last issue).

There are about 3.8k species, so lots of points to clean!

I'm using the CleanCoordinates function with these settings:

library(CoordinateCleaner)
library(dplyr)  ## for %>% and bind_rows() used below

minages <- runif(250, 0, 65)

exmpl <- data.frame(species = sample(letters, size = 250, replace = TRUE),
                    decimallongitude = runif(250, min = 42, max = 51),
                    decimallatitude = runif(250, min = -26, max = -11),
                    min_ma = minages,
                    max_ma = minages + runif(250, 0.1, 65),
                    dataset = "clean")

exmpl <- exmpl %>% timetk::tk_tbl()

FLAGS <- CleanCoordinates(exmpl,
                          capitals.rad = 0.12,
                          countrycheck = TRUE,
                          duplicates   = TRUE,
                          seas         = FALSE,
                          verbose      = FALSE)

However, the spatial outlier detection step in this call is quite slow, because there are so many records.
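As an aside, newer versions of the package replace CleanCoordinates with clean_coordinates, where the tests to run are passed as a character vector, so the slow outlier test can simply be left out of the wrapper call. A minimal sketch, assuming a recent CoordinateCleaner version (the exact test names may differ between versions):

FLAGS_FAST <- clean_coordinates(exmpl,
                                lon     = "decimallongitude",
                                lat     = "decimallatitude",
                                species = "species",
                                tests   = c("capitals", "centroids",
                                            "duplicates", "zeros"))  ## no "outliers" test requested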

So I'm running the outlier detection separately, per species, like this:

SPAT.OUT <- as.character(unique(exmpl$species)) %>%

  lapply(function(x) {

    f <- subset(exmpl, species == x)

    message("Running spatial outlier detection for ", x)
    message(nrow(f), " records for ", x)

    sp.flag <- cc_outl(f,
                       lon     = "decimallongitude",
                       lat     = "decimallatitude",
                       species = "species",
                       method  = "distance",
                       tdi     = 300,  ## flag records more than 300 km from all other records of the species
                       value   = "flags",
                       verbose = FALSE)

    d <- cbind(searchTaxon = x,
               SPAT_OUT    = sp.flag,
               f)[c("searchTaxon", "SPAT_OUT")]
    return(d)

  }) %>%

  bind_rows()
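In case it helps, here's a minimal variant of the same loop (just a sketch, not the code I'm actually running) that keeps only the records passing the outlier test, so the flags don't need to be joined back onto the data afterwards:

CLEAN <- as.character(unique(exmpl$species)) %>%
  lapply(function(x) {
    f <- subset(exmpl, species == x)
    flag <- cc_outl(f,
                    lon     = "decimallongitude",
                    lat     = "decimallatitude",
                    species = "species",
                    method  = "distance",
                    tdi     = 300,
                    value   = "flags",
                    verbose = FALSE)
    f[flag, ]  ## value = "flags" returns TRUE for records that pass the test
  }) %>%
  bind_rows()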

azizka commented 5 years ago

Thanks, I'll close this issue for now.