ropensci / CoordinateCleaner

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for the use in conservation, ecology and palaeontology.
https://docs.ropensci.org/CoordinateCleaner/

cc_outl update cannot find object 'ras' #21

Closed CharlottePhilps closed 4 years ago

CharlottePhilps commented 5 years ago

Hello,

I am trying to run the cc_outl function on various species. The issues below occur for species with both >= 10000 and < 10000 occurrence records.

When I run the following code, regardless of whether there are more or fewer than 10,000 records:

outliers <- cc_outl(cleandat, lon = "decimalLongitude", lat = "decimalLatitude", species = "species", method = "quantile", mltpl = 5, value = "flagged", sampling_thresh = 0, verbose = TRUE)

R returns the following error:

Testing geographic outliers
Error in ras_dist(x = k, lat = lat, lon = lon, ras = ras, weights = TRUE) : object 'ras' not found

I believe 'ras' is an object created inside the function, and only when thinning = TRUE.

To avoid the 'ras' error, I can set thinning = T even for species with < 10000 records; however, raster approximation is then used:

cleansubset <- cc_outl(cleandat, lon = "decimalLongitude", lat = "decimalLatitude", species = "species", method = "quantile", mltpl = 5, value = "flagged", thinning= T, sampling_thresh = 0, verbose = TRUE)

This overcomes the 'ras' error. However, with fewer than 10,000 records (my occurrence records = 489), value = "flagged" in cc_outl, and no outliers present, more logical values are returned than records tested (58146, with NAs contributing to this).

With value = "clean", setting thinning = TRUE also bypasses the 'ras' error:

cleansubset <- cc_outl(cleandat, lon = "decimalLongitude", lat = "decimalLatitude", species = "species", method = "quantile", mltpl = 5, value = "clean", thinning= T, sampling_thresh = 0, verbose = TRUE)

Testing geographic outliers
Removed NA records.
Warning message:
In cc_outl(cleandat, lon = "decimalLongitude", lat = "decimalLatitude", :
  Using raster approximation.

Furthermore, the example code for cc_outl in the reference manual uses thinning = FALSE, whereas the arguments section gives the default as thinning = TRUE.

Is there a way to overcome this?

Thank you for your help. And sorry for any incorrect formatting, this is my first post.

azizka commented 5 years ago

Hi,

  1. I couldn't reproduce the error with the latest version and the example data. Could you post some example data that produces the error, and which version of CoordinateCleaner you are using (sessionInfo())?

  2. The information in ?cc_outl is now consistent. Thanks for pointing that out.

Thanks for posting!

CharlottePhilps commented 5 years ago

Hello,

Thank you for your response!

I'm using CoordinateCleaner_2.0-11 under R version 3.5.3 (2019-03-11).

I have attached a .txt file of some species occurrence records:

Allium_atroviolaceum_data.txt

Thank you again, Charlotte

azizka commented 5 years ago

Hi again, the example data seems corrupted. Can you provide a version using tab or semicolon as a delimiter? Thanks!

CharlottePhilps commented 5 years ago

Hello again,

I have now resaved the data with sep = ";", so hopefully it is okay. Thank you for your help!

Charlotte

Allium_atroviolaceum_data2.txt

azizka commented 5 years ago

Thanks, this works fine for me on CoordinateCleaner_2.0-12.

library(tidyverse)
library(CoordinateCleaner)

cleandat <- read_delim("Allium_atroviolaceum_data2.txt", delim = ";")

outliers <- cc_outl(cleandat, lon = "decimalLongitude", lat = "decimalLatitude",
                    species = "species", method = "quantile", mltpl = 5,
                    value = "flagged", sampling_thresh = 0, verbose = TRUE)

cleansubset <- cc_outl(cleandat, lon = "decimalLongitude", 
                       lat = "decimalLatitude", species = "species", method = "quantile", 
                       mltpl = 5, value = "flagged", thinning= T, sampling_thresh = 0, verbose = TRUE)

Please try updating with devtools::install_github("ropensci/CoordinateCleaner") and running cc_outl again.

CharlottePhilps commented 5 years ago

Dear Alex,

Unfortunately, when I try to update the CoordinateCleaner package it will not install. The update is not yet on CRAN, so I tried installing it with devtools::install_github("ropensci/CoordinateCleaner"), which gives the following warning messages:

Warning messages:
1: In untar2(tarfile, files, list, exdir) : skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir) : skipping pax global extended headers

Then when I try to load the package, I get the following error:

library(CoordinateCleaner)
Error: package or namespace load failed for ‘CoordinateCleaner’ in get(method, envir = home):
 lazy-load database 'C:/Program Files/R/R-3.5.3/library/CoordinateCleaner/R/CoordinateCleaner.rdb' is corrupt
In addition: Warning messages:
1: In .registerS3method(fin[i, 1], fin[i, 2], fin[i, 3], fin[i, 4], : restarting interrupted promise evaluation
2: In get(method, envir = home) : restarting interrupted promise evaluation
3: In get(method, envir = home) : internal error -3 in R_decompress1

I might be doing something wrong or just need to wait for an update? Thank you again for your help.

CharlottePhilps commented 5 years ago

Dear Alex,

Sorry for all my messages. I thought I would update you, as I feel the last issue may have been a problem on my end.

I have now managed to download CoordinateCleaner through github.

After download I receive the following warning messages:

Warning messages:
1: In if (!is.character(what) || is.na(what) || length(what) != 1L || : closing unused connection 4 (ropensci-CoordinateCleaner-f1ef2cf/vignettes/inst/species_20025.shp)
2: In untar2(tarfile, files, list, exdir) : skipping pax global extended headers
3: In untar2(tarfile, files, list, exdir) : skipping pax global extended headers

I was unsure if this is an issue.

However, when I run the cc_outl code, 'ras' is still not found:

> outliers <- cc_outl(cleandat, lon = "decimalLongitude", lat = "decimalLatitude",
+                     species = "species", method = "quantile", mltpl = 5,
+                     value = "flagged", sampling_thresh = 0, verbose = TRUE)
Testing geographic outliers
Error in ras_dist(x = k, lat = lat, lon = lon, ras = ras, weights = TRUE) : object 'ras' not found

Sorry and thank you. Charlotte

shawnlaffan commented 5 years ago

I just hit the same issue with CoordinateCleaner downloaded from CRAN (2.0-11).

The logic looks faulty: the ras object is created only if (any(test >= 10000) | thinning), but when the code reaches the else block at line 162, both branches of the nested condition need it to exist.

I have not yet stepped through the code to see what values thinning is getting when this happens, or how many records are being processed. I'll update when I am able to.

https://github.com/ropensci/CoordinateCleaner/blob/9fc948a92abb8d72d10a7ea814111826e609ef9c/R/cc_outl.R#L153-L198
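To make the pattern concrete, here is a minimal sketch of what I mean. It is not the package source: flag_sketch and the "raster placeholder" string are made-up stand-ins, and the only thing taken from cc_outl is the shape of the condition.

```r
# Hypothetical stand-in for the pattern described above; NOT the package code.
flag_sketch <- function(n_records, thinning = FALSE) {
  # The helper object is only created when this condition holds ...
  if (n_records >= 10000 || thinning) {
    ras <- "raster placeholder"
  }
  # ... but it is used unconditionally further down.
  tryCatch(
    paste("using", ras),
    error = function(e) conditionMessage(e)
  )
}

small_no_thinning <- flag_sketch(489)                  # error message mentioning 'ras'
small_thinning    <- flag_sketch(489, thinning = TRUE) # "using raster placeholder"
```

With few records and thinning = FALSE the object is never created, so the later use fails, which matches the reports above where thinning = TRUE made the error go away.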

azizka commented 5 years ago

Hi Shawn, thanks for weighing in,

I adjusted the conditions. Could one of you guys try again with the latest version (2.0-13)?

It is working for me.

shawnlaffan commented 5 years ago

Thanks Alex, that appears to work for me.

One minor nit: a flag variable would be simpler than repeating the same moderately complex condition twice.

use_raster <- FALSE
if (some_condition) {
  use_raster <- TRUE
}

#  then later
if (use_raster) {
  #...
}

shawnlaffan commented 5 years ago

I then get an error, but assume it is due to other reasons. I'll file a separate issue if I can replicate it.

(edited after posting to show more commands for context)

> ala$dataset = ala$species
> rl <- clean_dataset(x=ala, lon="longitude", lat="latitude")
Testing for dd.mm to dd.dd conversion errors
Flagged 0 records
Testing for rasterized collection
Error in rbind.data.frame(list(dataset = 1L, lon.n.outliers = 8L, lon.n.regular.outliers = 2L,  : 
  numbers of columns of arguments do not match
In addition: There were 16 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
2: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
3: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
4: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
5: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
6: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
7: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
8: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
9: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
10: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
11: In FUN(X[[i]], ...) : Geographic span too small, check 'min_span'
12: In FUN(X[[i]], ...) : Dataset smaller than minimum test size
13: In FUN(X[[i]], ...) : Dataset smaller than minimum test size
14: In FUN(X[[i]], ...) : Dataset smaller than minimum test size
15: In FUN(X[[i]], ...) : Dataset smaller than minimum test size
16: In FUN(X[[i]], ...) : Dataset smaller than minimum test size

barnabywalker commented 5 years ago

Hi,

I installed the latest package from github today using: devtools::install_github("ropensci/CoordinateCleaner")

and I'm getting this error for a set of records with more than 10000 occurrences for a species.

It looks like the problem might be in the fix made to the condition that creates/uses the raster:

https://github.com/ropensci/CoordinateCleaner/blob/0d02b8aab9d64206d9d4f57cb4e1868fb6551947/R/cc_outl.R#L137-L153

where the raster is created if there are more than 10000 records that pass the test, but the raster is only used below if there are more than 10000 records for a particular species.

In my case there are fewer than 10000 records for the species in the test, but more than 10000 records in total for the species. So the raster is not created, but the code in the flagging block tries to use it, if that makes sense.
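A rough sketch of the mismatch, with made-up counts rather than my real data. The variable names and the "raster placeholder" string are hypothetical, and which count the condition should really use is a decision for the maintainer; the sketch just picks one.

```r
# Hypothetical counts, not taken from any real dataset.
n_tested <- 8000    # records for the species that pass the sampling test
n_total  <- 12000   # all records for the species

# Mismatched conditions: creation checks one count, use checks the other.
raster_created <- n_tested >= 10000
raster_needed  <- n_total >= 10000
mismatch <- raster_needed && !raster_created  # TRUE means 'ras' would be missing

# One flag, evaluated once and reused, keeps creation and use in step.
use_raster <- n_total >= 10000
ras <- if (use_raster) "raster placeholder" else NULL
result <- if (use_raster) paste("approximate with", ras) else "exact distances"
```

Because both the creation and the use read the same use_raster value, they cannot disagree, whichever count ends up in the condition.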

The output of packageVersion("CoordinateCleaner") is: [1] ‘2.0.12’

so maybe I'm not using the most up to date version? If not, is there a way to install a more up to date version?
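For reference, this is how I compared the version numbers, using only base R. The 2.0-13 value is the version mentioned earlier in this thread; the variable names are just for illustration.

```r
# "2.0.12" is what packageVersion() printed for me; "2.0-13" is from this thread.
installed <- package_version("2.0.12")
fixed_in  <- package_version("2.0-13")

# R treats "-" and "." as equivalent separators in version strings.
same_version <- installed == package_version("2.0-12")  # TRUE

# TRUE only if the installed version is at least the one named above.
has_fix <- installed >= fixed_in                         # FALSE
```

On an actual install, the same check can be written as packageVersion("CoordinateCleaner") >= "2.0-13".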

Apologies if this is something that's already been sorted, Barnaby

azizka commented 4 years ago

Included the suggestions by @shawnlaffan and @barnabywalker in version 2.0-14. Thanks.