ropensci / CoordinateCleaner

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for the use in conservation, ecology and palaeontology.
https://docs.ropensci.org/CoordinateCleaner/

cc_outl returns error when rownames are not in order #24

Closed jivelasquezt closed 4 years ago

jivelasquezt commented 5 years ago

When running clean_coordinates, the cc_outl subroutine returns an error if any row name of the x data frame is greater than the number of rows in x. This is the error:

Testing geographic outliers
Flagged 1260 records.
Error in `$<-.data.frame`(`*tmp*`, "otl", value = c(TRUE, TRUE, TRUE,  : 
  replacement has 78197 rows, data has 68115

Furthermore, the function appears to use the row names of x as the indices of the flagged records. So if the row names are out of order but none of them exceeds the number of rows in x, you won't get an error, but you will get an incorrect result. This could be fixed by adding rownames(x) <- 1:nrow(x) to the cc_outl function.
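The mechanism described above can be reproduced in base R without the package. This is only a minimal sketch: the data frame, its values, and the variable names are made up for illustration, and the indexing step is an assumption about how row names might be misused as positions, not the actual cc_outl code.

```r
# Toy occurrence table (hypothetical values).
x <- data.frame(
  species = c("a", "a", "a", "a"),
  decimalLongitude = c(10.1, 10.2, 10.3, 95.0),
  decimalLatitude = c(50.1, 50.2, 50.3, -20.0)
)

# Subsetting keeps the original row names, so they no longer run 1..nrow(x).
x <- x[c(2, 4), ]
original_names <- rownames(x)

# Assumed failure mode: a row name is treated as a position.
flagged <- as.numeric(rownames(x)[2])  # 4, although x has only 2 rows
out <- rep(TRUE, nrow(x))
out[flagged] <- FALSE                  # silently extends `out` to length 4

# Assigning the over-long vector as a column fails because the lengths differ.
err <- tryCatch({
  x$otl <- out
  NULL
}, error = function(e) conditionMessage(e))

# Resetting the row names makes names and positions agree again.
x_reset <- x
rownames(x_reset) <- NULL
```

With row names that are shuffled but still within 1..nrow(x), the same indexing would run without an error and flag the wrong rows, which matches the silent failure described above.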

azizka commented 5 years ago

Changed as suggested. Works for me now. Could you provide the test data or test again? Thanks!

jivelasquezt commented 5 years ago

After re-installing using devtools the error persists:

sp.data <- clean_coordinates(sp.data,
                             lon = "decimalLongitude",
                             lat = "decimalLatitude",
                             species = "species",
                             countries = "countryCode",
                             value = "clean")
# After flagging a number of occurrences
Error in `$<-.data.frame`(`*tmp*`, "otl", value = c(TRUE, TRUE, TRUE,  : 
  replacement has 397115 rows, data has 55319
In addition: Warning message:
In cc_outl(otl_test, lon = lon, lat = lat, species = species, method = outliers_method,  :

Error in `$<-.data.frame`(`*tmp*`, "otl", value = c(TRUE, TRUE, TRUE,  : 
  replacement has 397115 rows, data has 55319 

Sample file attached: test.txt

Farewe commented 5 years ago

I've had the same issue. The workaround of @jivelasquezt (rownames(x)<-1:nrow(x)) saved me!

azizka commented 4 years ago

I have now included your suggestion and reset the row names in cc_outl and clean_coordinates.

castillolab commented 4 years ago

I am still getting this error using v2.0-15. Is the fix only implemented in a development build?

NaturalSelecta1 commented 4 years ago

Hi, I'm also getting this error with clean_coordinates(). I have tried both the CRAN and development versions of the package (v2.0-15). I'm not sure where in the function to implement the workaround suggested by @jivelasquezt.

jivelasquezt commented 4 years ago

Hi, not sure if this helps, but this was my workaround with last year's version:

# Reset the row names before calling clean_coordinates
row.names(rv$sp.data) <- 1:nrow(rv$sp.data)

# First attempt: all tests, including the country and outlier tests
try(sp.data.clean <- CoordinateCleaner::clean_coordinates(
  rv$sp.data,
  lon = "decimalLongitude",
  lat = "decimalLatitude",
  species = "species",
  countries = "countryCode",
  value = "clean",
  tests = c("countries", "capitals", "centroids", "equal", "gbif",
            "institutions", "outliers", "seas", "zeros")))

if (exists("sp.data.clean")) {
  rv$sp.data <- sp.data.clean
  rv$logs <- paste(rv$logs, nrow(rv$sp.data), "records remain after running CoordinateCleaner\n")
} else {
  rv$logs <- paste(rv$logs, "CoordinateCleaner failed. Trying now without country test\n")
  # Second attempt: reduced set of tests
  tryCatch({
    sp.data.clean <- CoordinateCleaner::clean_coordinates(
      rv$sp.data,
      lon = "decimalLongitude",
      lat = "decimalLatitude",
      species = "species",
      countries = "countryCode",
      value = "clean",
      tests = c("capitals", "centroids", "equal", "gbif",
                "institutions", "seas", "zeros"))
    rv$sp.data <- sp.data.clean
    rv$logs <- paste(rv$logs, nrow(rv$sp.data), "records remain after running CoordinateCleaner\n")
  }, error = function(e) {
    rv$logs <- paste(rv$logs, e)
    rv$logs <- paste(rv$logs, "CoordinateCleaner failed. No data cleaning performed.\n")
  })
}
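Stripped of the logging and fallback logic, the core of the workaround is to reset the row names on the data frame before it is passed to the cleaning function, rather than editing the function itself. Below is a minimal sketch of that idea. The helper clean_with_reset_rownames and the stand-in demo_cleaner are hypothetical names used only for illustration and are not part of CoordinateCleaner.

```r
# Hypothetical helper: reset row names, then hand the data to a cleaner.
clean_with_reset_rownames <- function(x, cleaner, ...) {
  stopifnot(is.data.frame(x), is.function(cleaner))
  rownames(x) <- NULL  # row names become 1..nrow(x)
  cleaner(x, ...)
}

# Stand-in cleaner so the sketch runs without the package:
# it simply drops records with zero longitude.
demo_cleaner <- function(x) x[x$decimalLongitude != 0, , drop = FALSE]

# Toy data with shuffled row names (hypothetical values).
df <- data.frame(
  decimalLongitude = c(0, 12.5, 13.1),
  decimalLatitude = c(0, 41.9, 42.0)
)
df <- df[c(3, 1, 2), ]

res <- clean_with_reset_rownames(df, demo_cleaner)
```

With the real package, CoordinateCleaner::clean_coordinates would presumably be passed as cleaner, together with the lon, lat, species, countries, and value arguments used in the calls above.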
