ropensci / CoordinateCleaner

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for use in conservation, ecology and palaeontology.
https://docs.ropensci.org/CoordinateCleaner/

Quantile test fails when all lat/lon coordinates are duplicates #9

Closed: CyanBC closed this issue 5 years ago

CyanBC commented 5 years ago

When running cc_outl with method = 'quantile', if all records of a species have identical coordinates, the function crashes with this error:

Error in quantile.default(as.numeric(x), c(0.25, 0.75), na.rm = na.rm, : missing values and NaN's not allowed if 'na.rm' is FALSE
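The call shown in the error, quantile.default(as.numeric(x), c(0.25, 0.75), na.rm = na.rm, ...), is the one base R's IQR() makes internally, so the interquartile range was computed on a vector containing NaN. The sketch below is a simplified, hypothetical quantile (boxplot-style) test, not CoordinateCleaner's code: the name quantile_outliers() is made up and it uses plain Euclidean distances in degrees. It also assumes, without confirmation from this thread, that duplicate coordinates are collapsed before distances are computed. Under that assumption, a species reduced to a single point has no other records to average over, its mean distance is NaN, and IQR() fails with the same message.

```r
# Hypothetical sketch of a boxplot-style (quantile) outlier test.
# NOT CoordinateCleaner's implementation: distances are plain Euclidean
# distances in degrees, used only to illustrate the failure mode.
quantile_outliers <- function(lon, lat, mltpl = 5) {
  d <- as.matrix(dist(cbind(lon, lat)))  # pairwise distances between records
  diag(d) <- NA                          # ignore each record's distance to itself
  mean_d <- rowMeans(d, na.rm = TRUE)    # mean distance to all other records;
                                         # NaN when there are no other records
  iqr <- IQR(mean_d)                     # calls quantile(..., na.rm = FALSE): errors on NaN
  q75 <- quantile(mean_d, 0.75, names = FALSE)
  unname(mean_d > q75 + mltpl * iqr)     # TRUE = flagged as outlier
}

# Four identical records (as in the maintainer's reply): nothing is flagged.
quantile_outliers(c(0, 0, 0, 0), c(0, 0, 0, 0))

# A single (deduplicated) point: fails inside IQR() with the reported error.
try(quantile_outliers(-67.68556, 10.34944))
```

With several identical but undeduplicated records every mean distance is 0, so the IQR is 0 and nothing is flagged; only the single-point case produces NaN.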

azizka commented 5 years ago

Hi,

I couldn't reproduce this; the following seems to work:

library(CoordinateCleaner)
test <- data.frame(longitude = c(0, 0, 0, 0), latitude = c(0, 0, 0, 0),
                   species = c('t1', 't1', 't1', 't1'))
cc_outl(test, method = 'quantile', value = 'flagged')

Can you please provide an example?

Thanks!

CyanBC commented 5 years ago

OK, for some reason I only run into this error when all records of a species share the same lat/lon.

The error seems to depend on the structure of the data frame. Note that the same structure works for all other records that do not repeat the same lat/lon, and for several that do.

This works:

test = data.frame(valid_species_name = rep("Species.name", 13),
                  longitude = rep(-67.68556, 13),
                  latitude = rep(10.34944, 13))
cc_outl(test, lon = 'longitude', lat = 'latitude', species = 'valid_species_name',
        method = 'quantile', mltpl = 10, value = 'flagged', verbose = FALSE)

This does not:

test = data.frame(rec_acc_number = c("Rec_258455", "Rec_091939", "Rec_125890", "Rec_125891",
                                     "Rec_258422", "Rec_258423", "Rec_258424", "Rec_258456",
                                     "Rec_258425", "Rec_258457", "Rec_258458", "Rec_104365",
                                     "Rec_257096"),
                  valid_species_name = rep("Species.name", 13),
                  ISO3 = rep("VEN", 13),
                  longitude = rep(-67.68556, 13),
                  latitude = rep(10.34944, 13))
cc_outl(test, lon = 'longitude', lat = 'latitude', species = 'valid_species_name',
        method = 'quantile', mltpl = 10, value = 'flagged', verbose = FALSE)

It seems to be related to the extra columns, yet this works fine:

test = data.frame(rec_acc_number = 'Rec_130858', valid_species_name = 'Species.name',
                  ISO3 = 'VEN', longitude = -67.68556, latitude = 10.34944)
test = test[rep(row.names(test), 12), 1:5]
cc_outl(test, lon = 'longitude', lat = 'latitude', species = 'valid_species_name',
        method = 'quantile', mltpl = 10, value = 'flagged')

azizka commented 5 years ago

With the latest version, the function should no longer throw an error, but instead give the "less than seven records" warning.

Thanks for identifying this. Note, however, that the species is not outlier-tested because of the duplicated occurrences.
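To see in advance which species will be skipped for this reason, one option is to count distinct coordinate pairs per species before calling cc_outl(). The helper below, few_unique_coords(), is hypothetical and not part of CoordinateCleaner; its default threshold of seven mirrors the "less than seven records" warning, but whether the package counts raw or deduplicated records is an assumption here. The record IDs in the example are made up.

```r
# Hypothetical helper, not part of CoordinateCleaner: return the species whose
# records collapse to fewer than `min_unique` distinct lon/lat pairs.
# The default of 7 is an assumption mirroring the "less than seven records" warning.
few_unique_coords <- function(x, lon, lat, species, min_unique = 7) {
  coords <- unique(x[, c(species, lon, lat)])        # one row per distinct species/lon/lat
  n_unique <- table(as.character(coords[[species]])) # distinct coordinate pairs per species
  names(n_unique)[n_unique < min_unique]
}

# Data shaped like this thread's failing example: 13 records at one location
# (record IDs are made up for illustration).
test <- data.frame(rec_acc_number = sprintf("Rec_%02d", 1:13),
                   valid_species_name = "Species.name",
                   longitude = -67.68556, latitude = 10.34944)
skip <- few_unique_coords(test, "longitude", "latitude", "valid_species_name")
skip  # "Species.name"

# Remaining records that could be passed on to cc_outl() as in the examples above
test[!test$valid_species_name %in% skip, ]
```

Deduplicating on the species, longitude and latitude columns only means that records differing just in extra columns such as rec_acc_number still count as one location, which is the situation in the failing example.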

Cheers,

Alex

azizka commented 5 years ago

I'll close this now.