ropensci / CoordinateCleaner

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for the use in conservation, ecology and palaeontology.
https://docs.ropensci.org/CoordinateCleaner/

Filter for known defaults of coordinate uncertainty in meters #55

Open jhnwllr opened 3 years ago

jhnwllr commented 3 years ago

There are several known default values for coordinate uncertainty in meters.

301 : Geolocate Default (often a country centroid)
3036 : Geolocate Default
999 : Default found in a few datasets (observations.org)
9999 : Large default

Occurrence counts:
630 353 -- 3036m
401 507 -- 301m
370 553 -- 999m
14 242 -- 9999m

I think CoordinateCleaner could have a function for filtering these known defaults. I would be happy to make a PR for such a function...

https://github.com/gbif/pipelines/issues/417
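To make the proposal concrete, here is a minimal, language-agnostic sketch of such a filter, written in plain Python rather than as CoordinateCleaner code. The function name `flag_default_uncertainty` and the set name are hypothetical, and the column name `coordinateUncertaintyInMeters` is assumed from the Darwin Core term; only the four default values come from the list above. The sketch assumes the convention that `True` means the record passes and `False` means it is flagged.

```python
# Known default values of coordinate uncertainty (in meters) listed in this issue.
KNOWN_DEFAULT_UNCERTAINTIES = {301, 3036, 999, 9999}


def flag_default_uncertainty(records,
                             column="coordinateUncertaintyInMeters",
                             defaults=KNOWN_DEFAULT_UNCERTAINTIES):
    """Return one boolean per record: True = passes, False = flagged.

    Hypothetical helper for illustration only. `column` is an assumed
    default and would be user provided in practice.
    """
    flags = []
    for record in records:
        value = record.get(column)
        try:
            # Missing values are never treated as known defaults.
            is_default = value is not None and float(value) in defaults
        except (TypeError, ValueError):
            # Non-numeric values cannot match a known default.
            is_default = False
        flags.append(not is_default)
    return flags


# Small made-up example records (not real GBIF data).
records = [
    {"species": "A", "coordinateUncertaintyInMeters": 301},
    {"species": "B", "coordinateUncertaintyInMeters": 25},
    {"species": "C", "coordinateUncertaintyInMeters": None},
    {"species": "D", "coordinateUncertaintyInMeters": "9999"},
]
flags = flag_default_uncertainty(records)
cleaned = [r for r, ok in zip(records, flags) if ok]
```

Records with a missing uncertainty are kept here, on the assumption that a missing value is a separate issue from a known default. Whether to drop or merely flag matching records is left to the caller.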

azizka commented 3 years ago

Hi John,

thanks for the excellent suggestion. I'll implement this for the next version. Two questions:

jhnwllr commented 3 years ago

Thanks!!

I don't have any opinions about individualCount right now.

My assumption would be that there might be some default values there. GBIF has recently done a good job of trying to clean up that column, since it now has the occurrence_status field: https://www.gbif.org/occurrence/search?taxon_key=4689&occurrence_status=present

jhnwllr commented 3 years ago

What do you suggest as default name for the column with the uncertainty in meters, since this will be user provided? I would name the issue or column something like "known_default_coordinate_uncertainty".