ropensci / CoordinateCleaner

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for the use in conservation, ecology and palaeontology.
https://docs.ropensci.org/CoordinateCleaner/

Would be nice if CoordinateCleaner::countryref had sq-km column. #13

Closed jhnwllr closed 5 years ago

jhnwllr commented 5 years ago

I think the CoordinateCleaner::countryref table needs a sq-km column.

If each country or administrative zone had a total area in square kilometers, it would really help improve the filtering.

Admin zones can be downloaded from GADM:

https://gadm.org/download_country_v3.html

Then it might be possible to generate a simple column with country areas, so that a user could filter out only those countries or admin zones that have an area below a certain threshold. I would be willing to do this and make a PR if it is not already being done.

azizka commented 5 years ago

Hi,

Yes, please go ahead and do this. I suggest using www.naturalearthdata.com rather than GADM, to be consistent.

jhnwllr commented 5 years ago

@azizka OK, I will work from Natural Earth data.

jhnwllr commented 5 years ago

https://github.com/ropensci/CoordinateCleaner/pull/14

jhnwllr commented 5 years ago

I am thinking that cc_cen should now have a way to exclude small regions.

I think something like this would work?

    if (!is.null(min_area)) { 
      # remove missing and too small 
      ref <- ref[ref$area_sqkm > min_area & !is.na(ref$area_sqkm), ] 
    }

Here min_area is a new argument giving the area (in square kilometers) below which a region is too small to include in the centroid filtering. This way we could exclude places like Monaco, where even a 2 km centroid buffer covers the whole country and even bleeds a little into France. I can make a pull request with this change.
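To make the idea concrete, here is a minimal, self-contained sketch of how such a filter could behave. It uses a toy table rather than the real countryref, and the helper name filter_small_regions, the rounded area values, and the choice of threshold are all illustrative assumptions, not part of the package:

```r
# Toy stand-in for the reference table; the real countryref has more
# columns, and the areas below are rounded, illustrative values.
ref <- data.frame(
  name      = c("Monaco", "France", "Unknown"),
  area_sqkm = c(2, 550000, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop regions whose area is missing or not above
# min_area. With min_area = NULL the table is returned unchanged.
filter_small_regions <- function(ref, min_area = NULL) {
  if (!is.null(min_area)) {
    keep <- !is.na(ref$area_sqkm) & ref$area_sqkm > min_area
    ref <- ref[keep, , drop = FALSE]
  }
  ref
}

# Area of a circular buffer with a 2 km radius, in square kilometers.
# Using it as the threshold is one possible choice, not a recommendation.
buffer_area_sqkm <- pi * 2^2

kept <- filter_small_regions(ref, min_area = buffer_area_sqkm)
print(kept$name)
```

A 2 km buffer covers roughly 12.6 square kilometers, several times the area of Monaco, so with this threshold the toy Monaco row is dropped along with the row that has no area recorded.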

jhnwllr commented 5 years ago

https://github.com/ropensci/CoordinateCleaner/pull/17

azizka commented 5 years ago

I'll close this issue now