Closed azizka closed 6 years ago
Thanks @azizka. We will discuss this and get back to you.
hi @azizka - nice to see you here.
As I probably mentioned to you in Stockholm, we do have an overlap policy where we try not to have packages that overlap too much. But here I think we might be okay. What do you think are the areas of overlap for the two packages? Maybe we can just avoid overlapping functionality.
Hi @sckott & @karthik, thanks!
I dug into both packages, and the overlap turns out to be rather small. See below for a function-by-function comparison table with scrubr from CRAN (please correct it if I missed something).
In two sentences: The aim of both packages is identical, namely to improve the quality of occurrence records from large databases, but beyond that there is actually little overlap. A few basic functionalities are virtually identical; otherwise scrubr includes date and taxonomic cleaning, while CoordinateCleaner includes many unique features for coordinates and fossils, and enables custom gazetteers and custom precision for the match with political centroids and capitals.
Sorry for the slow reply, I was out of office.
Functionality | CoordinateCleaner 1.0-7 | scrubr 0.1.1 | Percent overlap |
---|---|---|---|
Missing coordinates | cc_val | coord_incomplete | 100% |
Coordinates outside CRS | cc_val | coord_impossible | 100% |
Duplicated records | cc_dupl | dedup | The aim is identical, methods differ |
0/0 coordinates | cc_zero | coord_unlikely | 100% |
Identical lon/lat | cc_equ | - | 0% |
Country capitals | cc_cap | - | 0% |
Political unit centroids | cc_cen | "not ready yet" | 0% |
Coordinates incongruent with additional location information | cc_count | coord_within | 100% |
Coordinates assigned to GBIF headquarters | cc_gbif | - | 0% |
Coordinates assigned to the location of biodiversity institutions | cc_inst | - | 0% |
Spatial outliers | cc_outl | - | 0% |
Coordinates within the ocean | cc_sea | - | 0% |
Coordinates in urban area | cc_urb | - | 0% |
Coordinate conversion error | dc_ddmm | - | 0% |
Rounded coordinates/rasterized collection | dc_round | - | 0% |
Fossils: invalid age range | tc_equal | - | 0% |
Fossils: excessive age range | tc_range | - | 0% |
Fossils: temporal outlier | tc_outl | - | 0% |
Fossils: PyRate interface | WritePyrate | - | 0% |
Wrapper functions to run all tests | CleanCoordinates, CleanCoordinatesDS, CleanCoordinatesFOS | - | 0% |
Database of biodiversity institutions | institutions | - | 0% |
Taxonomic cleaning | - | tax_no_epithet | 0% |
Missing date | - | date_missing | 0% |
Add date | - | date_create | 0% |
Date format | - | date_standardize | 0% |
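
To make the first rows of the table concrete, here is a minimal sketch of what the basic coordinate checks amount to. It is written in Python purely as a language-neutral illustration and is not the R API of either package: the function name `flag_basic_coordinate_issues`, the flag labels, and the exact rules (latitude within [-90, 90], longitude within [-180, 180], an exact 0/0 match, and exactly identical lon/lat) are assumptions made for this example, and the packages themselves may apply different thresholds or buffers.

```python
import math
from typing import List, Optional


def flag_basic_coordinate_issues(lon: Optional[float],
                                 lat: Optional[float]) -> List[str]:
    """Return a list of hypothetical flag labels for one occurrence record.

    Illustrative only: the rules below are simplified assumptions,
    not the implementation of CoordinateCleaner or scrubr.
    """
    # Missing coordinates: None or NaN in either field.
    if lon is None or lat is None or math.isnan(lon) or math.isnan(lat):
        return ["missing"]

    flags: List[str] = []

    # Coordinates outside the valid range of a lat/lon coordinate system.
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        flags.append("out_of_range")

    # Exact 0/0 coordinates, a common data-entry artifact.
    if lon == 0.0 and lat == 0.0:
        flags.append("zero")

    # Identical longitude and latitude.
    if lon == lat:
        flags.append("equal_lon_lat")

    return flags


# Example records as (lon, lat) pairs; the values are made up.
records = [(18.07, 59.33), (None, 10.0), (200.0, 10.0), (0.0, 0.0)]
for lon, lat in records:
    print((lon, lat), flag_basic_coordinate_issues(lon, lat))
```

A record can carry several flags at once (a 0/0 record is also an identical lon/lat record), which is why the sketch returns a list rather than a single label.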
Thanks for this @azizka, and sorry about the delay in responding. We agree that the overlap isn't sufficient to warrant concern. We'd like you to submit the package for review. Open a new issue and fill out the issue template you'll see.
closing this, looking forward to your submission
Hi,
I'd like to submit the CoordinateCleaner package. It's a package for automated cleaning of biological and paleontological collection data, useful for conservation, ecology and evolutionary biology. There is overlap with scrubr, but CoordinateCleaner has a lot of additional functions. So hopefully it could still be appropriate?
In particular the package adds:
CoordinateCleaner is on CRAN and I had discussed it briefly with @sckott when he was in Stockholm in October 2016. There is also a manuscript ready for submission to Methods in Ecology and Evolution linked with the package.
Thanks, Alex
Summary
What does this package do? (explain in 50 words or less): Scan data sets of recent and fossil species occurrence records for geo-referencing and dating imprecision and data-entry errors in a standardized and reproducible way.
Paste the full DESCRIPTION file inside a code block below: