ropensci / software-review

rOpenSci Software Peer Review.

Pre-submission inquiry: CoordinateCleaner #199

Closed azizka closed 6 years ago

azizka commented 6 years ago

Hi,

I'd like to submit the CoordinateCleaner package. It's a package for automated cleaning of biological and paleontological collection data, useful for conservation, ecology and evolutionary biology. There is overlap with scrubr, but CoordinateCleaner has a lot of additional functions. So hopefully it could still be appropriate?

In particular the package adds:

CoordinateCleaner is on CRAN, and I discussed it briefly with @sckott when he was in Stockholm in October 2016. There is also a manuscript linked with the package that is ready for submission to Methods in Ecology and Evolution.

Thanks, Alex

karthik commented 6 years ago

Thanks @azizka. We will discuss this and get back to you.

sckott commented 6 years ago

hi @azizka - nice to see you here.

As I probably mentioned to you in Stockholm, we do have an overlap policy where we try not to have packages that overlap too much. But here I think we might be okay. What do you think are the areas of overlap for the two packages? Maybe we can just avoid overlapping functionality.

azizka commented 6 years ago

Hi @sckott & @karthik, thanks!

I dug into both packages, and the overlap turns out to be rather small. See below for a by-function comparison table with scrubr from CRAN (please correct it if I missed something).

In two sentences: the aim of both packages is identical, namely to improve the quality of occurrence records from large databases, but beyond that there is little overlap. A few basic functions are virtually identical; otherwise scrubr includes date and taxonomic cleaning, while CoordinateCleaner includes many unique features for coordinates and fossils, and supports custom gazetteers and custom precision when matching against political centroids and capitals.

Sorry for the slow reply, I was out of office.

| Functionality | CoordinateCleaner 1.0-7 | scrubr 0.1.1 | Percent overlap |
|---|---|---|---|
| Missing coordinates | cc_val | coord_incomplete | 100% |
| Coordinates outside CRS | cc_val | coord_impossible | 100% |
| Duplicated records | cc_dupl | dedup | The aim is identical, methods differ |
| 0/0 coordinates | cc_zero | coord_unlikely | 100% |
| Identical lon/lat | cc_equ | - | 0% |
| Country capitals | cc_cap | - | 0% |
| Political unit centroids | cc_cen | "not ready yet" | 0% |
| Coordinates incongruent with additional location information | cc_count | coord_within | 100% |
| Coordinates assigned to GBIF headquarters | cc_gbif | - | 0% |
| Coordinates assigned to the location of biodiversity institutions | cc_inst | - | 0% |
| Spatial outliers | cc_outl | - | 0% |
| Coordinates within the ocean | cc_sea | - | 0% |
| Coordinates in urban area | cc_urb | - | 0% |
| Coordinate conversion error | dc_ddmm | - | 0% |
| Rounded coordinates/rasterized collection | dc_round | - | 0% |
| Fossils: invalid age range | tc_equal | - | 0% |
| Fossils: excessive age range | tc_range | - | 0% |
| Fossils: temporal outlier | tc_outl | - | 0% |
| Fossils: PyRate interface | WritePyrate | - | 0% |
| Wrapper functions to run all tests | CleanCoordinates, CleanCoordinatesDS, CleanCoordinatesFOS | - | 0% |
| Database of biodiversity institutions | institutions | - | 0% |
| Taxonomic cleaning | - | tax_no_epithet | 0% |
| Missing date | - | date_missing | 0% |
| Add date | - | date_create | 0% |
| Date format | - | date_standardize | 0% |

sckott commented 6 years ago

thanks for this @azizka - sorry about the delay in responding. We agree that the overlap isn't sufficient to warrant concern. We'd like you to submit the package for review. Open a new issue and fill out the issue template you'll see.

sckott commented 6 years ago

closing this, looking forward to your submission