ropensci / software-review

rOpenSci Software Peer Review.

Presubmission inquiry: datefixR: Fix Really Messy Dates #529

Closed nathansam closed 2 years ago

nathansam commented 2 years ago

Submitting Author Name: Nathan Constantine-Cooke
Submitting Author Github Handle: @nathansam
Repository: https://github.com/nathansam/datefixR
Submission type: Pre-submission
Language: en


Package: datefixR
Title: Fix Really Messy Dates
Version: 0.1.4.9000
Authors@R: person("Nathan",
                  "Constantine-Cooke",
                  email = "nathan.constantine-cooke@ed.ac.uk",
                  role = c("aut", "cre"),
                  comment = c(ORCID = "0000-0002-4437-8713"))
Description: Fixes messy dates in data frames such as those entered via text
  boxes. Standardizes / - and whitespace separation, month abbreviations, and
  year first or day first by converting to R's built-in Date class. Imputes
  missing date or month using user-provided values.  
License: GPL (>= 3)
Depends: R (>= 4.0.0)
Imports: stringr
Language: en-US
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
URL: https://www.constantine-cooke.com/datefixR/ https://github.com/nathansam/datefixR
BugReports: https://github.com/nathansam/datefixR/issues
Suggests: 
    rmarkdown,
    knitr,
    testthat (>= 3.0.0)
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

datefixR takes date data which has been stored in many different formats (01/01/2001, 5 April 2020, Dec 2015 etc.) and converts it to R's Date type.

NA

Any researchers using data entered via a questionnaire which (unfortunately) asked for a date as free-text. Given the nature of this data generation, this mainly affects those who work with human subjects.

lubridate::guess_formats() can be used to guess a date format and lubridate::parse_date_time() calls this function when it attempts to parse a vector into a POSIXct date-time object. However:

  1. When a date fails to parse in {lubridate}, the user is only told how many dates failed to parse. {datefixR} instead reports the ID (assumed to be the first column by default, but this can be user-specified) of the row whose date failed to parse, together with the date it considered, making it much easier to work out which supplied dates failed and why.
  2. {lubridate} gives the user no control over how a missing day or month is imputed. For example, when imputing a missing month, the user may wish to impute July, the middle of the year, instead of January, but {lubridate} always imputes January. In {datefixR}, this behaviour is controlled by the month.impute argument.
  3. The {lubridate} functions require all possible date formats to be specified in the orders argument, so a date format will not be considered if the user forgets to list it. By contrast, {datefixR} only needs a format to be specified if month-first is to be preferred over day-first when guessing a date.
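The three behaviours listed above can be illustrated with a small base-R sketch. This is not {datefixR}'s implementation or API: the function `parse_messy_date()` and its arguments `id`, `day_impute` and `month_impute` are hypothetical names chosen for illustration, and the sketch only handles four-digit years with day-first or year-first ordering.

```r
# Hypothetical helper for illustration only; NOT datefixR's implementation.
# Parses one free-text date, imputes a missing day/month with user-chosen
# values, and names the offending ID when parsing fails.
parse_messy_date <- function(x, id, day_impute = 1, month_impute = 7) {
  to_int <- function(s) suppressWarnings(as.integer(s))

  # Month names or abbreviations become numbers; numeric months pass through
  month_number <- function(m) {
    idx <- match(tolower(substr(m, 1, 3)), tolower(month.abb))
    if (!is.na(idx)) idx else to_int(m)
  }

  # Standardise "/", "-" and whitespace separators to single spaces
  parts <- strsplit(trimws(gsub("[-/[:space:]]+", " ", x)), " ")[[1]]
  n <- length(parts)

  if (n == 1) {                                   # year only
    ymd <- c(to_int(parts[1]), month_impute, day_impute)
  } else if (n == 2) {                            # month and year
    ymd <- c(to_int(parts[2]), month_number(parts[1]), day_impute)
  } else if (n == 3 && nchar(parts[1]) == 4) {    # year first
    ymd <- c(to_int(parts[1]), month_number(parts[2]), to_int(parts[3]))
  } else if (n == 3) {                            # day first
    ymd <- c(to_int(parts[3]), month_number(parts[2]), to_int(parts[1]))
  } else {
    ymd <- NA
  }

  out <- if (anyNA(ymd)) as.Date(NA) else
    as.Date(paste(ymd, collapse = "-"), format = "%Y-%m-%d")
  if (is.na(out)) {
    # Report which record failed, not just that something failed
    stop("unable to parse date '", x, "' for ID ", id, call. = FALSE)
  }
  out
}

records <- data.frame(
  id  = 1:4,
  raw = c("01/01/2001", "5 April 2020", "Dec 2015", "1994"),
  stringsAsFactors = FALSE
)
records$date <- do.call(c, unname(Map(parse_messy_date, records$raw, records$id)))
print(records)  # "Dec 2015" gets day 1; "1994" gets 1 July

# An impossible date produces an error that identifies the record
msg <- tryCatch(parse_messy_date("31 Feb 2020", id = 5),
                error = function(e) conditionMessage(e))
print(msg)
```

No list of candidate formats is passed in, the imputed month defaults to July rather than January, and a failure message carries the record's ID, which mirrors the contrasts with {lubridate} drawn in the list.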

linelist::guess_dates() appears to have performed a somewhat similar role. However, this function did not leave the experimental lifecycle phase and the package itself is no longer available on CRAN.

The package is on CRAN with no reverse dependencies.

jooolia commented 2 years ago

Dear @nathansam, Thank you for your pre-submission, and thanks for mentioning linelist::guess_dates(). Could you also expand this section and discuss lubridate::guess_formats()? Thanks, Julia

nathansam commented 2 years ago

Thanks @jooolia, Sorry, I should have mentioned that before! I have now added a section on lubridate's date guessing functions above.

jooolia commented 2 years ago

Dear @nathansam, We have determined that this package is in-scope for rOpenSci and we will welcome your full submission. Please add information about the other packages with similar (but not the same) functionality in your Readme. Another editor also mentioned {anytime} as another package worth describing in comparison to yours. Thanks, Julia

nathansam commented 2 years ago

That is great news, thank you! My thanks to you and the rest of the team. I will of course do a full submission ASAP.