ropensci / software-review

rOpenSci Software Peer Review.

Presubmission inquiry: daiquiri: Data quality reporting for temporal datasets #527

Closed by phuongquan 2 years ago

phuongquan commented 2 years ago

Submitting Author Name: T. Phuong Quan
Submitting Author Github Handle: @phuongquan
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/phuongquan/daiquiri
Submission type: Pre-submission
Language: en


Package: daiquiri
Type: Package
Title: Data quality reporting for temporal datasets
Version: 0.7.0
Authors@R: c(
    person("T. Phuong", "Quan", email = "phuong.quan@ndm.ox.ac.uk",
        role = c("aut", "cre"), comment = c(ORCID = "0000-0001-8566-1817")),
    person("Jack", "Cregan", role = "ctb"),
    person(family = "University of Oxford", role = "cph"),
    person(family = "National Institute for Health Research (NIHR)", role = "fnd")
    )
Description: Generate reports that enable quick visual review of 
    temporal shifts in record-level data. Time series plots showing aggregated 
    values are automatically created for each data field (column) depending on its 
    contents (e.g. min/max/mean values for numeric data, no. of distinct 
    values for categorical data), as well as overviews for missing values, 
    non-conformant values, and duplicated rows. The resulting reports are sharable 
    and can contribute to forming a transparent record of the entire analysis process. 
    It is designed with Electronic Health Records in mind, but can be used for 
    any type of record-level temporal data (i.e. tabular data where each row represents 
    a single "event", one column contains the "event date", and other columns 
    contain any associated values for the event).
URL: https://github.com/phuongquan/daiquiri
BugReports: https://github.com/phuongquan/daiquiri/issues
License: GPL (>=3)
Encoding: UTF-8
Imports:
    data.table (>= 1.12.8),
    readr (>= 1.3.1),
    ggplot2 (>= 3.1.0),
    scales (>= 1.1.0),
    cowplot (>= 0.9.3),
    rmarkdown,
    reactable (>= 0.2.3)
RoxygenNote: 7.1.2
Suggests:
    covr,
    knitr,
    testthat
VignetteBuilder: knitr
Config/testthat/edition: 3

Scope

It takes a generic data frame containing raw, record-level, temporal data, and generates a data quality report that enables quick visual review of any unexpected temporal shifts in measures such as missingness, min/max/mean/distinct values, and non-conformance. There is a category of 'data validation and testing' on the https://devguide.ropensci.org/policies.html#package-categories page, which I think is more relevant, but it doesn't appear in the list above.
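To make the description above concrete, here is a minimal base-R sketch of the kind of per-period aggregation such a report is built on. It is only an illustration and does not use or reproduce daiquiri's own code; the data frame `records`, its columns `event_date` and `value`, and the choice of monthly buckets are all assumptions made for the example.

```r
# Hypothetical record-level data: one row per event, with an "event date" column.
# All names and values here are invented for illustration.
records <- data.frame(
  event_date = as.Date(c("2021-01-05", "2021-01-20", "2021-02-03",
                         "2021-02-17", "2021-03-09", "2021-03-30")),
  value = c(10, NA, 12, 14, NA, NA)
)

# Bucket each event into its calendar month.
records$month <- format(records$event_date, "%Y-%m")

# Aggregate per month: record count, missing-value count,
# and mean of the non-missing values (NA if the whole month is missing).
monthly <- do.call(rbind, lapply(split(records, records$month), function(d) {
  data.frame(
    month = d$month[1],
    n = nrow(d),
    n_missing = sum(is.na(d$value)),
    mean_value = if (all(is.na(d$value))) NA_real_ else mean(d$value, na.rm = TRUE)
  )
}))

print(monthly)
```

Plotting columns such as `n_missing` or `mean_value` against `month` would give the sort of time series in which an unexpected temporal shift becomes visible.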

No

The target audience is all researchers who analyse data from large, temporal datasets, particularly routinely-collected data such as electronic health records. The package helps them to quickly check for temporal biases in their data before embarking on their main analyses. It also helps them to do this in a thorough, consistent and transparent way (since the reports are shareable), hence increasing the quality of their studies as well as trust in the scientific process.

To my knowledge, there are a small number of R packages that generate summary statistics and/or data quality reports (the two most similar being dataquieR and DQAstats), but none that assist in identifying temporal changes in the data, nor any that are as lightweight to use and consume.

Yes

jooolia commented 2 years ago

Dear @phuongquan, Thank you for your submission. I am discussing your question about the "data validation and testing" category with the other editors.

jooolia commented 2 years ago

Hi @phuongquan, We have determined that your package would be a good fit for the data validation and testing category. This category was missing from the template, so thank you for pointing this out. We welcome a full submission. Before submitting, I recommend checking your package with {pkgcheck} (https://docs.ropensci.org/pkgcheck/). Thanks, Julia