Dear @phuongquan, Thank you for your submission. I am discussing your question about the "data validation and testing" category with the other editors.
Hi @phuongquan, We have determined that your package would be a good fit for the data validation and testing category. This category was missing from the template, so thank you for pointing this out. We welcome a full submission. Before submitting, I recommend checking your package with [{pkgcheck}](https://docs.ropensci.org/pkgcheck/). Thanks, Julia
Submitting Author Name: T. Phuong Quan
Submitting Author Github Handle: @phuongquan
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/phuongquan/daiquiri
Submission type: Pre-submission
Language: en
Scope
Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):
Data Lifecycle Packages
[ ] data retrieval
[ ] data extraction
[ ] database access
[ ] data munging
[ ] data deposition
[ ] workflow automation
[ ] version control
[ ] citation management and bibliometrics
[ ] scientific software wrappers
[ ] database software bindings
[ ] geospatial data
[ ] text data
[x] data validation and testing
Statistical Packages
[ ] Bayesian and Monte Carlo Routines
[ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
[ ] Machine Learning
[ ] Regression and Supervised Learning
[ ] Exploratory Data Analysis (EDA) and Summary Statistics
[ ] Spatial Analyses
[ ] Time Series Analyses
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
It takes a generic data frame containing raw, record-level, temporal data, and generates a data quality report that enables quick visual review of any unexpected temporal shifts in measures such as missingness, min/max/mean/distinct values, and non-conformance. There is a category of 'data validation and testing' on the https://devguide.ropensci.org/policies.html#package-categories page, which I think is more relevant, but it doesn't appear in the list above.
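To make the idea concrete, here is a rough, language-agnostic sketch in Python. It is not daiquiri's implementation or API; the function name `summarise_by_month`, the field names, and the sample records are all hypothetical. It only illustrates the general approach of aggregating record-level data by time period so that shifts in missingness, min/max/mean, or distinct values become visible.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def summarise_by_month(records, time_field, value_field):
    """Group records by calendar month and compute simple per-month measures.

    Hypothetical helper for illustration only; not part of any package.
    """
    buckets = defaultdict(list)
    for rec in records:
        ts = rec[time_field]
        buckets[(ts.year, ts.month)].append(rec.get(value_field))

    summary = {}
    for period in sorted(buckets):
        values = buckets[period]
        present = [v for v in values if v is not None]
        summary[period] = {
            "n": len(values),
            "missing": len(values) - len(present),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "mean": mean(present) if present else None,
            "distinct": len(set(present)),
        }
    return summary


# Made-up sample data: one missing value in January, and a suspicious
# jump in magnitude in February (for example, a change of units).
records = [
    {"visit_date": date(2021, 1, 5), "dose": 10},
    {"visit_date": date(2021, 1, 20), "dose": None},
    {"visit_date": date(2021, 2, 3), "dose": 10},
    {"visit_date": date(2021, 2, 17), "dose": 1000},
]

for period, measures in summarise_by_month(records, "visit_date", "dose").items():
    print(period, measures)
```

Plotting each measure against the time period is what would make a sudden change, such as the jump in the February maximum here, easy to spot at a glance.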
No
The target audience is all researchers who analyse data from large, temporal datasets, particularly routinely-collected data such as electronic health records. The package helps them to quickly check for temporal biases in their data before embarking on their main analyses. It also helps them to do this in a thorough, consistent and transparent way (since the reports are shareable), hence increasing the quality of their studies as well as trust in the scientific process.
To my knowledge, there are a small number of R packages that generate summary statistics and/or data quality reports (the two most similar being dataquieR and DQAstats), but none that assist in identifying temporal changes in the data, or that are as lightweight to use and consume.
Yes