Closed by simpar1471 7 months ago
Thank you for this pre-submission, @simpar1471! This is an interesting package. I note it somewhat straddles the line between the standard category of "data munging," which has traditionally included software that implements field-specific standard conversions and deterministic descriptive calculations on data, and the still less-defined EDA statistical category, which we had envisioned as tools to enable things like exploring the distributions and correlations within data. I'm going to pose this to some of the other editors and follow up, as this will change the way you submit, and whether you need to use srr standards.
Hi @noamross, just wondering if you managed to get an answer from the other editors?
Thank you for following up @simpar1471! My apologies, I did get some feedback and didn't post it back here. For edge cases like these, we ask that things be submitted as statistical packages if 50% or more of the standards can be applied. I think that this applies to the EDA standards and General standards in this case, so we would be happy to accept a full submission of the package under the EDA statistical category.
Ah, thank you @noamross! I'll open a new issue in due course once the package is updated.
Submitting Author Name: Simon Parker
Submitting Author Github Handle: @simpar1471
Repository: https://www.github.com/lshtm-gigs/gigs/
Submission type: Pre-submission
Language: en
Scope
Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):
Data Lifecycle Packages
[ ] data retrieval
[ ] data extraction
[ ] data munging
[ ] data deposition
[ ] data validation and testing
[ ] workflow automation
[ ] version control
[ ] citation management and bibliometrics
[ ] scientific software wrappers
[ ] field and lab reproducibility tools
[ ] database software bindings
[ ] geospatial data
[ ] text analysis
Statistical Packages
[ ] Bayesian and Monte Carlo Routines
[ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
[ ] Machine Learning
[ ] Regression and Supervised Learning
[x] Exploratory Data Analysis (EDA) and Summary Statistics
[ ] Spatial Analyses
[ ] Time Series Analyses
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
The package is used to convert measurements, for example a baby's weight at a given age, into summary statistics such as z-scores (i.e. the number of standard deviations away from the mean) and percentiles (i.e. the proportion of the reference population expected to fall at or below the measurement, expressed as a decimal). It can also be used to classify these z-scores and percentiles into specific categories for specific facets of growth.
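To make the relationship between z-scores, percentiles, and categories concrete, here is a minimal sketch in base R. It does not use the gigs API: it assumes a standard normal reference distribution, and the variable names, cut-offs (-3 and -2), and category labels are hypothetical and chosen only for illustration.

```r
# Illustrative sketch only: base R, not the gigs API.
# Assumption: z-scores refer to a standard normal reference distribution.
z_scores <- c(-3.5, -2.5, 0, 1.5)

# z-score -> percentile, expressed as a decimal between 0 and 1
percentiles <- pnorm(z_scores)

# percentile -> z-score (the inverse conversion)
z_back <- qnorm(percentiles)

# Classify z-scores into categories.
# The cut-offs and labels here are hypothetical examples.
# right = FALSE makes each interval closed on the left: [-3, -2), [-2, Inf)
categories <- cut(
  z_scores,
  breaks = c(-Inf, -3, -2, Inf),
  labels = c("severe", "moderate", "normal"),
  right = FALSE
)
```

Under these assumptions, a z-score of 0 maps to a percentile of 0.5, and converting the percentiles back with `qnorm()` recovers the original z-scores up to floating-point error. Everything operates on plain vectors, so the same calls could sit inside a data-frame pipeline.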
We have started this process, and will have it done by the next version of gigs released onto GitHub.
The target audience for this package is researchers, clinicians, and policy-makers interested in nutrition and newborn and child health. Scientific applications of this package would include using our functions to generate statistical measures of growth in individuals or populations across time or pre/post a healthcare intervention, and then to perform further downstream analysis on these measures.
Several R packages which implement child growth charts already exist, but they differ in the range of growth charts offered, their flexibility in conversion, and in what data they make available to users.
anthro converts measurements into z-scores in the WHO Child Growth Standards, but lacks any INTERGROWTH-21st standards and outputs tabular data. We provide an interface which takes vectors in and gives vectors out, so is more flexible (e.g. with dplyr pipelines). This package is available on CRAN.
childsds can convert measurements to z-scores or percentiles, but cannot convert z-scores/percentiles to expected measurements. Additionally, childsds does not contain the newborn/postnatal INTERGROWTH-21st growth standards, which we implement. Though childsds does include more growth references, growth references are not within the scope of the GIGS project. We can discuss the reasons for this if necessary. This package is available on CRAN.
growthstandards contains functions for converting between values and z-scores/percentiles, as in gigs. It includes the INTERGROWTH-21st fetal standards, but not the newborn or postnatal standards we implement. This package makes coefficients available to end-users, but not reference growth curves. This package is not available on CRAN, and was last updated in 2021.
intergrowth provides more fetal growth standards than gigs, but cannot convert between z-scores/percentiles and lacks the INTERGROWTH-21st newborn or postnatal growth standards, which we have implemented in gigs. This package also does not make coefficients for the growth standards available to end-users, though it does provide growth curve data to end-users. This package is not available on CRAN, and was last updated in January 2023.
gigs provides a simple interface for working with the growth standards it implements, which can be easily included in dplyr-like data wrangling pipelines. Reference growth curves and model coefficients are available wherever possible, and we intend to update the package with extra functionality (e.g. the full suite of fetal/maternal INTERGROWTH-21st standards) by June of next year. We are ready to submit to CRAN, but are waiting to see how you respond to this presubmission inquiry.
In terms of performance/scaling from 1 to 10000 inputs, anthro performs/scales the worst (28 ms to 234 ms), then gigs (0.9218 ms to 41 ms), then growthstandards (4.7497 ms to 23 ms), then childsds (2 ms to 17 ms). We will look to these implementations to see how we can make gigs faster, if required.
Not applicable.
None.