ropensci / software-review

rOpenSci Software Peer Review.

Presubmission inquiry - spiro: Manage Data from Cardiopulmonary Exercise Testing #537

Closed smnnlt closed 2 years ago

smnnlt commented 2 years ago

Submitting Author Name: Simon Nolte
Submitting Author Github Handle: @smnnlt
Repository: https://github.com/smnnlt/spiro
Submission type: Pre-submission
Language: en


Package: spiro
Title: Manage Data from Cardiopulmonary Exercise Testing
Version: 0.0.4
Authors@R: 
    person(given = "Simon",
           family = "Nolte",
           role = c("aut", "cre"),
           email = "s.nolte@dshs-koeln.de",
           comment = c(ORCID = "0000-0003-1643-1860"))
Description: Import, process, summarize and visualize raw data from 
    metabolic carts.
License: MIT + file LICENSE
URL: https://github.com/smnnlt/spiro, https://smnnlt.github.io/spiro/
BugReports: https://github.com/smnnlt/spiro/issues
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.0
Imports: 
    ggplot2,
    xml2,
    readxl,
    knitr,
    cowplot,
    digest,
    signal
Suggests: 
    testthat (>= 3.0.0),
    rmarkdown,
    ggborderline
VignetteBuilder: knitr
Config/testthat/edition: 3

Scope

The spiro package allows users to read and process raw data files from different metabolic carts.

N/A

This package is primarily written for researchers in exercise science who want to make their analysis of cardiopulmonary exercise testing more standardized, reproducible and faster. It may also be used in a commercial context (e.g., a training diagnostics business).

The whippr package takes a different approach to the same problem. Compared to whippr, spiro has a simpler and more automated data workflow (basically one function for reading and processing data, and one function for summarizing or plotting). spiro has several relevant additional features that whippr does not have: automated detection and manual generation of exercise test protocols; data summary by load steps; adding and synchronizing external heart rate data; import of raw data file meta data; advanced data filtering methods (e.g., Butterworth filters, moving breath averages); Wasserman 9-Panel-Plots. Unlike whippr, the spiro package does not offer methods for VO2 kinetics analysis or automated outlier removal.

This package works with cardiopulmonary exercise data, which is per se sensitive health data. Meta data from the original raw data files is read and anonymized by default (with the exception of data on body mass, which is necessary to perform certain calculations of variables). The anonymization can optionally be deactivated by means of a function argument [spiro(anonymize = FALSE)], so that meta data is saved alongside the processed data. This may be helpful in some settings when there is no intent to share the data. Sharing of the resulting data in such situations could potentially reveal personal information, which is why this option is not activated by default.

This is my first R package and I'm not exactly sure in which category it fits best. I also haven't seen any package from the field of sports and exercise science in rOpenSci yet, so I'm curious whether it fits your general scope.

emilyriederer commented 2 years ago

Hi @smnnlt ! Thank you so much for the inquiry and for your detailed write-up. I really like the slick visualizations you show on the website.

As we consider whether this package is in-scope, I am hoping to get a bit more information since we don't have very much domain expertise in this field.

It might be helpful to add a bit more of this context to the README. After reading it someone with little domain knowledge should ideally understand the aim, goals and functionality of the package.

Thanks again!

smnnlt commented 2 years ago

Hi @emilyriederer !

Thank you for your comments! I have updated the README with a 'Background' section. I hope it is now clearer to people outside exercise science what the package aims to do.

I see your package supports data collected from a number of different devices (e.g. CORTEX, ZAN, etc.). Can you provide any guess for how much of the research is covered by the supported formats?

The devices supported should cover about 40-50% of all research in cardiopulmonary exercise testing. I will add support for further devices as soon as I get enough raw data files. I have added this information to the README.

Not having seen the raw data formats, could you please help us understand better what types of processing/wrangling the package supports? For example, is the data previously unstructured and you're making it structured/tabular?

Depending on the measuring device, the raw data can come in different formats (e.g., xlsx, xml, txt). Usually the raw data is structured, but quite messy (meta data not separated from the raw measurements, inconsistent naming of variables, ...). You can find some example raw data files in the inst/extdata folder of the package. In the first step, the package imports the raw data into an R data.frame with consistent column naming. Once imported, the package applies further processing, e.g. interpolating the data to full seconds, calculating additional variables and synchronizing with additional data.
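
To illustrate the interpolation step mentioned above, here is a minimal sketch in base R. The data values and the column names (`time`, `VO2`) are made up for illustration, and the use of linear interpolation via `approx()` is an assumption; it is not necessarily how spiro implements this step.

```r
# Hypothetical breath-by-breath measurements at irregular time points
# (values and column names are assumptions, not taken from a real file)
raw <- data.frame(
  time = c(0.0, 2.4, 5.1, 7.9, 10.2),  # seconds since test start
  VO2  = c(400, 520, 610, 700, 760)    # oxygen uptake in ml/min
)

# Target grid: every full second covered by the measurements
full_seconds <- seq(ceiling(min(raw$time)), floor(max(raw$time)), by = 1)

# Linear interpolation of VO2 onto the full-second grid (assumed method)
interpolated <- data.frame(
  time = full_seconds,
  VO2  = approx(x = raw$time, y = raw$VO2, xout = full_seconds)$y
)

print(interpolated)
```

The result has one row per second, which makes it straightforward to align the measurements with other time series such as external heart rate data.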

I hope my answers help to clarify the purpose of the package.

emilyriederer commented 2 years ago

Dear @smnnlt

Thank you for your answers. Taking all of that into account, I believe this package to be in scope. I will now close this pre-submission inquiry issue, and we look forward to your full submission.

Thanks again!

emilyriederer commented 2 years ago

Hi @smnnlt

I realized in my last message I accidentally tagged a user spiro (the package name) instead of your correct handle. Pinging again to make sure you saw that I closed the pre-submission issue as a matter of process, but that this is not a rejection! We welcome you to proceed to submit the package for review.

Emily

smnnlt commented 2 years ago

Hi @emilyriederer

Thank you for your message. I am currently working on some final adjustments and look forward to submitting the package in the next 1-2 weeks.

Simon