Presubmission Inquiry: jlmerclusterperm

yjunechoe commented 1 year ago

Submitting Author Name: June Choe Submitting Author Github Handle: !--author1-->@yjunechoe<!--end-author1-- Repository: https://github.com/yjunechoe/jlmerclusterperm Submission type: Pre-submission Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: jlmerclusterperm
Title: Cluster-Based Permutation Analysis for Densely Sampled Time Data
Version: 1.0.0
Authors@R: 
    person("June", "Choe", , "jchoe001@gmail.com", role = c("aut", "cre", "cph"),
           comment = c(ORCID = "0000-0002-0701-921X"))
Description: An implementation of fast cluster-based permutation analysis
    (CPA) for densely-sampled time data developed in Maris & Oostenveld,
    2007 <doi:10.1016/j.jneumeth.2007.03.024>. Supports (generalized,
    mixed-effects) regression models for the calculation of timewise
    statistics. Provides both a wholesale and a piecemeal interface to the
    CPA procedure with an emphasis on interpretability and diagnostics.
    Integrates 'Julia' libraries 'MixedModels.JL' and 'GLM.JL' for
    performance improvements, with additional functionalities for
    interfacing with 'Julia' from 'R' powered by the 'JuliaConnectoR'
    package.
License: MIT + file LICENSE
URL: https://github.com/yjunechoe/jlmerclusterperm,
    https://yjunechoe.github.io/jlmerclusterperm/
BugReports: https://github.com/yjunechoe/jlmerclusterperm/issues
Depends:
    R (>= 3.5)
Imports: 
    cli,
    generics,
    JuliaConnectoR,
    lme4,
    parallel,
    stats,
    utils
Suggests:
    broom,
    broom.mixed,
    covr,
    dplyr,
    eyetrackingR,
    forcats,
    future,
    ggplot2,
    knitr,
    MASS,
    patchwork,
    readr,
    rmarkdown,
    scales,
    testthat (>= 3.0.0),
    tibble
VignetteBuilder: 
    knitr
Config/testthat/edition: 3
Encoding: UTF-8
Roxygen: list(markdown = TRUE, roclets = c ("namespace", "rd",
    "srr::srr_stats_roclet"))
RoxygenNote: 7.2.3
SystemRequirements: Julia (>= 1.8)
Collate:
    'jlmerclusterperm-package.R'
    'aaa.R'
    'utils.R'
    'interop-utils.R'
    'interop-utils-unexported.R'
    'julia_rng.R'
    'jlmer_spec.R'
    'jlmer.R'
    'compute_timewise_statistics.R'
    'permute.R'
    'permute_timewise_statistics.R'
    'clusters_methods.R'
    'extract_clusters.R'
    'calculate_pvalue.R'
    'clusterpermute.R'
    'threshold_search.R'
    'tidy.R'
    'zzz.R'
    'srr-stats-standards.R'

Scope

Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):

Data Lifecycle Packages
- [ ] data retrieval
- [ ] data extraction
- [ ] data munging
- [ ] data deposition
- [ ] data validation and testing
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
  
  Statistical Packages
- [ ] Bayesian and Monte Carlo Routines
- [ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
- [ ] Machine Learning
- [x] Regression and Supervised Learning
- [ ] Exploratory Data Analysis (EDA) and Summary Statistics
- [ ] Spatial Analyses
- [ ] Time Series Analyses
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

jlmerclusterperm implements a regression-based CPA by fitting a regression model to each time point in a timeseries, and using the array of t-statistics from the regression output as a measure of "distance" between groups/lines in a time series. However, the package itself does not implement any novel regression algorithms - it simply calls GLM.jl (R equivalent lm/glm) and MixedModels.jl (R equivalent lme4::lmer/glmer) model fitting functions via the Julia backend.

If submitting a statistical package, have you already incorporated documentation of standards into your code via the srr package?

Yes - I have started with a first draft. Tests specific to the correctness of the regression implementation itself are generally not applicable, to the extent that they are already tested internally in Julia by GLM and MixedModels themselves. However, correctness tests are included where the application of regression has consequences for the CPA procedure specifically.

Who is the target audience and what are scientific applications of this package?

Researchers in psychology, neuroscience, and linguistics conducting experiments on human subjects, working with densely-sampled time data (e.g., eye-tracking and EEG) with multi-level grouping structures (multiple by-participant and by-item observations). CPA was originally designed for application in EEG and have since been adopted for other methodologies such as eye-tracking and mouse-tracking.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Yes - eyetrackingR, clusterperm, permuco, and permutes. eyetrackingR and clusterperm are inactive and no longer maintained as of late. permuco is feature-rich and maintained but does not support mixed models. permutes is a fast implementation of CPA and does support mixed models, but via an approximation algorithm which is not faithful to the original procedure as outlined in Maris & Oostenveld (2007). jlmerclusterperm is unique in supporting a mixed-models-based CPA that is computationally feasible to run and much faster than prior implementations (comparison benchmark is in the works, but it's fast for the same reason that MixedModels.jl is faster than {lme4} - this is known). jlmerclusterperm also exposes the individual steps of the CPA as independent functions, emphasizing the modular nature of the procedure and making it easier to interpret and diagnose a CPA.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Not applicable - package does not store or transmit any data

Any other questions or issues we should be aware of?:

The package integrates Julia via JuliaConnectoR - it both calls existing Julia libraries as well as custom Julia code in inst/julia. The pkgcheck is set up via GHA and tracked in https://github.com/yjunechoe/jlmerclusterperm/issues/1. The package website includes many vignettes/articles that are not included in the package proper

noamross commented 1 year ago

Hello @yjunechoe and thanks for you inquiry! We took some time looking at this because packages wrapping Julia code are a new category for us, and the editorial board wanted to consider whether we could reasonably identify reviewers and provide a good review. We've determined we would accept this for review, but it would be an experimental test case to see how it goes. As with other statistical packages, you would have to still show compliance with the general and regression categories of standards, and at least 50% of those standards have to be applicable.

We do have a question: A couple of editors did want to know why you opted to structure the package in this way, rather than put most of the code in a Julia package? If you organized it that way, your methods would be available in Julia, and then you could have only a thin wrapper around that Julia package expose everything in R. In any case, we think a clear statement of purpose/audience in the package documentation should include why it uses mixed languages in this way.

yjunechoe commented 1 year ago

Thank you @noamross and to all those involved for considering jlmerclusterperm! I understand that this would make a somewhat atypical submission, so I am especially thankful for your evaluation. I suspect that there will be more hybrid packages of this sort in the future, so I'm happy for my package to be a guinea pig for an experimental review process.

To your question: this is largely a historical accident. I envisioned the package to be mostly written in R but simply swap out lm/lme4 for MixedModels. So originally I wrote a small Julia script to wrap one or two MixedModels.jl functions and do a for loop, but then throughout development I got more ambitious and started moving more of the package internals from R to Julia (and this has paid off in speed). In sum, the Julia code isn't itself a package because I simply did not start out doing so (and at the time it didn't make sense to, but at this point maybe it's better to do so?).

I do still have some reservations about whether it's necessary wrap the Julia code into a package, though (and it'd be helpful to get some second opinions). One reason is that most of the Julia functions are internal functions that are tightly integrated with R functions of the same name. As they're written, they're not suitable for a Julia user to pick up and use in Julia - they're very dependent on the particular inputs provided to them by the R functions. This was an intentional design: it allows me to validate the user input early and only once in the R functions and return nicely formatted messages from R. I'm happy with this very limited scope of the Julia code as simply powering my R functions, in part because I believe that a typical Julia user could/would just "do it themself" (my Julia code is just a glorified for loop).

I'm sure that there are still merits to turning the Julia code into a package even for its limited scope of being jlmerclusterperm's backend. In an ideal world I would simply do this, but due to my limited knowledge of Julia package development, I've opted to pursue a middle ground approach where the Julia code is sourced in a specific project environment. One step further would be to wrap the functions into a module, which, among other things, would allow namespacing within Julia. Anyways, these are all just my half baked thoughts - I'm happy to discuss further in the actual submission issue.

Lastly, a question of my own if I may: the package is also undergoing review on CRAN. Would it be okay to open the actual submission issue for jlmerclusterperm after it's published on CRAN? CRAN's scrutiny of this package has been demanding and I would like to get this resolved before doing anything else with this package.

Best, June

mpadge commented 1 year ago

@yjunechoe Regarding documentation of standards compliance, this could also serve as an experimental test case. In particular, you'll likely need to document compliance of your inst/julia code. There is currently no direct way to do that, so it would be good to work with you to devise a satisfactory solution for how that might be done. When you get that far, please ping me via an issue in your package, and we can discuss further how to adapt our system to optimise your documentation procedure. Thanks.

And I'll also answer your last question: We generally advise CRAN submission after review here, but if you've already started with CRAN, then yes, feel free to resolve that first before proceeding with full submission here.

yjunechoe commented 1 year ago

@mpadge Thank you! And before I close the issue, just to clarify: should I tag you on a new issue in my repo regarding the statistical standards prior to creating the submission issue here (or would that be part of the review process itself?).

mpadge commented 1 year ago

Yes @yjunechoe, please tag me in your own repo, not here, and we can continue discussions there. And yes, we'll need to use that to work out a sensible way to document standards as part of you preparing for a full submission.

yjunechoe commented 1 year ago

Great - I will prepare for this and let you know when I am ready (hopefully not too long from now). Thanks again!

ropensci / software-review

Presubmission Inquiry: jlmerclusterperm #591

Scope