Presubmission Inquiry: gp - A Grammar of Plates

Submitting Author Name: Kai Aragaki
Submitting Author Github Handle: @KaiAragaki
Repository: https://github.com/KaiAragaki/gp
Submission type: Pre-submission
Language: en

Paste the full DESCRIPTION file inside a code block below:

```
Package: gp
Title: A Grammar of Plates
Version: 0.1.1
Authors@R: 
    person(given = "Kai",
           family = "Aragaki",
           role = c("aut", "cre"),
           email = "aaragak1@jhmi.edu",
           comment = c(ORCID = "0000-0002-9458-0426"))
Description: `gp` attempts to provide a succinct yet powerful grammar to describe common microwell layouts to aide in both plotting and tidying.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
Imports: 
    cli,
    dplyr,
    ggplot2,
    glue,
    methods,
    purrr,
    rlang,
    stats,
    tibble,
    tidyr
Suggests: 
    rmarkdown,
    knitr,
    testthat (>= 3.0.0)
VignetteBuilder: knitr
Depends: 
    R (>= 4.1.0)
URL: https://kaiaragaki.github.io/gp/
Config/testthat/edition: 3
```

Scope

Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):

Data Lifecycle Packages
- [ ] data retrieval
- [ ] data extraction
- [ ] database access
- [x] data munging
- [ ] data deposition
- [ ] data validation and testing
- [ ] workflow automation
- [ ] version control
- [ ] scientific software wrappers
  - [ ] citation management and bibliometrics
- [ ] database software bindings
- [ ] geospatial data
- [ ] text data
  
Statistical Packages
- [ ] Bayesian and Monte Carlo Routines
- [ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
- [ ] Machine Learning
- [ ] Regression and Supervised Learning
- [ ] Exploratory Data Analysis (EDA) and Summary Statistics
- [ ] Spatial Analyses
- [ ] Time Series Analyses

Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

The package is largely involved in converting plate-format data into tidy data (similar but not identical to the excellent plater; see below).

If submitting a statistical package, have you already incorporated documentation of standards into your code via the srr package?

Who is the target audience and what are scientific applications of this package?

Bench scientists who generate plate data, developers who ingest plate data, and people who need to illustrate plate layouts (likely for protocols or apps).

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

The wonderful plater package accomplishes a similar goal, but I don't believe it has the same scope that this package does. The similarity between the two packages is the main reason why I have submitted this inquiry.

plater is a wonderful interface for tidying microwell data by laying out experimental design in a spreadsheet-like manner. gp does this too, but by writing code instead of using a spreadsheet, which reduces the number of undocumented steps. It also provides handy tools for plotting plate layouts; indeed, plotting and tidying go hand-in-hand in gp. Two vignettes on the pkgdown website demonstrate its flexibility and how it is used to tidy data.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Any other questions or issues we should be aware of?

My primary concern is whether this package is so similar to plater that it would be barred from further review. Regardless, thank you very much for your time!

Hi @KaiAragaki - Thank you so much for submitting this inquiry. We especially appreciate your detailed comparison of gp with the existing plater package. I was wondering if you could please elaborate on just one point there:

Could you please comment on any interoperability that you considered between gp and plater? Are there ways these packages can work together to make use of any existing plater functionality?

For example (and only for example! I have not used plater so these are hypotheticals and not suggestions), could your code-based process for reading data create the same type of object as plater if desired? Is it possible to cast between the types of structures gp and plater create if desired? Could the gp visualization functions be extended such that they could also work on plater objects?

I understand your point that the tidying/plotting are tightly coupled in some ways, but would just appreciate your comment on this. Thank you!

Hi @emilyriederer! Thank you for taking the time to review my presubmission.

While there currently isn't any built-in interoperability, that would be a fairly simple addition (at least, by my naive prediction).

plater does not create specific objects, but ingests .csv files formatted in a particular way to produce tidy, annotated tibbles. gp does not read in files, but takes data.frame-like objects and tidies and annotates them. I've included a true masterpiece of an ASCII illustration below to show you the conversions that are currently done (solid-ish lines) and those that could be introduced to increase interoperability (dashed lines).

```
                plater
annotated .csv --------> annotated tidy tibble <- - - - - - - - - - - - - - - - - - ,
       ^- - - - - - - - - - - - - - - - - - - - - - - - - - - - ,                   :
                                                                :                   :
                   base, readr, readxl...                gp     v     gp            v
.csv/.tsv/.xls... -----------------------> data.frame --------> gp <------> annotated tidy tibble
                                      tidy data.frame ----------^
```
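
For readers unfamiliar with the "plate-shaped to tidy" step that both packages perform, here is a minimal, language-agnostic sketch in Python. It is not the API of gp or plater; the function name `tidy_plate` and the example values are made up purely for illustration. It shows a grid laid out like a plate becoming one record per well.

```python
from string import ascii_uppercase


def tidy_plate(grid):
    """Turn a plate-shaped grid (a list of rows) into one record per well.

    Hypothetical helper for illustration only; not part of gp or plater.
    """
    records = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            records.append(
                {
                    "well": f"{ascii_uppercase[r]}{c + 1}",  # e.g. "A1", "B3"
                    "row": r + 1,
                    "col": c + 1,
                    "value": value,
                }
            )
    return records


# A made-up 2 x 3 "plate" of readings, laid out as it would appear on the bench.
plate = [
    [0.10, 0.12, 0.95],
    [0.11, 0.13, 0.97],
]

tidy = tidy_plate(plate)
for record in tidy:
    print(record)
```

Each well becomes its own row with explicit coordinates, which is the shape that downstream tidy tools expect.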

The tidying/plotting coupling comes from the way that gp objects are 'built up', much like a SQL query is built. Plotting gives you an idea of what each additional layer of annotation you add looks like. Once you've built up all the levels of annotation for the plate, you switch the pipe output from gp_plot to gp_serve. Because of this, plotting the data and tidying it happen through the same pipeline. I think this vignette helps explain it in pictures a bit better than I did here (and thankfully they are not my ASCII pictures), but that's the gist of it.
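
The "build up layers, then serve" idea can be sketched roughly as follows. This is a conceptual Python illustration under my own assumptions, not gp's implementation or interface: `new_plate`, `add_layer`, and `serve` are hypothetical names, and the tiling rule (rectangular blocks numbered left to right, top to bottom, with labels recycled) is chosen only to keep the example short.

```python
def new_plate(n_rows, n_cols):
    """Start a plate description: its dimensions plus an empty list of layers."""
    return {"n_rows": n_rows, "n_cols": n_cols, "layers": []}


def add_layer(plate, name, block_rows, block_cols, labels):
    """Return a new plate description with one more annotation layer.

    The plate is tiled with blocks of block_rows x block_cols wells. Blocks are
    numbered left to right, top to bottom, and labelled from `labels`, which is
    recycled if there are more blocks than labels.
    """
    layer = {
        "name": name,
        "block_rows": block_rows,
        "block_cols": block_cols,
        "labels": list(labels),
    }
    return {**plate, "layers": plate["layers"] + [layer]}


def serve(plate):
    """Flatten the layered description into one annotated record per well."""
    records = []
    for r in range(plate["n_rows"]):
        for c in range(plate["n_cols"]):
            record = {"row": r + 1, "col": c + 1}
            for layer in plate["layers"]:
                # Ceiling division: number of blocks needed to span one plate row.
                blocks_per_row = -(-plate["n_cols"] // layer["block_cols"])
                block = (r // layer["block_rows"]) * blocks_per_row + (
                    c // layer["block_cols"]
                )
                labels = layer["labels"]
                record[layer["name"]] = labels[block % len(labels)]
            records.append(record)
    return records


# Build the description one layer at a time, then "serve" the tidy result.
layout = new_plate(2, 4)
layout = add_layer(layout, "drug", 2, 2, ["vehicle", "treated"])
layout = add_layer(layout, "replicate", 1, 1, [1, 2])

annotated = serve(layout)
for record in annotated:
    print(record)
```

Because the layered description is the single source of truth, the same object could be handed either to a plotting step (to check each layer visually) or to the final flattening step.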

Did this answer your questions? Sorry if this confused things further.

Thank you again for your time!

Hi @KaiAragaki - thank you for the additional information! I really appreciate the thoughtful reply. I am pulling in some editors from our team with more experience in this domain to discuss and will follow up soon.

Hi @KaiAragaki - thank you for your patience. After discussing with the editorial board, we have judged your package to be in-scope and would like to invite you to make a full submission. I'll close this presubmission issue, and please go ahead and open a new issue with the full submission.

Some notes of interest from our discussion:

- We believe your package also fits in the field and laboratory reproducibility tools category (which we realize is missing from the template).
- We also noted some similarity with the recently reviewed tidyqpcr package. While these packages have different strengths and are both in-scope, it could be great to add that to your discussion of different related packages, their benefits, and how they might interact.

Thank you and we look forward to seeing your submission!

ropensci / software-review

Presubmission Inquiry: gp - A Grammar of Plates #539
