openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
707 stars 37 forks source link

[REVIEW]: MousebreedeR: A novel software to assist in the design of breeding schema for complex genotypes of experimental organisms #6474

Closed editorialbot closed 3 months ago

editorialbot commented 6 months ago

Submitting author: !--author-handle-->@sportiellomike<!--end-author-handle-- (Mike Sportiello) Repository: https://github.com/sportiellomike/mousebreedeR Branch with paper.md (empty if default branch): Version: v1.0.0 Editor: !--editor-->@csoneson<!--end-editor-- Reviewers: @yjunechoe, @jsun Archive: 10.5281/zenodo.11405147

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/dcbcc4d1aac93c71fea9292eb0ef8ed5"><img src="https://joss.theoj.org/papers/dcbcc4d1aac93c71fea9292eb0ef8ed5/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/dcbcc4d1aac93c71fea9292eb0ef8ed5/status.svg)](https://joss.theoj.org/papers/dcbcc4d1aac93c71fea9292eb0ef8ed5)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@yjunechoe & @jsun, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @csoneson know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @yjunechoe

πŸ“ Checklist for @jsun

editorialbot commented 6 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 6 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1210/endocr/bqaa114 is OK
- 10.1371/journal.pone.0012418 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 6 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.01 s (1908.7 files/s, 85596.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
R                               20             65            400            374
Rmd                              1             30             79             89
Markdown                         3             26              0             52
TeX                              1              1              0             27
YAML                             1              1              4             18
-------------------------------------------------------------------------------
SUM:                            26            123            483            560
-------------------------------------------------------------------------------

Commit count by author:

    40  sportiellomike
    17  Mike Sportiello
editorialbot commented 6 months ago

Paper file info:

πŸ“„ Wordcount for paper.md is 738

βœ… The paper includes a Statement of need section

editorialbot commented 6 months ago

License info:

βœ… License found: MIT License (Valid open source OSI approved license)

editorialbot commented 6 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

csoneson commented 6 months ago

πŸ‘‹πŸΌ @sportiellomike, @yjunechoe, @jsun - this is the review thread for the submission. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

as the top of a new comment in this thread. These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues directly in the software repository. If you do so, please mention this thread so that a link is created (and I can keep an eye on what is happening). Please also feel free to comment and ask questions in this thread. It is often easier to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@csoneson) if you have any questions or concerns. Thanks!

yjunechoe commented 6 months ago

Review checklist for @yjunechoe

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

csoneson commented 5 months ago

πŸ‘‹πŸ» Just wanted to check in to see that things were going well here and that there are no questions about the process (@jsun - I see that you haven't yet created your checklist, let me know if you have any questions!) Thanks all!

yjunechoe commented 5 months ago

Hi @csoneson - I'm returning to this review after my first pass at the checklist (I have been awaiting a response from the author about automated tests before I proceed further)

I took some time to read the contents of this package more closely, and I have some serious concerns about the Substantial scholarly effort criterion.

Inside the package I count 12 functions - a combined LOC of 273 lines and an average of 23 lines for each function (LOC range: 4-45). Looking at internal function calls as a proxy for complexity, each function in the package is composed of an average 13 calls to other functions, mostly c(), return(), and print(). See below for the code involved in this analysis.

I'm afraid there's not much for me to work with here to evaluate the package. As per the review guidelines, I would like to flag the submission as questionable scope before proceeding further.


Analysis of substantial scholarly effort ``` r ## Setup functions <- c( "addmouseID", "addpoints", "canwegetalltheallelesfromonecross", "compilegametes", "engageinmeiosis", "fertilize", "pointsperpup", "possiblepointsfromdesiredoutcome", "spermandeggs", "summarizefertilization", "summarizepotentialpups", "whichpairsshouldibreed" ) length(functions) #> [1] 12 endpoint <- "https://raw.githubusercontent.com/sportiellomike/mousebreedeR/main/R/" files <- file.path(endpoint, paste0(functions, ".R")) lines <- lapply(files, readLines) ## Code lines count_code_lines <- function(x) { code <- grep("^(#|\\s*$)", x, invert = TRUE, value = TRUE) length(code) } all_code_lines <- sapply(lines, count_code_lines) sum(all_code_lines) #> [1] 273 mean(all_code_lines) #> [1] 22.75 range(all_code_lines) #> [1] 4 45 ## Function complexity parsed <- lapply(lines, \(x) parse(text = paste(x, collapse = "\n"))) parse_data <- do.call(rbind, lapply(parsed, getParseData)) function_calls <- parse_data |> subset(token == "SYMBOL_FUNCTION_CALL" & !text %in% functions) |> el("text") |> table() head(sort(function_calls, decreasing = TRUE)) #> #> which c return print subset length #> 18 13 12 9 8 7 sum(function_calls) #> [1] 152 sum(function_calls)/length(functions) #> [1] 12.66667 ```
sportiellomike commented 5 months ago
yjunechoe commented 5 months ago

Hi @sportiellomike - thanks for the quick response, and I apologize for not being clear on the issue I opened on automated tests

To your points - I understand your perspective on the usefulness of the tool (albeit I've not fully evaluated this formally). I only mean to comment on the structure and contents of the package as an open source software developed for general consumption, according to the scope and standards of JOSS specifically. To clarify, I'm following the practice of previous reviewers who have raised issues of LOC and complexity as part of evaluating the substantial scholarly effort criterion (see similar issues).

I'm happy to complete the review and help with the review process up to a successful publication, but substantial scholarly effort is a rather tricky matter to resolve, especially if the package is already considered to be at a feature-complete state that addresses its goals. I only seek clarification from the editorial team - I apologize again for the interruption.

csoneson commented 5 months ago

@yjunechoe Thanks for voicing your concern. Let me bring this up with the broader editor team and get back to you.

csoneson commented 5 months ago

@editorialbot query scope

editorialbot commented 5 months ago

Submission flagged for editorial review.

jsun commented 5 months ago

Sorry for the delay. Ill finish my review for next week.

jsun commented 5 months ago

Review checklist for @jsun

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

sportiellomike commented 5 months ago

@csoneson: the reviewer @yjunechoe encouraged me to comment here regarding the "substantial scholarly effort" guideline in the other thread.

While I understand it to be the case JOSS wants longer submissions, I wanted to be clear that this project was created for the purpose of solving specific problems: the unnecessary death of experimental animals by requiring fewer crosses and therefore animals used, as well as increasing the speed of which experimentalists can obtain their desired animal of correct genotype. As a result, i believe this submission to be a substantial contribution as it has achieved those goals. It is true it is a smaller package, but it still is a large contribution to the field of experimental animal morbidity/mortality reduction, of which there is minimal resources, and no software resources that are free and open source to my knowledge. I am glad I was able to achieve these goals in a moderate number of lines of code instead of a large number as it is more efficient, though I freely admit this package is smaller than most published in JOSS. I hope you agree.

jsun commented 5 months ago

This package contains simple but important features to assist wet researchers. Therefore, I think, the main target users of this package would be wet researchers who are unfamiliar with coding programmes. Therefore, I recommend the author to create easy-to-use functions and clear documentation (vignette). The following comments are based on a check of this contribution.

csoneson commented 5 months ago

@csoneson: the reviewer @yjunechoe encouraged me to comment here regarding the "substantial scholarly effort" guideline in the other thread.

Thanks @sportiellomike - the editorial team is discussing the submission and we should have an answer for you soon! Thanks for your patience.

csoneson commented 5 months ago

Thank you for your patience, and thanks again @yjunechoe for raising the point about the scope. The editorial team has carefully reviewed the submission and the discussion, and we felt that although it is close to borderline, we would consider it in scope, and it seems to provide useful functionality for the target audience. Thus, I would encourage the reviewers to continue their evaluation of the submission. Thanks!

yjunechoe commented 5 months ago

@csoneson Thanks for the clarification!

@sportiellomike Apologies again for the interruption. I'll complete the rest of my review swiftly.

yjunechoe commented 5 months ago

I expect some time with back-and-forths on the package repository to get the submission into a good shape, and I'll keep raising specific points as GH issues, but in the meantime here are my general thoughts on the package:

First off I agree with all points that the other reviewer has mentioned above. I'd especially like to echo the point about function/variable names being difficult to parse, the general inconsistencies in coding style (use of spaces and indentation, use of semicolon, etc.), and the fact that the return value of some functions are not R-like. My one point of digression from the other reviewer is that for a package of this size, I don't think a S4 class is necessary, but I do agree that some kind of a custom print method is appropriate (perhaps the lighter-weight S3).

And two big-picture comments/questions of my own:

1) Non-idiomatic R code

There are several cases of non-idiomatic R code in the package. The other reviewer has already given an example of one: the canwegetalltheallelesfromonecross() function returns a string that's either "notonecrossforonecopy" or "onecrossforonecopy", which is very difficult to program over. It'd be more appropriate to have the function be called something like is_onecross() that returns TRUE or FALSE.

A few other examples to this larger point about non-idiomatic R code:

Inside fertilize(), I see:

  fertilizeoutput<<-listofzygotes

With no local binding for the variable fertilizeoutput. This inevitably assigns into the user environment - such leaky behaviors (and side-effect functions in general) are not R-like and highly discouraged for a function in a package. This also happens in spermandeggs()

Also inside fertilize() and elsewhere, I see:

which(spermcolnumerics == T)

This should simply be which(spermcolnumerics) - equality checks to a scalar boolean are redundant and can introduce silent type coercion bugs, since == is loose equality and R doesn't have vectorized strict equality. This comment also applies to use of isTRUE() in a few if statements elsewhere.

Additionally, in addpoints() and elsewhere, I see:

  pointsvector<-c()
  for (k in 1:length(desiredvector)) {

But growing a vector is costly in R. The appropriate pattern is to either pre-allocate the vector and fill it with values, or *apply() over the iterator.

Lastly to keep this short, in pointsperpup() and elsewhere, I see:

  genecolx[genecolx=='homopos']<-2
  genecolx[genecolx=='het']<-1
  genecolx[genecolx=='homoneg']<-0

Such incremental replacement of values are odd when the type doesn't match. The idiomatic R approach would be vectorized subsetting of a named vector, like:

genotypes <- setNames(0:2, c("homoneg", "het", "homopos"))
genotypes[genecolx]

I want to note that the above code chunk of applying number coding to genotypes has been copy-pasted across several functions in the package - this is undesirable for maintenance. To this point, I also notice that the package has zero internal utility functions (all functions are exported), which is uncharacteristic of a package.

These are perhaps minor points and there's quite some way to go on this front, but I would like to see these issues resolved for a publication-quality software

2) Who is the target audience for this tool?

I'm a little puzzled at the level of R experience that is expected of the average user of this tool. Currently, I sense bit of a tension in what is assumed of the audience: the audience is assumed to be familiar enough with R and coding to be able to build/parse the vignettes and understand the package from reading and running the examples themselves, but not experienced enough such that the package exports code like this that simply multiplies the length of a vector by 100, obscured under a function.

possiblepointsfromdesiredoutcome<-function(desiredvector = desiredvec){
  possiblepoints<-length(desiredvector)*100
  return(possiblepoints)
}

Overall, there is not enough hand-holding for a novice R user, but enough non-idiomatic R quirks to confuse a more experienced R user. It would be beneficial for design of the package (and also for the paper) to clearly state the level of R knowledge assumed/required of the user of this package, and cater specifically to that audience (and broaden the audience once the package meets the need of its core users).

I get the sense that it's the former (package is for novice programmers), and I actually wonder in that case if something more like a point-and-click interface is more appropriate (like a shiny app complement to the software). I encourage the author to at least look into this somewhat more obvious method of supporting novice users and comment on the choice to stick to coding in R as the primary interface to this tool.

sportiellomike commented 4 months ago

I am working on these issues and will tag you when resolved. As you mentioned, this will take some time. I appreciate your patience. I'd like to thank the reviewers for their thorough comments, which I know will make this software better and more usable.

Thank you again.

csoneson commented 4 months ago

Hi @sportiellomike - could you give us a short update on how things are progressing? Just want to make sure that submissions are actively worked on and not stalled. Let us know if you have questions about any of the comments. Thanks!

sportiellomike commented 4 months ago

Thanks for checking in! Yes I am working and have plans to give you an update by one week from today having finished the edits (or at least a progress update with a timeline).

I appreciate the editors and reviewers in this process. Thank you.

sportiellomike commented 3 months ago

Thank you to all involved for all your help. I'll get to it.

Thank you again for both reviewers and the editing team for your knowledge, advice, and assistance through this process.

sportiellomike commented 3 months ago

And just to be explicit as I did not mention this above @jsun @yjunechoe: I am making a shiny app as you requested as well. Please let me know where to store this (within this same github?).

yjunechoe commented 3 months ago

@sportiellomike Thank you for the updates, here and elsewhere, and the hard work that went into them. I'm satisfied with your edits and/or justifications to the points I raised in my previous comment. Should they raise a separate problem down the line I will raise them again, but we can move forward on the other points (the ones I've opened as issues on the package repo).

I'm also happy to hear the addition of a shiny app - this will make the submission much stronger. The app should live inside /inst/<APP NAME>/, and a function should be added/exported to allow users to run the app through a function call. Please feel free to ping me on your repo in a new issue for the shiny app, once you have a minimally working app and have put it under /inst

sportiellomike commented 3 months ago

@yjunechoe shiny app is live. if you choose to try it, I put another example datasets up there to test it out that are less complex than original (edit_this_CSV_with_breeder_mice2.csv).

I included "visualization" of uploaded csv for user to check to make sure it looks right, "visualization" of desired genotype that the user can manually type in to make sure it looks correct, the advice the program gives to the user, a couple relevant tables for potential pups/summary per gene and cross, and a download button for the user to download potential pup table.

Let me know if I misunderstood your instructions on where to put it.

jsun commented 3 months ago

@sportiellomike thank you for considering my comments. Here is my other suggestion (just suggestions), you don't need to change your package as in this suggestion.

In the case of the can_we_get_all_the_alleles_from_one_cross function, the function name is "can we get ... ", like a question , so it is just Yes or No as an answer. Therefore, I think TRUE or FALSE would be appropriate as the return value of this function. Or, change the function name to get_all_alleles. This function can return the number of crossing experiments needed to get all alleles as an integer.

For which_pairs_should_i_breed function, the return may organized as a list or slots, for example:

which_pairs_should_i_breed <- function() {

...

  output <- list(recommend_1 = NULL, recommend_2 = NULL, prob=NA, all_alleles_with_one_corss=NA)

  output$recommend_1 <- data.frame(
    mom = c('mouse1', 'mouse1', 'mouse1', 'mouse1', 'mouse2', ...),
    dad = c('mouse5', 'mouse6', 'mouse7', 'mouse8', 'mouse5', ...)
  )
  output$recommend_2 <- data.frame(
    mom = c('mouse1', 'mouse1', 'mouse1', 'mouse1', 'mouse2', ...),
    dad = c('mouse5', 'mouse6', 'mouse7', 'mouse8', 'mouse5', ...)
  )
  output$prob <- 0.25
  output$all_alleles_with_one_corss <- FALSE

  output
}

For some outputs, I can only see momdad column in the data.frame. If it possible, use mom and dad columns instead of using momdad column (of course, you can still remain the momdad column). Adding mom and dad columns would facilitate secondary use of the outputs.

sportiellomike commented 3 months ago

@jsun I fixed the mom and dad columns in the shiny as you recommended. Thanks!

I elected to defer your suggestion for the output lists not because I think it wouldn't work, just because several other functions depend on the output and would involve changing a long list of functions in the package. I will take the themes of your suggestion into account if I develop a version 2.0 and during future package development.

sportiellomike commented 3 months ago

@jsun @yjunechoe

I believe I have addressed all of your necessary proposed changes, considered and changed package in light of most of your optional suggestions, and built a shiny for use as well. Please let me know if there is anything else I can do for you. Thank you again for reviewing this software and thank you @csoneson for editing.

yjunechoe commented 3 months ago

Thanks, I see the shiny app. Could you please also create a vignette of instructions for how to use the shiny app? This can be a video walkthrough or a document of screenshots that a novice user with little programming experience (the main audience for the shiny app) can follow along.

I will also just note that in so far as app.R already exists in /inst, there's no need to copy paste the app code into shiny_mousebreedeR(). That user-facing function can just be:

shiny_mousebreedeR <- function() {
  appfile <- system.file("app.R", package = "mousebreedeR")
  shiny::runApp(appfile)
}

Please let me know once instructions for using the shiny app are available so I can go ahead and test its features.

jsun commented 3 months ago

hi, @csoneson , the checklist is down.

csoneson commented 3 months ago

Thank you @jsun!

sportiellomike commented 3 months ago

Thank you @jsun. @yjunechoe all set with your requests as well.

yjunechoe commented 3 months ago

Thanks - the package seems to be at a state where it meets all the minimum criteria for the checklist items. @csoneson all items have been checked off.

@sportiellomike In the meantime, please continue working to improve the results of devtools::check(), ideally to a point where no errors and warnings are detected. I personally haven't found any of them to be critical for installation on my end, but errors and warnings are generally indicative of a package's fragility across different user environments (for example, one such error suggests that you assume users to have {DT} installed due to the shiny app having a dependency on it - this should be declared in Suggests and dependency requirement should be checked accordingly when the shiny app is called). I recommend consulting the Writing R Extensions manual and the R Packages book as you work through this.

csoneson commented 3 months ago

@jsun, @yjunechoe - thank you very much for your thorough reviews of this package!

@sportiellomike - as the reviewers have finished their checklists and are satisfied with the submission, I will also take a quick look through it - I'll get back to you shortly with the next steps.

csoneson commented 3 months ago

@editorialbot generate pdf

csoneson commented 3 months ago

@editorialbot check references

csoneson commented 3 months ago

@editorialbot check repository

editorialbot commented 3 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1210/endocr/bqaa114 is OK
- 10.1371/journal.pone.0012418 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 3 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.02 s (2050.8 files/s, 121479.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
R                               22            146            673            555
Markdown                         5             84              0            215
Rmd                              1             35             88             88
YAML                             2             12              6             57
CSV                              3              0              0             27
TeX                              1              1              0             27
-------------------------------------------------------------------------------
SUM:                            34            278            767            969
-------------------------------------------------------------------------------

Commit count by author:

    90  sportiellomike
    23  Mike Sportiello
editorialbot commented 3 months ago

Paper file info:

πŸ“„ Wordcount for paper.md is 738

βœ… The paper includes a Statement of need section

editorialbot commented 3 months ago

License info:

βœ… License found: MIT License (Valid open source OSI approved license)

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

csoneson commented 3 months ago

@sportiellomike I have looked through the submission and opened a PR with some suggestions for the paper. Please have a look and merge if you agree. In addition, could you add a couple of sentences to make the Summary section a bit more accessible for a non-specialist audience (who may not know what a 'floxed' gene is, for example)? Once this is done, you can as the editorial bot to generate a new proof by typing @editorialbot generate pdf in a post.

sportiellomike commented 3 months ago

@editorialbot generate pdf

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left: