ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

Rclean: A Tool for Writing Cleaner, More Transparent Code #327

Closed MKLau closed 4 years ago

MKLau commented 5 years ago

Submitting Author: Matthew K. Lau (@mklau) Repository: https://github.com/MKLau/rclean


Type: Package
Package: Rclean
Title: A Tool for Writing Cleaner, More Transparent Code
Version: 1.1.0
Author: Matthew K. Lau
Maintainer: Matthew K. Lau <matthewklau@fas.harvard.edu>
Description: To create clearer, more concise code provides this
         toolbox helps coders to isolate the essential parts of a
         script that produces a chosen result, such as an object,
         tables and figures written to disk and even warnings and
         errors.
URL: https://github.com/ProvTools/Rclean
BugReports: https://github.com/ProvTools/Rclean/issues
License: GPL-3 | file LICENSE
Imports: igraph, jsonlite, formatR, CodeDepends, Rgraphviz, methods, knitr
Suggests: 
    roxygen2,
    testthat,
    covr
Language: en-US
Encoding: UTF-8
RoxygenNote: 6.1.1
VignetteBuilder: knitr

Scope

In writing analytical scripts, software best practices are often a lower priority than producing inferential results, leading to large, complicated code bases that often need refactoring. The "code cleaning" capabilities of the Rclean package provide a means to rigorously identify the minimal code required to produce a given result (e.g. object, table, plot, etc.), reducing the effort required to create simpler, more transparent code that is easier to reproduce.

The target audience is domain scientists that have little to no formal training in software engineering. Multiple studies on scientific reproducibility have pointed to data and software availability as limiting factors. This tool will provide an easy to use tool for writing cleaner analytical code.

There are other packages that analyze the syntax and structure of code, such as lintr, formatr and cleanr. Rclean, as far as we are aware, is the only package written for R that uses a data provenance approach to construct the interdependencies of objects and functions and then uses graph analytics to rigorously determine the desired pathways to determine the minimal code-base needed to generate an result.

Not that I can think of at the moment.

annakrystalli commented 5 years ago

Thanks for your submission @MKLau! I'll be handling your review. Stay tuned while I complete editors checks. I'll get back in touch when they are done.

MKLau commented 5 years ago

Thanks!

annakrystalli commented 5 years ago

Editor checks:


Editor comments

Installation:

Hello again @MKLau 👋

I'm having issues installing the package. Although installing from CRAN works, installing from GitHub using devtools::install_github("ProvTools/Rclean", ref = "dev") or from source locally does not, producing this error :

Downloading GitHub repo ProvTools/Rclean@master
Skipping 2 packages not available: CodeDepends, Rgraphviz
ERROR: dependency ‘CodeDepends’ is not available for package ‘Rclean’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rclean’
Error: Failed to install 'Rclean' from GitHub:
  (converted from warning) installation of package ‘/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T//Rtmp0mSvCy/filea469117a2725/Rclean_1.1.0.tar.gz’ had non-zero exit status

CodeDepends is on github so the remote repository needs to be includes in the DESCRIPTION file under Remotes.

So after adding:

Remotes:
    duncantl/CodeDepends

to DESCRIPTION, I now get a different install error:

Skipping 1 packages not available: Rgraphviz
Downloading GitHub repo duncantl/CodeDepends@master
Skipping 1 packages not available: graph
✔  checking for file ‘/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/RtmperTpaE/remotesa03323bd0a21/duncantl-CodeDepends-7c9cf7e/DESCRIPTION’ ...
─  preparing ‘CodeDepends’:
✔  checking DESCRIPTION meta-information ...
─  installing the package to build vignettes
         -----------------------------------
   ERROR: dependency ‘graph’ is not available for package ‘CodeDepends’
─  removing ‘/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/Rtmpoz9E5n/Rinsta2487ca40f8/CodeDepends’
         -----------------------------------
   ERROR: package installation failed

Both Rgraphviz & graph are bioconductor packages.

Adding:

biocViews:

just above Imports: in your DESCRIPTION file resolves the Rgraphviz issue but not the error generated for graph through attempted installation of CodeDepends. And trying to just install it on it's own with devtools::install_github("duncantl/CodeDepends") produces the same error. How did you manage to install CodeDepends?

I tried just adding graph as a dependency in your DESCRIPTION but that didn't resolve the CodeDepends error. It might be useful to get in touch with the package authors to see if they can help.


Finally, I also got these warnings:

✔  checking for file ‘/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/Rtmpkh4rA9/remotes26657867070/ProvTools-Rclean-47ec758/DESCRIPTION’ ...
─  preparing ‘Rclean’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  looking to see if a ‘data/datalist’ file should be added
     NB: this package now depends on R (>= 3.5.0)
     WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects:  'Rclean/tests/testthat/clean.simple.out.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects:  'Rclean/tests/testthat/format.simple.out.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects: 'Rclean/tests/testthat/lib.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects:  'Rclean/tests/testthat/libs.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects: 'Rclean/tests/testthat/opt.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects: 'Rclean/tests/testthat/pi.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects:  'Rclean/tests/testthat/prov.g.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects:  'Rclean/tests/testthat/prov.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects:  'Rclean/tests/testthat/rp.clean.x.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects:  'Rclean/tests/testthat/rp.clean.y.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects:  'Rclean/tests/testthat/rp.clean.y.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects:  'Rclean/tests/testthat/rp.options.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects: 'Rclean/tests/testthat/s.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects: 'Rclean/tests/testthat/sp.test.rda'  WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects: 'Rclean/tests/testthat/vl.test.rda'
─

Which suggest that you need to set a minimum version of R in DESCRIPTION. For more information see: https://github.com/r-lib/devtools/issues/1912 Adding:

 Depends:
    R (>= 3.5.0)

in your DESCRIPTION file indeed solves the problem.


So until these installation issues are resolved, I can't really proceed with the rest of the editors checks as they are all performed on the source code which needs to be able to be installed from source (ie not just downloading the binary from CRAN. For example, running devtools::check() just errors because of missing the CodeDepends package.

Just let me know when this problem is fixed and I can carry on with checks.


Reviewers: Due date:

MKLau commented 5 years ago

Hi @annakrystalli, thanks for your feedback on this. I've been looking into the issues with installing those dependencies.

The package installs from dev on github via devtools on my machine and on Travis' server. However, I manually installed the CodeDepends on my machine, and Travis manually installs as well. I haven't yet done a full install on a fresh machine just using devtools.

This is an important issue, as it would make install much easier to just run one command. So, I'll sort this out and let you know.

MKLau commented 5 years ago

Hi @annakrystalli, alright, I did a fresh install of R on my machine with the addition of all your dependency fixes to the DESCRIPTION and I get a similar error as you:

> install_github("Provtools/Rclean", ref = "deps")
Downloading GitHub repo Provtools/Rclean@deps
Downloading GitHub repo duncantl/CodeDepends@master
Skipping 1 packages not available: graph
Installing 2 packages: graph, XML
Error: Failed to install 'Rclean' from GitHub:
  Failed to install 'CodeDepends' from GitHub:
  (converted from warning) package ‘graph’ is not available (for R version 3.6.1)

Travis still installs without error, which is installing the graph package separately.

I've done some searching on Stack Exchange, and I can see why RGraphviz would install but not graph. I'll keep searching for possible workarounds.

MKLau commented 5 years ago

@annakrystalli

Hi Anna, I still haven’t found a solution to the graph package install other than manually installing graph via BiocManager. So, if you’d like you should be able to proceed by first installing graph before installing

install.packages(“devtools”)
install.packages(“BiocManager”)
BiocManager::install(“graph”)
devtools::install_github(“provtools/rclean”, ref = “dev”)

It looks like the CodeDepends maintainers are working on a solution for this (https://github.com/duncantl/CodeDepends/issues/30) but currently they’re essentially using the manual install method like this.

I’m adding this approach to the dev branch install instructions but I’m dissatisfied with this as an install method. I’ll keep looking for a fix and follow the updates from the CodeDepends folks.

annakrystalli commented 5 years ago

Hey @MKLau, sorry for slow reply, I am actually on holiday so will probably be slow getting back to you for the next week also. Apologies!

In the meantime, I found a short term workaround that hopefully we might be able to turn into a more long-term one.

I forked CodeDepends and just added biocViews: to the DESCRIPTION file in my fork: https://github.com/annakrystalli/CodeDepends.

I also changed the remote repository pointer in the Rclean DESCRIPTION to annakrystalli/CodeDepends to install my fork of the package and installation of RClean worked! ✨🎉. This setup installs version 1.62 of graph. Out of curiosity, what version do you have installed? I quickly ran the tests and they all passed so on the surface this version of graph seems to work with Rclean.

I'll make a pull request to the CodeDepends repo and hope they will merge it. But if not, you could just maintain a fork of CodeDepends that your package depends on where you include biocViews: in the DESCRIPTION. In fact, I would suggest that you make a fork, set it up as I did and use that as your dependency for now. That would allow both myself and reviewers to proceed with installing and reviewing the package. Then, if and once the PR is merged, you can revert back to the upstream CodeDepends repo.

Sound good?

MKLau commented 5 years ago

Great, @annakrystalli! Copy that. I’ll use this workaround and follow what the CodeDepends issue you opened. Hopefully they just merge it in, which seems likely.

Enjoy the rest of your break!

MKLau commented 5 years ago

Now completed, tested and merged into dev branch.

annakrystalli commented 5 years ago

OK great and apologies for massive delay! Been pretty snowed under since I got back and still catching up. I've now been able to complete editors checks and installation worked fine with the new setup.

So, firstly and most importantly, I'd recommend that all your installs happen from the master branch and that this is always stable. Use your dev branch for the more experimental / development state of code.

The rest of the editors checks threw up only minor issues.

goodpractice

goodpractice::gp()

Throws a warning

  ✖ fix this R CMD check NOTE: Namespaces in Imports field not imported from: ‘BiocManager’ ‘knitr’ All declared Imports
    should be used.

I wonder, are there calls to functions in BiocManagers in your code? It's hard to search the dev branch of your repo so could figure it out without scanning all your R/ code (which I didn't do 😜).

spell_check

A few potential genuine typos

devtools::spell_check("/Users/Anna/Documents/workflows/rOpenSci/editorials/Rclean")
#>   WORD              FOUND IN
#> formate           p.spine.Rd:10
#> funcitons         codeGraph.Rd:16
#> inforrmation      get.libs.Rd:16
#> parantage         get.spine.Rd:16
#> Parenatage        p.spine.Rd:5
#> Prases            parse.graph.Rd:5
#> reltionships      parse.graph.Rd:6
#> reproduciblity    README.md:49
#>                   Rclean.rmd:41

Created on 2019-09-10 by the reprex package (v0.3.0)

package coverage

Great coverage overall! Is there are reason var.id.R and write.code.R have such low coverage though?

covr::package_coverage("/Users/Anna/Documents/workflows/rOpenSci/editorials/Rclean")
#> Rclean Coverage: 81.22%
#> R/var.id.R: 0.00%
#> R/write.code.R: 45.45%
#> R/var.lineage.R: 70.00%
#> R/p.spine.R: 72.73%
#> R/parse.graph.R: 81.82%
#> R/parse.info.R: 87.50%
#> R/clean.R: 91.67%
#> R/codeGraph.R: 100.00%
#> R/get.libs.R: 100.00%
#> R/get.spine.R: 100.00%
#> R/read.prov.R: 100.00%

Created on 2019-09-10 by the reprex package (v0.3.0)

@MKLau, all in all I'm happy to proceed with finding reviewers. If you could have a go at fixing these minor issues during that time, that would be great. For me the most important is to get the master branch up to date and use that as the stable version of your package that will be reviewed.

Could you please add the rOpenSci under review badge to your README?

[![](https://badges.ropensci.org/327_status.svg)](https://github.com/ropensci/software-review/issues/327)

Thanks again for your patience during the holidays!


Reviewers: Due date:

MKLau commented 5 years ago

Hi @annakrystalli, no worries! Hope you had a relaxing/restorative holiday.

Thanks for the updates on this, and happy to see the review move forward.

MKLau commented 5 years ago

@annakrystalli FYI - master is now up to date with dev (now v1.1.0). CRAN is still v1.0.0.

annakrystalli commented 5 years ago

Great work @MKLau !

Regarding your comments:

dev and master branch usage: I had stopped moving master forward as so many changes were happening due to the ROpenSci and JOSS review. Travis is currently running checks on the following updates (see following items). I'll merge into master after those checks pass. However, CRAN will now be behind master. Do you recommend updating it or waiting for the ROpenSci review process to proceed?

You should treat the GitHub version of your code as the development version in general and it is absolutely fine for it to not reflect CRAN. The way you link your GitHub code version to the CRAN released it through tags which snapshot more formally the state of the repository upon release. After that you are free to continue developing on GitHub by adding .9000 to the end of your release version in the DESCRIPTION to indicate this is the development version of the software.

That is somewhat different to the use of branches. You still want the master branch in the repository to reflect the most stable state of the development code and it should generally be the go to outward facing version of the development source code. I think the chapter on releasing packages sort of discusses this but think instead about how in general we often see in READMEs instruction to download from CRAN or download the development version from GitHub which is generally the master branch. Does that make sense?

write.code only has partial coverage because part of its functionality is copying to the clipboard, which I haven't figured out how to write tests for. If you (or maybe one of the reviewers) have a suggestion, I'd be happy to get the coverage on that function higher.

My go-to strategy in these situations is to have a look at the tests in packages that use the feature I want to test. I initially thought of reprex and saw that that just calls clipr::write_clip(). So here's a link to the tests script for that function write_clip() 😃.

For all your other comments: ✨

annakrystalli commented 5 years ago

Another minor thing I've just noted. In the DESCRIPTION file, we require that author details are captured using the Authors@R notation not character strings. More details in the relevant r packages chapter.

MKLau commented 5 years ago

Great! Thanks for the pointers. Per these suggestions, I’ll go ahead and:

annakrystalli commented 5 years ago

We have reviewers! 🥳

Many thanks again for agreeing to review @wlandau and @nevrome. See below for review data. Any questions, just reach out.


Reviewers: @wlandau @nevrome Due date: 2019-10-21

nevrome commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [x] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 3


Review Comments

A very interesting package to simplify R scripts. I really like the underlying idea and the main functions clean and write.code work well when applied to the example script example/simple_script.R. Good job, @MKLau! However, the usability of the package can be increased a bit more. Some suggestions in no particular order:

annakrystalli commented 5 years ago

Thanks for your speedy response @nevrome!

@Mklau, it's probably best to wait till both reviews are in and then respond.

MKLau commented 5 years ago

@annakrystalli, that’s what I was planning.

Double thanks @nevrome!

wlandau commented 5 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 4


Review Comments

Rclean is a promising package. It goes above and beyond typical static code analysis and addresses a huge unmet needs in scientific workflows. Well done, @MKLau!

Documentation

A recent survey of over 1500 scientists reported a crisis of reproducibility with “selective reporting” being the most cited contributing factor and 80% saying code availability is playing a role.

API

Installation and automated checks

install.packages("BiocManager")
BiocManager::install(c("graph", "Rgraphviz"))
It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 86% of code lines are covered by test cases.

    R/clean.R:49:NA
    R/clean.R:72:NA
    R/clean.R:82:NA
    R/clean.R:91:NA
    R/clean.R:95:NA
    ... and 21 more lines

  ✖ use '<-' for assignment instead of '='. '<-' is the
    standard, and R users and developers are used it and it is easier
    to read your code for them if you use '<-'.

    R/var.id.R:41:44
    R/var.id.R:42:47

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/clean.R:47:1
    R/clean.R:93:1
    R/get.spine.R:44:1
    R/parse.info.R:39:1
    R/parse.info.R:40:1
    ... and 1 more lines

  ✖ avoid sapply(), it is not type safe. It might return a
    vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/clean.R:150:29
    R/clean.R:151:55
    R/clean.R:169:34

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
    result 1:0 if the expression on the right hand side is zero. Use
    seq_len() or seq_along() instead.

    R/get.libs.R:40:13
    R/get.libs.R:42:17
    R/get.libs.R:44:21
    R/var.id.R:39:13
    R/var.lineage.R:43:22
    ... and 4 more lines

Miscellaneous

MKLau commented 5 years ago

@wlandau thanks so much for a speedy review as well!

Thanks to everyone for helping to improve and push this project forward.

annakrystalli commented 5 years ago

Thanks indeed @wlandau! 🎉

Lot's of useful and insightful comments in both reviews so over to you now @MKLau 😊

wlandau commented 5 years ago

One clarification on a couple points I made:

I am questioning the need for most of Rclean's exported functions...Do we really need p.spine() and p.spine()?

Those functions are necessary to implement clean(), and I recommend keeping them internally. I only question the need for an @export tag because I suspect most users will not invoke these functions directly.

In fact (and this is not strictly necessary to satisfy my review) it could be helpful to split clean() up into multiple helper functions. This could reduce the deeply nested logic in your code and make Rclean easier to maintain. A couple resources:

MKLau commented 4 years ago

Thanks for taking the time to clarify @wlandau. I get what you’re saying both about the structure of the API and the logic of the clean function. This review process has been great because it’s forcing me to hone the target user/application, which has meandered a bit in development. I think the inclination to make the spine and other functions internal is the way to go.

Also, thanks for the pointers, especially the cyclocomp package.

danielskatz commented 4 years ago

Please update https://github.com/openjournals/joss-reviews/issues/1312 when this is accepted.

annakrystalli commented 4 years ago

Hi @MKLau, Just checking in with you. Do you have an ETA for a response to reviewers?

MKLau commented 4 years ago

Hi @annakrystalli, I’m finishing up addressing a final comments this week, and, as long as those get done, I should be able to write-up a response early next week.

MKLau commented 4 years ago

General Response

Thanks again to @wlandau and @nevrome for reviewing the package and, @annakrystalli included, providing insightful comments. I have updated the package greatly based on the various suggestions and questions. These updates are documented in the package's github repo issue and change logs. I detailed the changes as they pertain to the reviewers' comments point by point below. For unchecked, please see the comments that follow. In addition, I would like to point out that the package repo is now hosted at https://gitub.com/MKLau/Rclean, which is the result of shifts in how the project is now being managed.

Response to Reviewer Comments

Documentation

JOSS Co-Submission

@wlandau: missing all sections @nevrome: missing short summary

@mklau: The initial submission of the JOSS paper was based on an example of a submitted article present on the JOSS instructions. However, after this package review, there will be substantial content added to the article. I am currently working on these updates to the text.

Functionality

API

Installation and automated checks

Miscellaneous

annakrystalli commented 4 years ago

Thanks for the detailed response @MKLau!

@wlandau & @nevrome, please let us know whether you are happy with the responses/actions to your review comments.

wlandau commented 4 years ago

@MKLau, fantastic work on the revisions! You put a ton of effort into refactoring the code, rethinking the package scope, and strengthening the documentation. The behavior of Rclean is more internally consistent, and the usage and intent are far easier to understand.

After reviewing https://github.com/MKLau/Rclean/commit/b674396e4c84f58ae8a46ac50f7312650c300e07, I have some minor follow-up comments and requests.

Rclean Coverage: 89.69%
R/keep.R: 83.33%
R/clean.R: 85.00%
R/get_libs.R: 85.71%
R/get_vars.R: 85.71%
R/code_graph.R: 92.31%
R/var_lineage.R: 100.00%
         filename  functions line value
51      R/clean.R      clean   57     0
53      R/clean.R      clean   58     0
79      R/clean.R   get_path  132     0
81      R/clean.R   get_path  135     0
82      R/clean.R   get_path  136     0
83      R/clean.R   get_path  137     0
20 R/code_graph.R code_graph   46     0
9    R/get_libs.R   get_libs   42     0
18   R/get_vars.R   get_vars   45     0
31       R/keep.R       keep   49     0
 ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/code_graph.R:15:1
    R/keep.R:17:1
    R/var_lineage.R:15:1
    R/var_lineage.R:22:1

  ✖ fix this R CMD check NOTE: Namespace in Imports field not
    imported from: ‘jsonlite’ All declared Imports should be used.

Without retrospective provenance, it appears you no longer need jsonlite. Also, line length is ultimately a matter of personal style, but lines less than 80 characters are relatively easy to read. You could either wrap those lines or let me know what you prefer stylistically.

#' data(simple_script)
#' ## Copies code to your clipboard
#' keep(simple_script)
#' ## Saves code to your disk
#' keep(simple_script, file = "clean_code.R")
nevrome commented 4 years ago

I agree with @wlandau: You did an excellent job refactoring Rclean, @MKLau. I really enjoyed reading the vignette and running the example code. I will keep this in mind to analyse my own scripts.

Maybe an RStudio Addin would be nice: Select a variable in your code file with your mouse, press a key combo, get the clean() result. Would make it even more convinient and is simple enough to implement with rstudioapi.

Some last, small remarks:


script <- "example/simple_script.R"

with

script <- system.file("example", "simple_script.R", package = "Rclean")

If you want to add the reviewers to the DESCRIPTION file, then you can use the following string for me:

    person(given = "Clemens",
             family = "Schmid",
             role = "rev",
             email = "clemens@nevrome.de",
             comment = c(ORCID = "0000-0003-3448-5715"))
danielskatz commented 4 years ago

pinging @labarba as the JOSS editor for that part of this - note the comment about the JOSS paper and review (https://github.com/openjournals/joss-reviews/issues/1312) in the comment above.

labarba commented 4 years ago

@danielskatz — I don't understand what I need to opine about.

danielskatz commented 4 years ago

the part:

Afaik the JOSS editors assume that the paper is perfectly fine already when it went through the rOpenSci review. Can you confirm that @annakrystalli? The paper is not completely ready yet though, so it does not make sense for me to read and review it at this point. I wonder what's the correct procedure here to make sure that JOSS receives reviewed texts.

labarba commented 4 years ago

When a paper goes through rOpenSci review, it gets handled by @arfon on the JOSS side. I have not handled any of these.

danielskatz commented 4 years ago

Actually, it can be handled by the AEiC on duty - I've done this a few times, as have others. See https://joss.readthedocs.io/en/latest/editing.html#processing-of-ropensci-reviewed-and-accepted-submissions

danielskatz commented 4 years ago

In any event, to answer @nevrome's question, my understanding is that once this submission is accepted by rOpenSci, the JOSS Associate-Editor-in-Chief responsible for the JOSS review (which is currently @labarba) will review the paper to make sure it meets JOSS standards, but will assume the review of everything else done by rOpenSci is sufficient. (But maybe it's worth asking @arfon to make sure I have this right)

nevrome commented 4 years ago

Thank you for your answer, @danielskatz. My issue is mostly caused by the unclear border between code and text review.

@wlandau and I provided code review according to this Guide for Reviewers. The guide does not mention the JOSS paper. The review template contains the following section, though:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

So obviously it is expected that I read and review the text. Also from my understanding of your answer, @danielskatz, there won't be additional review beyond a rather general check of the editor over at JOSS.

The takeaway message is that the course of action suggested above by @MKLau is not possible:

The initial submission of the JOSS paper was based on an example of a submitted article present on the JOSS instructions. However, after this package review, there will be substantial content added to the article. I am currently working on these updates to the text.

If there are significant changes of the article AFTER this review, then they won't be reviewed at all.

annakrystalli commented 4 years ago

Thanks @nevrome for picking this up.

What we can do is consider the code review of the package complete and move it to rOpenSci. I will coordinate with @MKLau and when the paper.md is complete and ready for JOSS submission, I will review it before it gets submitted (unless of course you want to @nevrome). Ultimately I'm happy to consider your contribution complete and do the final check myself.

danielskatz commented 4 years ago

If there are significant changes of the article AFTER this review, then they won't be reviewed at all.

Well, they are reviewed by the Associate-Editor-in-Chief, but we would certainly appreciate anything you do that improves the paper. Whether this is an obligation of rOpenSci reviewers of submissions that are also being submitted to JOSS is for rOpenSci to determine.

nevrome commented 4 years ago

Thanks for the clarification, @annakrystalli and @danielskatz. The solution suggested by you, @annakrystalli, is fine for me. I can also offer to read the text when it's ready and if you have the feeling that some additional feedback might be helpful, @MKLau.

MKLau commented 4 years ago

Great, thanks to everyone for clarifying the relationship between the software and article review. Thanks for the offer to review the manuscript (@nevrome), it would be great to get your input on it if you have the time to spare.

@annakrystalli, I’ll send you the updated manuscript soon. Hopefully by week’s end.

MKLau commented 4 years ago

@wlandau and @nevrome, thanks also for the follow-up comments and suggestions.

Thanks!

wlandau commented 4 years ago

I see. Do you think the simple script is sufficient to test Rclean's ability to sift out "tangled" variables, or are there different kinds of decisions Rclean makes that you could explore with the longer script?

Either way, I highly recommend accruing tests for all the bug reports that come your way over time. If you write a new test for each bug as you fix it, your test suite will become powerful over time.

Please ping me when you are ready for me to take another look.

MKLau commented 4 years ago

@wlandau thanks, point well taken. Now that the package is stabilizing (and much improved thanks to this review), I can spend more time thinking about how to test it. For now, it seems to be at a place where it should work enough for folks to use it enough to report errors and not just pass on it and not report anything.

annakrystalli commented 4 years ago

Thanks @MKLau. Just a note that good testing is just as much part of a package maturing as is adding functionality and shouldn't be considered something you do afterward but rather, as you add functionality.

As you say, time and bug reports are important in catching errors from edge cases, but I think what @wlandau (feel free to correct me if I'm misinterpreting!) is trying to get you to do is think defensively about any systematic ways in which different uses of the function could break. This could be different combinations of arguments, different inputs etc. Even 100% coverage does not indicate catching these as it's not about testing a line of code once (which could lead to 100% coverage) but under various realistic use situations.

@wlandau do you have any particular tests you have in mind? Or do you think as @MKLau suggest that the test suite is good enough for this stage?

wlandau commented 4 years ago

@MKLau, I like these tests, and I like that they are specific and targeted. I strongly suggest a couple more tests like them for a couple variables in long_script.R.

I also think it is important to compare the actual values of the results you get from actually running the scripts. A cleaned script ultimately needs to produce the same output you get from the original. So for this test, you could

  1. Run source(simple.script, local = some_new_environment) and inspect some_new_environment to get one value of out.
  2. Similarly, get another value of out from the cleaned script.
  3. Compare the two values of out using expect_equal().

You currently compare the actual lines of code, which I suggest you keep. However (and this is not part of my official review) rda files may not be the easiest thing to work with in this case. I think flat .R files would be easier to work with. To avoid clutter in the tests/testthat directory, you could give each test-*.R script its own test-*/ directory to keep track of expected results.

One idea for a feature (again, not part of my review) is to implement validation as a core feature of Rclean: have a function that runs each script in its own fresh R environment or session and asserts that they give identical values for all the variables of interest. callr and tempfile()s could be handy on the implementation side.

It is easy to fool static code analysis, and I expect unsolvable edge cases to come up eventually. When that happens, I encourage you to document them so users understand how Rclean.

MKLau commented 4 years ago

@annakrystalli I definitely agree with the point of defensive testing. These comments of @wlandau’s with more specifics and pointers are great. I’ll get to those test tomorrow and add these suggested features to the issues list as enhancements. Also, I will look into streamlining the bug reporting processes. I have documentation for that but it could be improved and more could be done automatically.

Also, I like the idea of validation. In theory, the code extract should produce the same result, unless there is randomness, etc. If it could be done, it would be great to implement as a part of the RStudio add-in, as suggested previously by @nevrome. That is, when code is extracted for a result(s), it is automatically checked.

Thanks!

wlandau commented 4 years ago

Also, I will look into streamlining the bug reporting processes. I have documentation for that but it could be improved and more could be done automatically.

To help encourage bug reporting, you might have a look at issue templates like this one.

MKLau commented 4 years ago

@wlandau, perfect, thanks for these.