ropensci / software-review

rOpenSci Software Peer Review.

Rclean: A Tool for Writing Cleaner, More Transparent Code #327

Closed: MKLau closed this issue 4 years ago

MKLau commented 5 years ago

Submitting Author: Matthew K. Lau (@mklau)
Repository: https://github.com/MKLau/rclean


Type: Package
Package: Rclean
Title: A Tool for Writing Cleaner, More Transparent Code
Version: 1.1.0
Author: Matthew K. Lau
Maintainer: Matthew K. Lau <matthewklau@fas.harvard.edu>
Description: This toolbox helps coders create clearer, more concise
         code by isolating the essential parts of a script that
         produce a chosen result, such as an object, tables and
         figures written to disk, and even warnings and errors.
URL: https://github.com/ProvTools/Rclean
BugReports: https://github.com/ProvTools/Rclean/issues
License: GPL-3 | file LICENSE
Imports: igraph, jsonlite, formatR, CodeDepends, Rgraphviz, methods, knitr
Suggests: 
    roxygen2,
    testthat,
    covr
Language: en-US
Encoding: UTF-8
RoxygenNote: 6.1.1
VignetteBuilder: knitr

Scope

In writing analytical scripts, software best practices are often a lower priority than producing inferential results, leading to large, complicated code bases that often need refactoring. The "code cleaning" capabilities of the Rclean package provide a means to rigorously identify the minimal code required to produce a given result (e.g. object, table, plot, etc.), reducing the effort required to create simpler, more transparent code that is easier to reproduce.

The target audience is domain scientists who have little to no formal training in software engineering. Multiple studies on scientific reproducibility have pointed to data and software availability as limiting factors. Rclean aims to provide an easy-to-use tool for writing cleaner analytical code.

There are other packages that analyze the syntax and structure of code, such as lintr, formatR and cleanr. Rclean, as far as we are aware, is the only package written for R that uses a data provenance approach to construct the interdependencies of objects and functions, and then uses graph analytics to rigorously determine the minimal code base needed to generate a result.
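As a rough illustration of the "minimal code" idea, the sketch below performs a simplified static backward slice in base R: each top-level assignment is treated as a node, `all.vars()` supplies the objects it reads, and the script is walked from the end back to the target, keeping only the lines the target depends on. This is not Rclean's implementation (Rclean builds its graph from data provenance); the helper `minimal_code()` is hypothetical and only handles simple single-line `name <- value` assignments.

```r
# Hypothetical helper: a simplified static backward slice, NOT Rclean's
# provenance-based algorithm. Assumes one complete expression per element
# of `code` and handles only simple `name <- value` / `name = value` lines.
minimal_code <- function(code, target) {
  exprs <- lapply(code, function(line) parse(text = line, keep.source = FALSE)[[1]])

  is_assignment <- function(e) {
    is.call(e) &&
      (identical(e[[1]], as.name("<-")) || identical(e[[1]], as.name("="))) &&
      is.name(e[[2]])
  }

  # Object written by each line (NA when the line is not a simple assignment)
  outputs <- vapply(exprs, function(e) {
    if (is_assignment(e)) as.character(e[[2]]) else NA_character_
  }, character(1))

  # Objects read on the right-hand side of each assignment
  inputs <- lapply(exprs, function(e) {
    if (is_assignment(e)) all.vars(e[[3]]) else character(0)
  })

  # Walk backwards: keep a line if it writes an object that is still needed;
  # that object is then satisfied and the line's inputs become needed instead.
  needed <- target
  keep <- logical(length(code))
  for (i in rev(seq_along(code))) {
    if (!is.na(outputs[i]) && outputs[i] %in% needed) {
      keep[i] <- TRUE
      needed <- union(setdiff(needed, outputs[i]), inputs[[i]])
    }
  }
  code[keep]
}

script <- c(
  "x <- 1:10",
  "y <- x * 2",
  "z <- rnorm(5)",
  "w <- y + 1",
  "print(z)"
)
minimal_code(script, "w")
```

Here the result keeps the `x`, `y` and `w` lines and drops `z <- rnorm(5)` and `print(z)`. A purely static pass like this misses side effects such as `assign()`, loops that build objects, or files written to disk, which is one motivation for recording provenance instead.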

Not that I can think of at the moment.

MKLau commented 4 years ago

@wlandau Just added a new test that checks the reproducibility of variables in cleaned scripts, please see here.

Let me know your thoughts; in particular, I'm wondering whether you would separate the variables into individual tests.

wlandau commented 4 years ago

Nice, that's a great test. I added some suggestions at https://github.com/MKLau/Rclean/issues/185#issuecomment-565273312. In the interest of avoiding a rabbit hole, only (1) is part of my official review, i.e.:

What about the ability of cleaned scripts to exclude variables? You could test that fit.xx, fit.sqrt.A, and fit.anova are in env.long but not the other environments.
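A minimal sketch of such an inclusion/exclusion check, assuming each script is sourced into its own fresh environment. The two scripts are hypothetical stand-ins written to temporary files (not Rclean's example scripts or its actual tests), and `source_into_env()` is a hypothetical helper; in a package test suite the same checks would normally be written with testthat's `expect_true()` / `expect_false()`.

```r
# Hypothetical stand-ins for a full script and a cleaned version of it
long_lines <- c(
  "x <- data.frame(A = 1:10, B = rep(c(1, 2), 5))",
  "fit.xx <- lm(A ~ B, data = x)",
  "fit.sqrt.A <- lm(I(sqrt(A)) ~ B, data = x)"
)
clean_lines <- c(
  "x <- data.frame(A = 1:10, B = rep(c(1, 2), 5))",
  "fit.xx <- lm(A ~ B, data = x)"
)

# Write lines to a temporary file and source it into a new environment
source_into_env <- function(lines) {
  path <- tempfile(fileext = ".R")
  writeLines(lines, path)
  env <- new.env(parent = globalenv())
  sys.source(path, envir = env)
  env
}

env.long <- source_into_env(long_lines)
env.clean <- source_into_env(clean_lines)

# Inclusion: the cleaned script still produces the object it was cleaned for
stopifnot(exists("fit.xx", envir = env.clean, inherits = FALSE))
# Exclusion: an unrelated object exists only in the long script
stopifnot(exists("fit.sqrt.A", envir = env.long, inherits = FALSE))
stopifnot(!exists("fit.sqrt.A", envir = env.clean, inherits = FALSE))
```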

MKLau commented 4 years ago

@wlandau great, thanks for the rapid and considerate feedback on this. Will do.

MKLau commented 4 years ago

@wlandau I've added two new tests (see here) that should address your last request in the comment above. They don't directly test for the "exclusion" of variables; instead, they test that a set of expected variables is present in the cleaned scripts when sourced in a new environment. I've manually inspected these to verify that they are correct. Let me know if this deviates from the intent of your recommended test.

wlandau commented 4 years ago

I reviewed those new tests, and I think the test suite is now diligent and defensive enough. My follow-up requests are minor:

> packageVersion("lintr")
[1] ‘2.0.0’
> lintr::lint_package("Rclean")
...............
inst/example/long_script.R:25:33: style: There should be a space between right parenthesis and an opening curly brace.
for (i in seq_along(colnames(x))){
                                ^~
inst/example/long_script.R:58:1: style: Variable and function name style should be snake_case.
fit.23 <- lm(x2 ~ x3, data = data.frame(x2[, 1], x3[, 1]))
^~~~~~
inst/example/long_script.R:64:1: style: Variable and function name style should be snake_case.
fit.xx <- lm(A~B, data = x)
^~~~~~
inst/example/long_script.R:70:1: style: Variable and function name style should be snake_case.
fit.sqrt.A <- lm(I(sqrt(A))~B, data = x)
^~~~~~~~~~
inst/example/long_script.R:76:57: style: Trailing whitespace is superfluous.
## After that. I came back and ran another analysis with 
                                                        ^
inst/example/long_script.R:79:25: style: Put spaces around all infix operators.
z <- c(rep("A", nrow(x2)/2), rep("B", nrow(x2)/2))
                       ~^~
inst/example/long_script.R:79:47: style: Put spaces around all infix operators.
z <- c(rep("A", nrow(x2)/2), rep("B", nrow(x2)/2))
                                             ~^~
inst/example/long_script.R:80:1: style: Variable and function name style should be snake_case.
fit.anova <- aov(x2 ~ z, data = data.frame(x2 = x2[, 1], z))
^~~~~~~~~
inst/example/micro.R:1:2: style: Put spaces around all infix operators.
x<- 1
~^
inst/example/micro.R:2:3: style: Put spaces around all infix operators.
y <-3
  ^~~

# There are more...
MKLau commented 4 years ago

Hi Will (@wlandau), I've now made the system.file() changes, fixed the lints in all the code files, and added an issue template for bug reporting. I also added lintr to Travis as an after-success check. One thing to note: I have not fixed all of the lints (such as snake_case for all variables, spacing, etc.) in the example scripts, as those examples purposely contain certain "issues" to imitate quickly composed, "realistic" scripts.

All changes have been merged into the current master (PR #195). Let me know your thoughts and if I've adequately addressed your last round of comments.

Cheers,

Matt

wlandau commented 4 years ago

Thanks, @MKLau. We are almost there. The very last issue on my end is some trouble running the updated example of keep().

script <- system.file(
        "example", 
        "simple_script.R", 
        package = "Rclean")
clean.code <- clean(script, “tab.15”)
#> Error: unexpected input in "clean.code <- clean(script, �"

I think it is because you are using “ (probably Unicode) and not " (34 in ASCII) for quotes. Should be a simple fix here.
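A small base-R sketch of the fix, assuming the problem is limited to typographic double quotes (U+201C and U+201D): replace them with the ASCII double quote and confirm the line now parses. `fix_smart_quotes()` is a hypothetical helper; to locate such characters across a whole file, base R's `tools::showNonASCIIfile()` lists the offending lines.

```r
# Hypothetical helper: swap typographic double quotes for ASCII " (code 34)
fix_smart_quotes <- function(lines) {
  lines <- gsub("\u201c", "\"", lines, fixed = TRUE)  # left double quote
  gsub("\u201d", "\"", lines, fixed = TRUE)           # right double quote
}

bad <- "clean.code <- clean(script, \u201ctab.15\u201d)"
good <- fix_smart_quotes(bad)

# parse() only checks syntax; nothing is evaluated, so Rclean is not needed
parse(text = good)
```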

MKLau commented 4 years ago

Ah, thanks for catching that @wlandau. Must have been from a copy-paste. Will fix now!

MKLau commented 4 years ago

Fixed and committed to master.

wlandau commented 4 years ago

Confirmed, thanks.

You have addressed all my feedback, and Rclean has come a long way in a short time. As a reviewer, I approve Rclean for rOpenSci. Well done, @MKLau!

annakrystalli commented 4 years ago

Thanks for your efforts all involved! 👏

I'm currently on the mountain 🏂 on my last day of hols but will set the wheels in motion for finalisation of approval tomorrow morning when I'm back at my laptop.


On 8 Jan 2020, at 01:14, Will Landau notifications@github.com wrote:

 Confirmed, thanks.

You have addressed all my feedback, and Rclean has come a long way in a short time. As a reviewer, I approve Rclean for rOpenSci. Well done, @MKLau!


MKLau commented 4 years ago

Thanks @wlandau, and everyone else for the input/help!

@annakrystalli looking forward to it, but have a great rest of the holiday and get safely down off the mountain!

annakrystalli commented 4 years ago

Approved! 🥳🙌

Thanks @MKLau for submitting and @wlandau and @nevrome for your reviews!

To-dos:

For submission to JOSS

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a blog post about your package - either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've put together an online book with our best practices and tips; this chapter starts the third section, which is about guidance after onboarding. Please tell us what could be improved; the corresponding repo is here.
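For the Authors@R suggestion in the to-do list, a reviewer is listed with the MARC relator code "rev" via `utils::person()`. The sketch below reuses the maintainer details from the DESCRIPTION above and shows Will Landau as an illustrative reviewer entry; which fields to include (emails, ORCID comments) is an assumption, and reviewers should only be added with their consent.

```r
# Sketch of an Authors@R value; in DESCRIPTION the same c(person(...)) call
# follows "Authors@R:". The reviewer entry assumes the reviewer's consent.
authors <- c(
  person("Matthew K.", "Lau",
         email = "matthewklau@fas.harvard.edu",
         role = c("aut", "cre")),
  person("Will", "Landau", role = "rev")
)
print(authors)
```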

danielskatz commented 4 years ago

Note that this is already under review by JOSS, and that review has been paused while the rOpenSci review has proceeded. Once this rOpenSci process is complete, the JOSS review can restart, and should be fast.

MKLau commented 4 years ago

Great, thanks Anna (@annakrystalli), will get on those ASAP.

Also, thanks @danielskatz for keeping the manuscript in a holding pattern. It has been greatly improved via the software review. Looking forward to getting it finished.

stefaniebutland commented 4 years ago

Congratulations @MKLau on passing review and on your pending JOSS publication. Would you be interested in writing a Tech Note for rOpenSci about your package? Anna suggests that it is an "interesting package that could really simplify creating reproducible examples and paring down code to the essentials for producing a given output."

Tech Notes are written for an audience that wants details and should provide something a reader could not glean from the documentation itself.

If interested, after JOSS publication is probably best timing, so you could respond then. Instructions are here: https://github.com/ropensci/roweb2#contributing-a-blog-post. Typically you would submit a draft by pull request, I review, and we can publish within a week.

MKLau commented 4 years ago

Thanks, @stefaniebutland, I have read a number of the Tech Notes you've helped to put out, and I've always found them to be engaging and useful. I'm definitely interested and I'll keep it in mind while the manuscript is in review.

MKLau commented 4 years ago

Hi @annakrystalli, almost done with the transfer to ROS and the edits on the JOSS manuscript. GitHub has been throwing a slow "Unicorn!" message whenever I try to create a pull request now that Rclean has moved to rOpenSci. Will try again tomorrow to merge the new changes, which include updates to Travis and Codecov.

A couple questions:

Thanks!

Matt

annakrystalli commented 4 years ago

Hello @MKLau ! I've just transferred full admin rights back to you so you should have full control of the repo again.

The paper looks good to me! I wonder if you should show loading the library? Otherwise, I reckon it's good to go.

And yes, well spotted: please ignore the initial instruction to add_ro_desc().

MKLau commented 4 years ago

Thanks for looking over the paper, @annakrystalli . I'll go ahead and add a line showing library("Rclean").

One more question: how do I enable Zenodo watching? I don't see it when I access Zenodo. Should I have done this prior to transferring the repo to rOpenSci?

Also, I keep getting a page that takes a long time to load.


Do you have access to the repo? Would you be able to check if you can open a new pull request?

MKLau commented 4 years ago

@annakrystalli

I'm going to go ahead and send to JOSS. Let me know if you get a chance to look at the pull request initiation from your end though.

annakrystalli commented 4 years ago

Hi @MKLau, just looking into the pull request issues you're having. I've managed to make a successful PR (not merged). https://github.com/ropensci/Rclean/pull/201

When and where exactly are you getting the slow unicorn message?

MKLau commented 4 years ago

Ah, looks like it's resolved now! I can see your pull request.

It was throwing the slow unicorn every time I tried to go to view the pull request. Maybe it had something to do with the transfer from my personal profile to the ROS org.

annakrystalli commented 4 years ago

Good to hear it's all working now! I take it you are in the process of completing the JOSS review too, right? I'm going to go ahead and close this issue now.

MKLau commented 4 years ago

@annakrystalli yes it's now underway (again). Thanks again for all your help!

stefaniebutland commented 4 years ago

Congratulations on your JOSS publication @MKLau. We would love to have a technote about Rclean, as noted above.

Anna suggests that it is an "interesting package that could really simplify creating reproducible examples and paring down code to the essentials for producing a given output."

We now have more detailed guidance: https://blogguide.ropensci.org/

If you're interested, please suggest a date for submission and I can provide a publication date.

MKLau commented 4 years ago

Hi @stefaniebutland, yes, we would be interested in publishing a technote with an announcement of the package. I am currently in a grant writing period but would be able to write something up next week. Would that time frame work?

stefaniebutland commented 4 years ago

Yes, it would, thank you. Please submit when you're ready, using the tentative publication date 2020-03-17.

cc @ropensci/blog-editors

MKLau commented 4 years ago

@stefaniebutland

Great, sounds good.

MKLau commented 4 years ago

Hi @stefaniebutland,

Just finished proofreading and spellchecking the technote. You can find it here: https://github.com/ropensci/Rclean/blob/technote/ropensci/rclean_technote.md

The associated Rmd file is in the same directory on that branch too, if you need to have a look.

I’ve written it as an expanded version of the JOSS article, with a bit more discussion of a few details. Happy to have any edits and/or suggestions though.

Thanks and hope you’re healthy and well.

steffilazerte commented 4 years ago

Hi @MKLau, I've been chatting with @stefaniebutland and we were thinking that your article looks a bit more like a blog post than a tech note (particularly the section talking about the Provenance Engine). This is great!

To get this published, I invite you to submit it as a pull request to the roweb2 repository. You can see the full instructions for setting it up in roweb2 in the Blog Guide, particularly the chapter on Technical Guidelines. If you agree with our assessment of a blog post, rather than tech note, just follow the set up for blog post.

I'll be your friendly reviewer and will be ready to review on Monday, if you can open the pull request by then. Once you're ready for review, either let me know in a comment on the pull request or change the pull request from Draft to Non-Draft.

Thanks!

MKLau commented 4 years ago

Hi @steffilazerte, thanks for signing up to review! Happy to go either way. If this seems more like a blog post to you two, I'm fine with that. I'll read up on roweb2 and I'll aim to submit before Monday morning.