ropensci / software-review

rOpenSci Software Peer Review.

jagstargets: reproducible JAGS pipelines at scale #425

Closed wlandau closed 3 years ago

wlandau commented 3 years ago

Date accepted: 2021-09-27 Submitting Author: Will Landau (@wlandau) Repository: https://github.com/wlandau/jagstargets
Version submitted: 0.0.0.9000 Editor: @noamross Reviewers: @dill, @ernestguevarra

Due date for @dill: 2021-05-11 Due date for @ernestguevarra: 2021-06-15

Archive: TBD
Version accepted: TBD


Package: jagstargets
Title: Targets for JAGS Workflows
Description: Bayesian data analysis usually incurs long runtimes
  and cumbersome custom code. A pipeline toolkit tailored to Bayesians,
  the 'jagstargets' R package leverages
  'targets' and 'R2jags' to ease this burden.
  'jagstargets' makes it straightforward to set up scalable
  JAGS pipelines that automatically parallelize the computation
  and skip expensive steps when the results are already up to date.
  Minimal custom code is required, and there is no need to manually
  configure branching, so usage is much easier than 'targets' alone.
  For the underlying methodology, please refer
  to the documentation of 'targets' <doi:10.21105/joss.02959> and 'JAGS'
  (Plummer 2003) <https://www.r-project.org/conferences/DSC-2003/Proceedings/Plummer.pdf>.
Version: 0.0.0.9000
License: MIT + file LICENSE
URL: https://wlandau.github.io/jagstargets/, https://github.com/wlandau/jagstargets
BugReports: https://github.com/wlandau/jagstargets/issues
Authors@R: c(
  person(
    given = c("William", "Michael"),
    family = "Landau",
    role = c("aut", "cre"),
    email = "will.landau@gmail.com",
    comment = c(ORCID = "0000-0003-1878-3253")
  ),
  person(
    family = "Eli Lilly and Company",
    role = "cph"
  ))
Depends:
  R (>= 3.5.0)
Imports:
  coda (>= 0.19.4),
  digest (>= 0.6.27),
  fst (>= 0.9.4),
  posterior (>= 0.1.3),
  purrr (>= 0.3.4),
  qs (>= 0.14.1),
  R2jags (>= 0.6.1),
  rjags (>= 4.10),
  rlang (>= 0.4.8),
  stats,
  targets (>= 0.0.1),
  tarchetypes (>= 0.0.1),
  tibble (>= 3.0.4),
  tools,
  utils,
  withr (>= 2.3.0)
Suggests:
  dplyr (>= 1.0.2),
  fs (>= 1.5.0),
  knitr (>= 1.28),
  R.utils (>= 2.10.1),
  rmarkdown (>= 2.1),
  testthat (>= 3.0.0),
  tidyr (>= 1.1.2),
  visNetwork (>= 2.0.9)
Remotes:
  stan-dev/posterior
SystemRequirements: JAGS 4.x.y (http://mcmc-jags.sourceforge.net)
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
VignetteBuilder: knitr
Config/testthat/edition: 3

Scope

jagstargets leverages the existing workflow automation capabilities of targets to orchestrate computation and skip up-to-date tasks in Bayesian data analysis pipelines. jagstargets reduces the burden of user-side custom code that targets would otherwise require, which helps free statisticians to focus more on the models and less on the software engineering.

jagstargets is for Bayesian statisticians who develop and run JAGS models. Example workflows range from individual analyses of clinical data to large-scale simulation-based calibration studies for validation.

targets already provides the same kind of workflow automation, but it requires more custom code to set up a workflow. jagstargets uses specialized domain knowledge to make this process easier. Packages rjags and R2jags interface with JAGS but provide little workflow automation.
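As a hedged sketch (not taken from the package's documentation), a complete jagstargets pipeline can be as short as one `tar_jags()` call in `_targets.R`; the model file `model.jags` and the use of `tar_jags_example_data()` as the data list are assumptions for illustration:

```r
# _targets.R -- minimal sketch; assumes JAGS is installed and model.jags exists.
library(targets)
library(jagstargets)

list(
  tar_jags(
    model,                          # prefix for the generated targets
    jags_files = "model.jags",      # model file, tracked for changes
    data = tar_jags_example_data(), # any JAGS-compatible data list
    parameters.to.save = "beta"     # parameters to monitor
  )
)
```

Running `targets::tar_make()` then fits the model and caches the results, skipping the work on later runs if nothing upstream changed.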

N/A

N/A

Technical checks

Confirm each of the following by checking the box.

This package:

The continuous integration is temporarily having some issues because tarchetypes was just accepted to CRAN and the binaries are not yet available. (Should be fixed now.)

Publication options

This package depends on posterior, which is not yet on CRAN.

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)).
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution.*)

I have prepared a JOSS manuscript, and I intend to submit it if jagstargets clears rOpenSci software review.

Code of conduct

noamross commented 3 years ago

@ropensci-review-bot assign @noamross as editor

ropensci-review-bot commented 3 years ago

Assigned! @noamross is now the editor

noamross commented 3 years ago

Hi @wlandau! Running editor checks, I am able to run the package's tests, but I have the following issue when checking the package with devtools::check():

✓  checking for file ‘/home/noamross/reviews/jagstargets/DESCRIPTION’ ...
─  preparing ‘jagstargets’:
✓  checking DESCRIPTION meta-information ...
─  installing the package to build vignettes
E  creating vignettes (30.7s)
   --- re-building ‘mcmc.Rmd’ using rmarkdown
   Quitting from lines 284-285 (mcmc.Rmd) 
   Error: processing vignette 'mcmc.Rmd' failed with diagnostics:
   callr subprocess failed: jags_file is not an existing file.
   Visit https://books.ropensci.org/targets/debugging.html for debugging advice.
   --- failed re-building ‘mcmc.Rmd’

   --- re-building ‘mcmc_rep.Rmd’ using rmarkdown
   Quitting from lines 354-355 (mcmc_rep.Rmd) 
   Error: processing vignette 'mcmc_rep.Rmd' failed with diagnostics:
   callr subprocess failed: jags_file is not an existing file.
   Visit https://books.ropensci.org/targets/debugging.html for debugging advice.
   --- failed re-building ‘mcmc_rep.Rmd’

   SUMMARY: processing the following files failed:
     ‘mcmc.Rmd’ ‘mcmc_rep.Rmd’

This appears to occur in the visNetwork() calls in the vignettes (commenting them out allows the vignettes to build). Otherwise all tests pass. I am running JAGS 4.3.0.

wlandau commented 3 years ago

Sorry about that, should be fixed now.

noamross commented 3 years ago

Editor checks:


Thank you, Will. All tests are running fine now, and your goodpractice::gp() diagnostics are all clear! I am seeking reviewers.

noamross commented 3 years ago

@ropensci-review-bot add @dill to reviewers

ropensci-review-bot commented 3 years ago

@dill added to the reviewers list. Review due date is 2021-05-11. Thanks @dill for accepting to review! Please refer to our reviewer guide.

dill commented 3 years ago

Hi @wlandau, I'm just getting started here, and during installation I see that targets is currently archived on CRAN (which I assume you're already aware of), so I have installed the version from ropensci/targets. Is that the most appropriate version to work from?

wlandau commented 3 years ago

Thanks for reviewing, @dill. Development targets (from https://github.com/ropensci/targets) should be fine while we wait for CRAN to review targets version 0.4.2, which I submitted last weekend.

dill commented 3 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

Functionality

I was using a 2020 MacBook Air (Intel) running macOS 10.15.7 (Catalina). Here is the output of sessionInfo():

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.4

I tried:

remotes::install_github("wlandau/jagstargets", dependencies=TRUE)

but (2021/04/29) targets had been removed from CRAN, so this failed with error:

ERROR: dependencies 'targets', 'tarchetypes' are not available for package 'jagstargets'

So instead I installed with

remotes::install_github("wlandau/targets", dependencies=TRUE)
remotes::install_github("ropensci/tarchetypes", dependencies=TRUE)
remotes::install_github("wlandau/jagstargets", dependencies=TRUE)

This worked fine.

2021-05-06 I tried using install_github() as above again and had no problems.

Estimated hours spent reviewing: 9 (this is high as it includes reading the rOpenSci review documentation)


Review Comments

For some background on where I'm coming from with this review: I am a statistician, but I work primarily with ecologists (who tend to be pretty keen on JAGS and JAGS-like programming), so I have some experience with these kinds of workflows. I don't have experience with targets, so that is new to me. I'm not an RStudio user, so this is based on using R from the command line or within Nvim-R, I doubt this makes any difference but just an FYI.

General comments

I realise this is probably a daunting number of comments. In general jagstargets is an awesome piece of work! Most of my comments are aimed at making the package more accessible for people like me who want to get started using jagstargets. This was a fun package to review, and I learnt a lot. I hope the comments below are useful for you!

README.md

The first two paragraphs give a summary of the package but they feel like they are written for people who already know what targets does (e.g., "factories", "targets", "pipelines" are all jargon to begin with; though I appreciated the links to the blog post, I can see this turning people off right away). Perhaps there could be some adaptation so that someone who is a JAGS user can immediately understand how the package will save them time and headaches. Maybe even a "for JAGS users" and a "for people familiar with targets" section might help? (FWIW, I found the text in ?jagstargets-package to be a bit more transparent.)

Testing

MCMC pipelines vignette

Scaling MCMC pipelines vignette

Trying this out for myself

I tried to construct a comparison between mgcv::gam and mgcv::jagam using jagstargets. You can find the resulting code here. Mostly I found this fairly intuitive once I got my head around the above. tar_visnetwork() was essential to debugging. Feel no obligation to look at this but it might be useful to see what weird things people do when working from scratch and seeing where some bottlenecks are.

Package code

I can't offer too much comment here as the majority of the code is calls to targets which I am not really qualified to comment on in any depth. In general I found the code clear and understandable throughout, even though I don't know the ins and outs of targets.

Issues raised/pull requests

wlandau commented 3 years ago

Thanks for your review, @dill. Your feedback helped clean up the documentation and make it more accessible to new users.

but (2021/04/29) targets had been removed from CRAN, so this failed with error:

targets is back up on CRAN, and I resubmitted tarchetypes on May 4. If all goes well, it may be accepted next week.

covr::package_coverage(type="all") reports: 99.57% coverage

That's odd, https://app.codecov.io/gh/wlandau/jagstargets and my local machines both report 100% coverage. Do you happen to know if any tests were skipped due to missing optional packages?

The first two paragraphs give a summary of the package but they feel like they are written for people who already know what targets does (e.g., "factories", "targets", "pipelines" are all jargon to begin with; though I appreciated the links to the blog post, I can see this turning people off right away). Perhaps there could be some adaptation so that someone who is a JAGS user can immediately understand how the package will save them time and headaches. Maybe even a "for JAGS users" and a "for people familiar with targets" section might help? (FWIW, I found the text in ?jagstargets-package to be a bit more transparent.)

In https://github.com/wlandau/jagstargets/commit/b6d12e3d75361a24feacc74ec6988ee994741936, I rewrote the introductory paragraph and moved the technical details to a later section of the README.

Tests didn't use context() to provide extra information on what exactly was being tested each time. Filenames were semi-informative, but might be better to have more info there for contributors. This might be more useful if tests are expanded further.

testthat::context() is superseded, and I am trying to follow the updated recommendation to use file names as context.

Not much in the way of comments in the test files. Some are self-explanatory but the large tests are harder to follow.

https://github.com/wlandau/jagstargets/commit/6528f1475c1a7fe3151654905e7c65625b050ffa inserts more inline prose to explain the intent of the tests.

Could this be renamed to "Getting started with..." or "An introduction to..." to make it clear this is where to start?

Sure, implemented in https://github.com/wlandau/jagstargets/commit/885cce7f36a29a7203afba97717b41b8cda21780.

It might be nice to have a bit of table setting at the start of this vignette to talk about the kinds of outputs we want, even showing how an existing JAGS workflow works and how that maps into jagstargets (and then you can say how similar this is to the DAG generated by tar_visnetwork). In order to get people on board, I think it's necessary to show a clear mapping between their current workflow and "doing something fancy", and a bit of intro here might help that happen.

I am not sure I fully understand, but https://github.com/wlandau/jagstargets/commit/63ad892b620ab706188fe855cd5c2ff1de1b8023 does work a little harder to put the targets pipeline in the context of an ordinary JAGS workflow. Please let me know if this is different from what you have in mind.

I found the list of targets a little confusing initially (around here). I wasn't sure where I was supposed to be looking for those objects and what was going on. Perhaps running tar_manifest() first, showing the effect of the tar_jags call, then explaining what these entries mean might make that easier to understand?

https://github.com/wlandau/jagstargets/commit/0ef2c23a3c2db1ebf968738d1f98618d443179f4 rewords that section to try to be more helpful.

Does the tar_jags call have to be wrapped in list() in that chunk? (If so, maybe explain why.)

list() is not required in this specific instance, but it underscores that _targets.R should end with a list of targets in the general case. https://github.com/wlandau/jagstargets/commit/e371d0f24eebc1e149d7d9c80c4b712207511fc3 adds a comment to clarify this. (Returning a single target is also acceptable but not very common.)
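A minimal sketch of the convention (hypothetical, not the vignette's exact code): `_targets.R` should end with a list of targets, even when some entries are generated by a target factory like `tar_jags()`:

```r
# _targets.R -- hypothetical sketch of the list convention.
library(targets)
library(jagstargets)

list(
  # tar_jags() is a target factory: it returns several targets at once.
  tar_jags(
    model,
    jags_files = "model.jags",      # assumed to exist in the project
    data = tar_jags_example_data(),
    parameters.to.save = "beta"
  ),
  # Ordinary targets can sit alongside it in the same list.
  tar_target(timestamp, Sys.time())
)
```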

The last part of the vignette starts talking about a second model y.jags though we don't define that. Is it worth making just a quick note here that tar_visnetwork doesn't check that the model files exist?

tar_jags() (and thus tar_visnetwork()) does error out if the JAGS file does not exist. (The original vignette wrote the model in a hidden code chunk.) https://github.com/wlandau/jagstargets/commit/cbe845e9dc8ca6415c14b238abb66b95e3fc67f1 exposes the new model to the reader.

Might be worth having something up top saying to start with the other vignette first?

Added a mention of the intro vignette.

There could be a little more chat at the start of this vignette explaining that you want to find out how well model.jags estimates alpha and beta[].

https://github.com/wlandau/jagstargets/commit/c5af606cdd34802b34767ec729b666b4b3ea066c adds a comment near the beginning to clarify a bit about what I mean by "validation".

This is a bit of an odd model (in my opinion at least!) in that you have fixed the covariate values and have beta be a random effect (you generate n random slopes), rather than a fixed beta with random data each time. I wonder if a classic linear model situation (where data vary and there is only one beta) would be a clearer situation for people to think about? It also has the advantage of producing much less output, as well as having the same DGP as the other vignette.

https://github.com/wlandau/jagstargets/commit/c5af606cdd34802b34767ec729b666b4b3ea066c changes the model to have one slope and one intercept. My intention with the different model was to demonstrate that jagstargets can handle non-scalar parameters when it comes to simulation studies that look at coverage in posterior intervals.

With my stats hat on, I'd say write out the equations for the model/DGP, so it's super-clear what's happening here.

Added in https://github.com/wlandau/jagstargets/commit/057c47cc68199992fadd076660f877c6f731cd5b.

There is some setup in the chunks here that isn't explained (e.g., options(crayon.enabled = FALSE) and tar_option_set(memory = "transient", garbage_collection = TRUE)) that is worth explaining to novice users like myself. Perhaps this is only necessary for this vignette format, but it would be good to know if so.

options(crayon.enabled = FALSE) is internal to the vignette. It makes sure printed output is rendered properly in the R Markdown document, and it does not appear in the rendered HTML document. https://github.com/wlandau/jagstargets/commit/4ddaf4444839d01ae0b857cafb65736314fafb09 adds a comment to indicate that the other options help targets behave more efficiently with computer memory. Those options are explained in more detail in ?targets::tar_option_set.

I found the _targets.R definition hard to follow; in particular, the arguments to tar_jags_rep_summary() were hard to relate to what is going on. The terminology could be explained here at the same time: "branches", "replications", "batches", and "iterations" all appear almost at once. I think I figured out what they mean by the end, but it would be nice to have those terms defined clearly, along with why we use them here. This could even be woven into the first 1-2 intro paragraphs.

Hopefully https://github.com/wlandau/jagstargets/commit/586f75e73d87ccfc513216df89edccae243d3baa clears that up.

The names of the nodes in these graphs are hard going (this was less obvious above when the model was simpler). model_model vs. model vs. model_lines_model: these make sense once you know what you're doing, I'm sure, but I found them difficult to keep track of (see comment in "Package code" below). It is worth pointing people to the documentation in ?tar_jags to explain the naming convention at this point.

From the documentation of tar_jags():

#'   The target names use the `name` argument as a prefix, and the individual
#'   elements of `jags_files` appear in the suffixes where applicable.
#'   As an example, the specific target objects returned by
#'   `tar_jags(name = x, jags_files = "y.jags", ...)` returns a list
#'   of `targets::tar_target()` objects:

Still takes getting used to though, I know.

Is there any useful information that can be gleaned from the .rep column? It looks like a hash, can it tell us anything re: reproducibility or is it just for internal bookkeeping?

Yes, it is just a hash to distinguish among individual simulation replications. Dynamic branching in targets automatically mashes all the reps together in the same data frame, and having a rep ID helps if the user wants to do custom post-processing within reps, e.g. tar_read(model) %>% group_by(.rep) %>% group_modify(~derive_special_parameter(.x)) %>% ....
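Spelled out a little more (a hedged sketch: `derive_special_parameter()` is a hypothetical user-defined function, and a `mean` column is assumed to exist in the summary output), per-rep post-processing might look like:

```r
library(dplyr)
library(targets)

# Hypothetical per-rep post-processing: derive_special_parameter() stands in
# for any custom summary a user might compute within each replication.
derive_special_parameter <- function(df) {
  summarize(df, mean_estimate = mean(mean, na.rm = TRUE))
}

tar_read(model) %>%    # combined data frame of all reps
  group_by(.rep) %>%   # split by the rep ID hash
  group_modify(~ derive_special_parameter(.x)) %>%
  ungroup()
```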

I tried to construct a comparison between mgcv::gam and mgcv::jagam using jagstargets. You can find the resulting code here. Mostly I found this fairly intuitive once I got my head around the above. tar_visnetwork() was essential to debugging. Feel no obligation to look at this but it might be useful to see what weird things people do when working from scratch and seeing where some bottlenecks are.

Awesome! Thanks for sharing.

If jagstargets-package is a landing page, perhaps it could contain a link back to the pkgdown site, and perhaps a @seealso to the most useful starting manpage(s) (tar_jags?)

Added in https://github.com/wlandau/jagstargets/commit/240d5cdd72744f32baeba32827cd29dbc60c031e.

tar_jags_example_data.R should explain the data-generating process, as well as what the components of the returned object are.

https://github.com/wlandau/jagstargets/commit/3a8fb6775e1a1e635846f827664af152b007bef1 adds a @details section to explain the data-generating process (drawing from the prior predictive distribution). tar_jags_example_data() has a @format section to explain the elements of the returned JAGS data list.

Naming scheme for targets: are there more descriptive names for x_file_y and x_lines_y? I don't have stellar suggestions for replacements, but "file" and "lines" feel very generic and like they might get confused in a large workflow. The other labels seem very clear, though. Following on from this, is it possible to combine the x_file_y and x_lines_y targets? I see in the stantargets vignette there doesn't seem to be a "compile" step, just x_file_y then x_mcmc_y -- is there something different here that means we need both steps to be separate?

All this has to do with how jagstargets and stantargets handle high-performance computing. Some users just run models on their local laptops, while others distribute multiple models on computing clusters. On clusters, some users configure jobs to run at the root directory of the project, while others use node-specific storage. And eventually (hopefully) the parallel backends of targets will be able to submit jobs to the cloud, and those environments will not have direct access to the original JAGS model files the users start with.

jagstargets is the simpler case. All we need to do is:

  1. Hash the upstream JAGS model file and watch it for changes.
  2. Make sure parallel workers have access to the JAGS model even if they cannot access the original model file.

I solve (2) by reading the actual lines of text of the JAGS model file in a separate target. A downstream MCMC target writes these lines to a new temporary JAGS model file and runs the temporary file instead of the user's original file. Because of the way file tracking works in targets, (1) and (2) need to be separate targets and cannot be combined into one.
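Conceptually, the mechanism looks something like this simplified sketch (not the package's actual internals):

```r
# Simplified sketch of the "file"/"lines" separation in jagstargets.

# "file" target: the original model file, watched for changes via its hash.
jags_file <- "model.jags"

# "lines" target: the model's text, which travels to workers as ordinary data.
jags_lines <- readLines(jags_file)

# Downstream MCMC target: rebuild the model file locally on the worker,
# which may not share a filesystem with the process that read the original.
run_on_worker <- function(lines) {
  tmp <- tempfile(fileext = ".jags")
  writeLines(lines, tmp)
  tmp  # ...then run JAGS (e.g., via R2jags) on this temporary file...
}
```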

stantargets is more complicated because Stan models require compilation. On systems where parallel workers do not share storage, we need the "lines" trick so the workers have access to the model, and that means each worker needs to compile its own copy of the model. But compilation takes a lot of time, so on systems that let parallel workers share storage efficiently, stantargets should pre-compile each model once ahead of time and then let the workers run in parallel on the pre-compiled models. The compile argument allows the user to choose when to compile models, and this argument changes which targets are created and what they do.

I hope this clarifies the roles of the "file" and "lines" targets. They do need to be separate, and I do think these names accurately describe what the targets do, even if the specifics are esoteric for most users. Happy to consider specific suggestions for alternative names if you have any.

dill commented 3 years ago

Thanks for the changes and explanations @wlandau!

I pulled 3a8fb6775e1a1e635846f827664af152b007bef1 before addressing the points below.

Thanks also for the explanations of the JAGS compilation process vs. Stan. I've got a much better understanding of how and why that works now. I had a think about other names for "file" and "lines" and didn't come up with anything less ambiguous (e.g., "source" but that could be used for either option) that wasn't very ugly (e.g., "jagsfile"/"jagslines"). I guess I'm just thinking about a very specific use case where I might compare similar models fitted in JAGS vs Stan and how I can tell which bit of the model is which but I guess the solution there is naming the previous targets better rather than the template. I'm happy to let that drop!

wlandau commented 3 years ago

coverage: I looked at the report and it says tar_jags_df.R lines 68:70 are not covered (relating to tar_jags_df_summary()), so that doesn't look like an optional package issue?

Thanks, that was a big hint. Hopefully https://github.com/wlandau/jagstargets/commit/58621cbc092a3016dac07784ebbb4fb37797031c has better coverage.

I guess I'm just thinking about a very specific use case where I might compare similar models fitted in JAGS vs Stan and how I can tell which bit of the model is which

Yeah, jagstargets and stantargets remove file extensions when they compute target names. So users who call both in the same pipeline would need to disambiguate the model files in other ways. Not ideal for that specific use case, but targets lets you know pretty quickly when there are name conflicts. On my end, changing the name policy is always a risk because it invalidates everyone's workflows so they have to rerun those targets.
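As a hypothetical illustration, "model.jags" and "model.stan" would both reduce to the suffix "model", so distinct name prefixes are what keep the generated targets apart:

```r
# Hypothetical sketch: both factories drop the file extension when building
# target names, so the name arguments must disambiguate the two models.
list(
  jagstargets::tar_jags(
    fit_jags,                  # generated targets get a "fit_jags" prefix
    jags_files = "model.jags",
    data = data_list,          # assumed shared data list
    parameters.to.save = "beta"
  )
  # An analogous stantargets call (e.g., tar_stan_mcmc(fit_stan, "model.stan", ...))
  # would use a different prefix to avoid clashing target names.
)
```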

dill commented 3 years ago

Re-running covr::package_coverage(type="all") on wlandau/jagstargets@58621cb gives 100% coverage 🎉

That's helpful extra info on name conflicts, thanks for that!

noamross commented 3 years ago

@ropensci-review-bot add @ernestguevarra to reviewers

ropensci-review-bot commented 3 years ago

@ernestguevarra added to the reviewers list. Review due date is 2021-06-15. Thanks @ernestguevarra for accepting to review! Please refer to our reviewer guide.

noamross commented 3 years ago

@ernestguevarra Your review was due 2021-06-15. Please let us know when you are able to get it in.

ernestguevarra commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Documentation

The package includes all the following forms of documentation:

On running the workflow in the worked example in the vignette entitled "Bayesian simulation pipelines with jagstargets" that adds a target called coverage, I see the following errors:

Error: callr subprocess failed: could not find function "%>%"
Error: callr subprocess failed: could not find function "summarize"

In the _targets.R file, I added the following call:

library(dplyr)

to be able to get the updated workflow running.

When I reviewed simulation.Rmd which is the Rmd for this vignette, I noted that in the setup R code chunk, calls to load the following packages were made:

library(R2jags)
library(dplyr)
library(targets)
library(jagstargets)

I am unsure if this is correct thinking on my part, but I am guessing that because {dplyr} is loaded in the setup code chunk of the vignette, the example does not error when building the vignette, yet the workflow errors with the message above when I run the example outside of the Rmd. If so, would it be appropriate to add a call to load {dplyr} in the _targets.R file example as follows:

# _targets.R
library(targets)
library(jagstargets)
library(dplyr)

generate_data <- function (n = 10L) {
  beta <- stats::rnorm(n = 2, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, beta[1] + x * beta[2], 1)
  # Elements of .join_data get joined on to the .join_data column
  # in the summary output next to the model parameters
  # with the same names.
  .join_data <- list(beta = beta)
  list(n = n, x = x, y = y, .join_data = .join_data)
}

list(
  tar_jags_rep_summary(
    model,
    "model.jags",
    data = generate_data(),
    parameters.to.save = "beta",
    batches = 5, # Number of branch targets.
    reps = 4, # Number of model reps per branch target.
    variables = "beta",
    summaries = list(
      ~posterior::quantile2(.x, probs = c(0.025, 0.975))
    )
  ),
  tar_target(
    coverage,
    model %>%
      group_by(variable) %>%
      summarize(
        coverage = mean(q2.5 < .join_data & .join_data < q97.5),
        .groups = "drop"
      )
  )
)

I am a newbie with targets, and these are common errors that I encounter too; my usual approach is to make a call within the targets workflow to load the package that is causing the error. Whether this is the correct approach, I am not sure, but I am flagging it here because others who follow the example may face the same errors I did.

Functionality

On install of {jagstargets} using devtools::install(dependencies = TRUE) from within the project, I see the following error:

Error: object ‘tar_assert_chr’ is not exported by 'namespace:targets'

Warning message:
In i.p(...) :
  installation of package ‘/tmp/RtmpsW4y3O/file615c3eabc513/tarchetypes_0.2.1.9000.tar.gz’ had non-zero exit status

But the {tarchetypes} package was installed. I think this is more an issue with {tarchetypes}, maybe?

On covr::package_coverage(), I see the following result:

jagstargets Coverage: 14.46%
R/tar_jags_rep.R: 0.00%
R/utils_language.R: 0.00%
R/tar_jags.R: 1.72%
R/tar_jags_rep_summary.R: 2.44%
R/tar_jags_rep_dic.R: 2.63%
R/tar_jags_rep_draws.R: 2.63%
R/tar_jags_df.R: 100.00%
R/tar_jags_example_data.R: 100.00%
R/tar_jags_example_file.R: 100.00%
R/utils_assert.R: 100.00%
R/utils_data.R: 100.00%
R/utils_jags.R: 100.00%
R/utils_logic.R: 100.00%

However, codecov results show 100% coverage across the same functions. I am unsure why this is the case, but maybe it stems from new edits to the functions since the last code coverage check? I may have contributed to this through my tardiness in the package review, so my sincerest apologies.

The package is consistent with the rOpenSci packaging guidelines and, equally importantly, with the {targets} ecosystem to which it belongs. The fact that the {targets} package is also rOpenSci-reviewed has helped ensure the thematic and packaging coherence of {jagstargets}. The package name, branding, tone, and construction of the README and vignettes are all coherent for a standalone package and also consistent within the package ecosystem it inhabits. As a newbie and recent user of {targets}, I found the {targets} story continues well into {jagstargets}, particularly in the function naming and documentation, the README/website, and the vignettes.

Estimated hours spent reviewing:


Review Comments

Firstly, I would like to apologise for the extreme delay in reviewing this package. I have been severely disorganised with my work, and this has impacted my ability to get to this review promptly.

I thank @wlandau for his work on {jagstargets}, which, from my review, builds on the strengths of the {targets} package and facilitates its easy application to Bayesian data analysis using JAGS. I approach this review from the perspective of someone who hasn't formally used JAGS and who has only recently learned and started using {targets}. The author's note in the README about requiring basic knowledge of {targets} and familiarity with Bayesian statistics and JAGS fits me as a new user, and thanks to two really well-worked-out vignettes on getting started and on simulation, I went through a quick and easy introduction and orientation to the package, which gave me enough foundation to apply the package to appropriate tasks and related work in the future. It also definitely helps that {targets} is well documented with lots of really good examples and has an accompanying manual/handbook that I was able to use before to learn how to use it. With {jagstargets} being coherently written and documented in relation to {targets}, anyone who has already invested time in learning {targets} will find {jagstargets} extremely easy to learn and immediately handy for implementing a {targets} workflow for JAGS quickly and easily.

I have made a few comments above in the appropriate review topics on some minor issues I encountered when installing the development version of the package, on following one example in the simulation vignette, and on code coverage. In addition, on Good Practice check, I see the following result:

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 14% of code lines are covered by
    test cases.

    R/tar_jags_rep_dic.R:100:NA
    R/tar_jags_rep_dic.R:101:NA
    R/tar_jags_rep_dic.R:102:NA
    R/tar_jags_rep_dic.R:103:NA
    R/tar_jags_rep_dic.R:104:NA
    ... and 557 more lines

  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd
    problems. LaTeX errors found:

The first point above reflects the result I am seeing from the code coverage check. For the second point, I am unsure where this is arising from; I do not get any warnings from devtools::check().

I see no spelling issues after running spelling::spell_check_package(), and only false-positive spelling issues when running spelling::spell_check_files("README.Rmd"). Those words are already in inst/WORDLIST, so they won't get flagged in CRAN checks.

I would also like to mention that there are a few very minor edits that could be made in the vignettes (missing words, punctuation). They don't impact the readability or comprehension of the material, but the author might want to correct them. If so, I would be happy to submit those edits as a pull request.

Thank you again to @wlandau for your great work on this package and for the whole {targets} package ecosystem that you have and are continuing to create.

wlandau commented 3 years ago

Thank you for your review, @ernestguevarra.

On running the workflow in the worked example in the vignette entitled "Bayesian simulation pipelines with jagstargets" that adds a target called coverage, I see the following errors...

https://github.com/wlandau/jagstargets/commit/30e4ef60214bcdcb8558e07f1011e4b117305f49 repairs the _targets.R files for that example. The problem comes from some code duplication that I unfortunately found necessary at the time in order to embed the example in an R Markdown vignette: there is one code chunk that shows _targets.R to the reader and another that actually writes _targets.R.

(Knowing what I know now after implementing Target Markdown, I think I could hack a custom knitr engine that writes a script file instead of evaluating the code, and this should help avoid duplicated code in new documentation I create from scratch for projects in the future.)
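As a hypothetical sketch (not the actual Target Markdown implementation), such an engine could write a chunk's code to a script file instead of evaluating it, so a single chunk both displays the code and produces `_targets.R`. The engine name `write_script` and the `file` chunk option below are made up for illustration:

```r
# Hypothetical sketch: a knitr engine that writes the chunk's code to a file
# (default "_targets.R") instead of running it. The engine name and the
# "file" chunk option are illustrative only.
knitr::knit_engines$set(write_script = function(options) {
  path <- if (is.null(options$file)) "_targets.R" else options$file
  writeLines(options$code, path)
  # Echo the code in the rendered document without evaluating it.
  knitr::engine_output(options, options$code, out = "")
})
```

A chunk using this engine would then render the code for the reader while writing it to disk for the pipeline, avoiding the duplicated-chunk pattern described above.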

On install of {jagstargets} using devtools::install(dependencies = TRUE) from within the project, I see the following error

This is because jagstargets currently relies on some functions exposed in the development version of targets that are not yet on CRAN (see Remotes: ropensci/targets in the jagstargets DESCRIPTION file). (I plan to update targets on CRAN on July 21.) If you run remotes::install_github("ropensci/targets") first, you should be able to install jagstargets.
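To summarize, the install sequence at the time of this review would look like the following (assuming you are in the jagstargets project directory):

```r
# Install the development version of targets first, because jagstargets
# depends on functions not yet on CRAN, then install jagstargets itself
# along with its remaining dependencies.
remotes::install_github("ropensci/targets")
devtools::install(dependencies = TRUE)
```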

On covr::package_coverage(), I see the following result

Some tests use testthat::skip_on_cran() to reduce check time and avoid other CRAN-related issues. If you set the NOT_CRAN environment variable to "true", these tests should run and coverage should show as 100%. (See usethis::edit_r_environ().) When I commented out NOT_CRAN=true in my own ~/.Renviron file, I saw the exact coverage pattern you posted.
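For a one-off coverage run that does not touch ~/.Renviron, something like the following should work (a sketch using withr; the persistent alternative is the NOT_CRAN=true line in .Renviron mentioned above):

```r
# Temporarily set NOT_CRAN so tests guarded by testthat::skip_on_cran()
# run, then measure coverage for the package in the working directory.
withr::with_envvar(
  c(NOT_CRAN = "true"),
  covr::package_coverage()
)
```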

fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd problems. LaTeX errors found:

Often this particular error has something to do with the local LaTeX installation, e.g. https://stackoverflow.com/questions/10819959/diagnosing-r-package-build-warning-latex-errors-when-creating-pdf-version. I was unable to reproduce it locally.

I would also like to mention that there are a few very minor edits that could be made in the vignettes (missing words, punctuation). They don't impact the readability or comprehension of the material, but the author might want to correct them. If so, I would be happy to submit those edits as a pull request.

Happy to accept pull requests about spelling and grammar, or correct any specific spelling/grammar issues you identify.

Please let me know if there is something I missed in my response.

wlandau commented 3 years ago

Hi everyone, it has been a while since I responded to @ernestguevarra's review. Is there something else I forgot to do?

noamross commented 3 years ago

Hi @wlandau, I tend to expect a second response with changes incorporating both reviewers' feedback, but I realize your responses already incorporated all edits. That said, @ernestguevarra and @dill, do @wlandau's changes address your comments?

dill commented 3 years ago

absolutely!

wlandau commented 3 years ago

Thanks, @dill! @ernestguevarra, is there anything else you would like me to work on?

ernestguevarra commented 3 years ago

Hi @wlandau. Apologies. Yes, your comments above have responded to my feedback. Thanks for explaining those issues to me, and it's good to know that you were aware of and had actually addressed them even before my comments.

I have nothing else to ask to be worked on. Thank you again for your work on this package!

noamross commented 3 years ago

@ropensci-review-bot approve jagstargets

ropensci-review-bot commented 3 years ago

Approved! Thanks @wlandau for submitting and @dill, @ernestguevarra for your reviews! :grin:

To-dos:

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We maintain an online book with our best practices and tips; this chapter starts the third section, which contains guidance for after onboarding. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer by filling out a short form.

wlandau commented 3 years ago

Thanks so much @noamross, @dill, and @ernestguevarra!

I transferred the repo. @noamross, would you grant me repo admin permissions? I seem to have lost them with the transfer.

@ernestguevarra, would you like to be included as a reviewer like @dill? https://github.com/ropensci/jagstargets/blob/e27d1ef3f260a41d005ed81d8b99e2364099400f/DESCRIPTION#L28-L33. Wondering because "Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file" was unchecked in your review.

noamross commented 3 years ago

You should have repo permissions now, @wlandau .

wlandau commented 3 years ago

Thanks Noam!

wlandau commented 3 years ago

Looks like the jagstargets site is up, and I think I completed the other to-dos in this thread. Remaining to-dos on my end are JOSS and then CRAN. In the meantime, @ernestguevarra, feel free to snag that "rev" role whenever you want.