metrumresearchgroup / mrgvalprep

Helpers for prepping inputs to mrgvalidate
https://metrumresearchgroup.github.io/mrgvalprep

validate_r_package() helper #48

seth127 opened this issue 2 years ago

seth127 commented 2 years ago

Given the new changes in mrgvalidate 2.0.0, it would be nice to have a helper function that you can point at an R package and it will extract all of the relevant information from the DESCRIPTION, NEWS.md, tests/testthat/ dir, etc. and feed them to mrgvalidate::create_package_docs().
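For the DESCRIPTION piece, the extraction could be as simple as base R's `read.dcf()`. This is only an illustrative sketch, not necessarily how the helper is implemented; a throwaway DESCRIPTION is written here so the example is self-contained.

```r
# Sketch: pull name and version from a package's DESCRIPTION with base
# R's read.dcf(). The directory and field choices are illustrative.
pkg_dir <- tempfile("demo-pkg-")
dir.create(pkg_dir)
writeLines(c("Package: demopkg", "Version: 0.1.0"),
           file.path(pkg_dir, "DESCRIPTION"))

desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                 fields = c("Package", "Version"))
info <- list(product_name = unname(desc[1, "Package"]),
             version = unname(desc[1, "Version"]))
```

NEWS.md and the tests/testthat/ directory would need similar (if less tidy) parsing, which is presumably where most of the helper's value lies.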

Considerations

This issue needs more detail before we start on it, but here are some thoughts off the top:

seth127 commented 1 year ago

Work-in-progress Requirements:

seth127 commented 1 year ago

I started on implementing this and I'm looking for some feedback here @kyleam @barrettk. (Let's keep it on this issue instead of the PRs for now, saving PR comments for actual code review, if it comes to that.)

Summary

I implemented a helper extract_r_package_info() that does the first five bullets from the previous comment, but doesn't run the tests or create the docs. There is some added complication to those steps, discussed below. I also went ahead and created validate_r_package() in a separate branch, which does everything listed in the previous comment. I'm not sure if that's a good idea or not though.

extract_r_package_info()

Draft of this in #49.

I think this is pretty good as is, though I'm certainly open to feedback on both design and implementation. TL;DR you pass it a path to a local checkout of the package repo and it does the following and returns the results as a named list:

It gets rid of a fair amount of annoying boilerplate code that would need to be added to a script like this. That said, when I tried to refactor that script to use this helper, it was still a bit long and annoying:

`build-validation-docs.R`:

```r
# set up directories and clear existing output dirs, if they exist
val_dir <- system.file("validation", package = "bbr")
print(val_dir)
test_dir <- file.path(val_dir, "test_results")
if (fs::dir_exists(test_dir)) fs::dir_delete(test_dir)
fs::dir_create(test_dir)

# get package info
args <- extract_r_package_info(rprojroot::find_rstudio_root_file())
args["style_dir"] <- "~/Documents/docx-ref-header-image/"
args["auto_test_dir"] <- test_dir

# run tests and write res to disk
test_res <- mrgvalprep::parse_testthat_list_reporter(
  devtools::test(Reporter = testthat::ListReporter),
  roll_up_ids = TRUE
)
write.csv(
  test_res,
  file.path(test_dir, paste0(args$product_name, "-tests.csv"))
)

# capture commit hash and other system info
git_hash <- system("git rev-parse HEAD", intern = TRUE)
Sys.setenv("COMMIT_HASH" = git_hash)
mrgvalprep::get_sys_info(
  out_path = file.path(test_dir, paste0(args$product_name, "-tests.json")),
  env_vars = c("METWORX_VERSION", "COMMIT_HASH")
)

# create docs
docs_dir <- file.path(
  val_dir,
  paste0(args$product_name, "-", args$version, "-validation-docs")
)
if (fs::dir_exists(docs_dir)) fs::dir_delete(docs_dir)
fs::dir_create(docs_dir)
args["output_dir"] <- docs_dir
do.call(
  mrgvalidate::create_package_docs,
  args
)
```

Which brings us to...

validate_r_package()

A draft of this is in #50. I have very mixed feelings about this function. On the bright side, it makes that entire script above turn into:

```r
mrgvalprep::validate_r_package(
  here::here(),
  style_dir = "~/Documents/docx-ref-header-image/"
)
```

So that's pretty slick. Here are several concerns though:

Thanks in advance for the feedback here. Again, let's keep the discussion on this issue (instead of the PRs) until we decide on a path forward.

kyleam commented 1 year ago

I was still hopeful that we could use the r-dev-ci-mrgval image (defined in our r-snapshots/docker-r-dev-ci GHE repo) for most packages (example scratch build with old mrgvalidate). Aside from of course needing an update for mrgvalidate 2.0, that image currently handles only packages whose tests run completely on the r-dev-ci-mpn images (so notably not bbr), but IIRC the idea was that, for these non-Drone cases (hopefully only a few), we could adjust it to take a specified auto_test directory so that the manual test runner could feed that as input to a docker run ... [1].

[edit: "manual test runner" is bad word choice in this context. I mean the human that's running the automatic tests.]

In that context, I think the value of a full end-to-end runner like validate_r_package goes down. The most valuable helper would be one for generating the test results directory because that's what packages like bbr need to handle outside of the image. It's still worth considering whether some other helpers (things like parse_latest_release) should live in mrgvalprep (where they can be nicely documented and tested), even if the r-dev-ci-mrgval image ends up being their only user, but then these helpers should be designed with r-dev-ci-mrgval in mind.

So, I suppose my thoughts on this issue depend on whether you see r-dev-ci-mrgval as the primary way of generating the docs for most packages.

r-dev-ci-mrgval help output:

```
$ docker run --rm -it 906087756158.dkr.ecr.us-east-1.amazonaws.com/r-dev-ci-mrgval -h
Create validation docs for package in current directory

Usage:
  create-validation-docs.R [options] [--reqs FILE]... [--stories FILE]...

See https://metrumresearchgroup.github.io/mrgvalprep/reference/read_spec_yaml.html
for more details on the format of the story and requirement files.

Options:
  -r FILE, --reqs=FILE     YAML file defining requirements. If not given, the
                           stories should be linked directly to the tests.
  -s FILE, --stories=FILE  YAML file defining stories. If no files are given,
                           ./inst/validation/stories.yaml is used.
  --glob                   Expand --reqs and --stories values with Sys.glob().
  --docs-dir=DIRECTORY     Put generated validation in DIRECTORY instead of
                           ./inst/validation/docs/{version}/
  --results-dir=DIRECTORY  Dump the test results to DIRECTORY instead of
                           ./inst/validation/test-results/
  -h, --help               Show this output and exit
```

[1] To run these images on Metworx, we need to be able to pull from ECR. It doesn't look like that's accessible from workflows by default. So, that'd need to be looked into. (Accessing ECR from laptop while on VPN works fine.)

kyleam commented 1 year ago

for these non-Drone cases (hopefully only a few), we could adjust it to take a specified auto_test directory so that the manual test runner could feed that as input to a docker run ... [1].

A half-baked idea that gets around the need for the manual tester to call docker run at all: have a helper script that injects the untracked test results into a disconnected branch (similar to gh-pages) and pushes that. On push of that branch, Drone could be triggered to build the val docs and publish them like it does for the standard case. That approach would mean the test results (including the tests.json) would be public, but I think we'd be okay with that.
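The disconnected-branch mechanics could look something like the following git plumbing. Everything here is hypothetical: the branch name `val-results`, the results path, and the throwaway demo repo are illustrative, not an existing mrgvalprep helper.

```shell
set -eu

# Demo in a throwaway repo; a real helper would run inside the package repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
echo "Package: demo" > DESCRIPTION
git add DESCRIPTION
git commit -qm "initial commit"

# Untracked test results, as produced by the local test run.
mkdir -p inst/validation/test_results
echo '"id","passed"' > inst/validation/test_results/demo-tests.csv

# Record which commit was tested so CI can verify the link later.
tested_commit=$(git rev-parse HEAD)

# Build a parentless commit holding only the results, via a temporary
# index, so the working tree and current branch are left untouched.
tmp_index=$(mktemp -u)
GIT_INDEX_FILE=$tmp_index git add -f inst/validation/test_results
tree=$(GIT_INDEX_FILE=$tmp_index git write-tree)
commit=$(git commit-tree -m "test results for $tested_commit" "$tree")
git update-ref refs/heads/val-results "$commit"
rm -f "$tmp_index"

# In real use, pushing this branch would trigger the Drone build:
# git push origin val-results
git log --oneline val-results
```

Embedding the tested commit hash in the commit message (or in the tree itself) is one way to give Drone the automatic link between the tester's tree and the results.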

seth127 commented 1 year ago

@kyleam this is all a very good point. I do still think that running in drone (on tag, similar to how we build and publish) is probably the best scenario. And, in that case, I think you're right that the only actually useful thing here, even in extract_r_package_info(), is parse_latest_release().

A half-baked idea... have a helper script that injects the untracked test results into a disconnected branch (similar to gh-pages) and pushes that.

I like this idea. I think we found that there are actually a non-trivial number of cases where we need to run (some subset of) the tests on Metworx. I remember a few of the plotting packages... probably mrgsolve...

Let's remember to talk about this at some point soon: basically what would be the interface for a user running the tests locally (on Metworx) and then pushing them. Some questions:

kyleam commented 1 year ago

@seth127:

basically what would be the interface for a user running the tests locally (on Metworx) and then pushing them. Some questions:

Good questions. Taking the second one first:

Could this be an mrgvalprep function? Would that be better or worse than a boilerplate .sh script that gets checked in to various package repos?

Hmm, I hadn't thought much yet about the actual location of this logic. (I think I said "helper script" just because I had subdir-to-gh-pages in my head as an example of the "import to subtree on other ref" functionality.) I'd say mrgvalprep makes sense because the test runner would already be using it for running and writing the tests results. We could extract the logic you have in validate_r_package to a "run and write tests" function, and then have a wrapper on top of that that writes the results to a Git reference and pushes (more on that next).
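Concretely, that split might look something like this. It is only a sketch: `run_and_write_tests()` and `push_results_ref()` are invented names, and the body is adapted from the build-validation-docs.R script earlier in the thread.

```r
# Hypothetical "run and write tests" helper (names invented for
# illustration), factored out of the script earlier in this thread.
run_and_write_tests <- function(product_name, out_dir) {
  test_res <- mrgvalprep::parse_testthat_list_reporter(
    devtools::test(Reporter = testthat::ListReporter),
    roll_up_ids = TRUE
  )
  write.csv(
    test_res,
    file.path(out_dir, paste0(product_name, "-tests.csv"))
  )
  Sys.setenv("COMMIT_HASH" = system("git rev-parse HEAD", intern = TRUE))
  mrgvalprep::get_sys_info(
    out_path = file.path(out_dir, paste0(product_name, "-tests.json")),
    env_vars = c("METWORX_VERSION", "COMMIT_HASH")
  )
  invisible(test_res)
}

# The wrapper would then layer the "write to a git ref and push" step on
# top; push_results_ref() stands in for the disconnected-branch idea and
# does not exist yet.
run_and_push_tests <- function(product_name, out_dir, ref = "val-results") {
  run_and_write_tests(product_name, out_dir)
  push_results_ref(out_dir, ref)
}
```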

Would the tests need to be run on the same commit hash that is being tagged/validated? That seems obnoxious and brittle, but if not... how do we know that we tested the same thing that we're validating?

Yes, I think there ought to be an automatic link between the tester's commit (or tree, really) and the results that Drone uses. Here's a sketch:

kyleam commented 1 year ago

A draft implementation of the git ref idea is at gh-52.