metrumresearchgroup / mrgvalprep

Helpers for prepping inputs to mrgvalidate
https://metrumresearchgroup.github.io/mrgvalprep

validate_r_package() helper #48

seth127 opened this issue 2 years ago

seth127 commented 2 years ago

Given the new changes in mrgvalidate 2.0.0, it would be nice to have a helper function that you can point at an R package and it will extract all of the relevant information from the DESCRIPTION, NEWS.md, tests/testthat/ dir, etc. and feed them to mrgvalidate::create_package_docs().
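For the DESCRIPTION piece, the extraction could be as simple as base R's `read.dcf()`. This is only an illustrative sketch, not necessarily how the helper is implemented; a throwaway DESCRIPTION is written here so the example is self-contained.

```r
# Sketch: pull name and version from a package's DESCRIPTION with base
# R's read.dcf(). The directory and field choices are illustrative.
pkg_dir <- tempfile("demo-pkg-")
dir.create(pkg_dir)
writeLines(c("Package: demopkg", "Version: 0.1.0"),
           file.path(pkg_dir, "DESCRIPTION"))

desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                 fields = c("Package", "Version"))
info <- list(product_name = unname(desc[1, "Package"]),
             version = unname(desc[1, "Version"]))
```

NEWS.md and the tests/testthat/ directory would need similar (if less tidy) parsing, which is presumably where most of the helper's value lies.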

Considerations

This issue needs more detail before we start on it, but here are some thoughts off the top:

seth127 commented 1 year ago

Work-in-progress Requirements:

seth127 commented 1 year ago

I started on implementing this and I'm looking for some feedback here @kyleam @barrettk. (Let's keep it on this issue instead of the PRs for now, saving PR comments for actual code review, if it comes to that.)

Summary

I implemented a helper extract_r_package_info() that does the first five bullets from the previous comment, but doesn't run the tests or create the docs. There is some added complication to those steps, discussed below. I also went ahead and created validate_r_package() in a separate branch, which does everything listed in the previous comment. I'm not sure if that's a good idea or not though.

extract_r_package_info()

Draft of this in #49.

I think this is pretty good as is, though I'm certainly open to feedback on both design and implementation. TL;DR you pass it a path to a local checkout of the package repo and it does the following and returns the results as a named list:

It gets rid of a fair amount of annoying boilerplate code that would need to be added to a script like this. That said, when I tried to refactor that script to use this helper, it was still a bit long and annoying:

`build-validation-docs.R`:

```r
# set up directories and clear existing output dirs, if they exist
val_dir <- system.file("validation", package = "bbr")
print(val_dir)
test_dir <- file.path(val_dir, "test_results")
if (fs::dir_exists(test_dir)) fs::dir_delete(test_dir)
fs::dir_create(test_dir)

# get package info
args <- extract_r_package_info(rprojroot::find_rstudio_root_file())
args["style_dir"] <- "~/Documents/docx-ref-header-image/"
args["auto_test_dir"] <- test_dir

# run tests and write res to disk
test_res <- mrgvalprep::parse_testthat_list_reporter(
  devtools::test(Reporter = testthat::ListReporter),
  roll_up_ids = TRUE
)
write.csv(
  test_res,
  file.path(test_dir, paste0(args$product_name, "-tests.csv"))
)

# capture commit hash and other system info
git_hash <- system("git rev-parse HEAD", intern = TRUE)
Sys.setenv("COMMIT_HASH" = git_hash)
mrgvalprep::get_sys_info(
  out_path = file.path(test_dir, paste0(args$product_name, "-tests.json")),
  env_vars = c("METWORX_VERSION", "COMMIT_HASH")
)

# create docs
docs_dir <- file.path(
  val_dir,
  paste0(args$product_name, "-", args$version, "-validation-docs")
)
if (fs::dir_exists(docs_dir)) fs::dir_delete(docs_dir)
fs::dir_create(docs_dir)
args["output_dir"] <- docs_dir
do.call(
  mrgvalidate::create_package_docs,
  args
)
```

Which brings us to...

validate_r_package()

A draft of this is in #50. I have very mixed feelings about this function. On the bright side, it makes that entire script above turn into:

```r
mrgvalprep::validate_r_package(
  here::here(),
  style_dir = "~/Documents/docx-ref-header-image/"
)
```

So that's pretty slick. Here are several concerns though:

Thanks in advance for the feedback here. Again, let's keep the discussion on this issue (instead of the PRs) until we decide on a path forward.

kyleam commented 1 year ago

I was still hopeful that we could use the r-dev-ci-mrgval image (defined in our r-snapshots/docker-r-dev-ci GHE repo) for most packages (example scratch build with old mrgvalidate). Aside from of course needing an update for mrgvalidate 2.0, that image currently handles only packages whose tests run completely on the r-dev-ci-mpn images (so notably not bbr), but IIRC the idea was that, for these non-Drone cases (hopefully only a few), we could adjust it to take a specified auto_test directory so that the manual test runner could feed that as input to a docker run ... [1].

[edit: "manual test runner" is bad word choice in this context. I mean the human that's running the automatic tests.]

In that context, I think the value of a full end-to-end runner like validate_r_package goes down. The most valuable helper would be one for generating the test results directory because that's what packages like bbr need to handle outside of the image. It's still worth considering whether some other helpers (things like parse_latest_release) should live in mrgvalprep (where they can be nicely documented and tested), even if the r-dev-ci-mrgval image ends up being their only user, but then these helpers should be designed with r-dev-ci-mrgval in mind.

So, I suppose my thoughts on this issue depend on whether you see r-dev-ci-mrgval as the primary way of generating the docs for most packages.

r-dev-ci-mrgval help output:

```
$ docker run --rm -it 906087756158.dkr.ecr.us-east-1.amazonaws.com/r-dev-ci-mrgval -h
Create validation docs for package in current directory

Usage:
  create-validation-docs.R [options] [--reqs FILE]... [--stories FILE]...

See https://metrumresearchgroup.github.io/mrgvalprep/reference/read_spec_yaml.html
for more details on the format of the story and requirement files.

Options:
  -r FILE, --reqs=FILE     YAML file defining requirements. If not given, the
                           stories should be linked directly to the tests.
  -s FILE, --stories=FILE  YAML file defining stories. If no files are given,
                           ./inst/validation/stories.yaml is used.
  --glob                   Expand --reqs and --stories values with Sys.glob().
  --docs-dir=DIRECTORY     Put generated validation in DIRECTORY instead of
                           ./inst/validation/docs/{version}/
  --results-dir=DIRECTORY  Dump the test results to DIRECTORY instead of
                           ./inst/validation/test-results/
  -h, --help               Show this output and exit
```

[1] To run these images on Metworx, we need to be able to pull from ECR. It doesn't look like that's accessible from workflows by default. So, that'd need to be looked into. (Accessing ECR from laptop while on VPN works fine.)

kyleam commented 1 year ago

for these non-Drone cases (hopefully only a few), we could adjust it to take a specified auto_test directory so that the manual test runner could feed that as input to a docker run ... [1].

A half-baked idea that gets around the need for the manual tester to call docker run at all: have a helper script that injects the untracked test results into a disconnected branch (similar to gh-pages) and pushes that. On push of that branch, Drone could be triggered to build the val docs and publish them like it does for the standard case. That approach would mean the test results (including the tests.json) would be public, but I think we'd be okay with that.
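The disconnected-branch mechanics could look something like the following git plumbing. Everything here is hypothetical: the branch name `val-results`, the results path, and the throwaway demo repo are illustrative, not an existing mrgvalprep helper.

```shell
set -eu

# Demo in a throwaway repo; a real helper would run inside the package repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
echo "Package: demo" > DESCRIPTION
git add DESCRIPTION
git commit -qm "initial commit"

# Untracked test results, as produced by the local test run.
mkdir -p inst/validation/test_results
echo '"id","passed"' > inst/validation/test_results/demo-tests.csv

# Record which commit was tested so CI can verify the link later.
tested_commit=$(git rev-parse HEAD)

# Build a parentless commit holding only the results, via a temporary
# index, so the working tree and current branch are left untouched.
tmp_index=$(mktemp -u)
GIT_INDEX_FILE=$tmp_index git add -f inst/validation/test_results
tree=$(GIT_INDEX_FILE=$tmp_index git write-tree)
commit=$(git commit-tree -m "test results for $tested_commit" "$tree")
git update-ref refs/heads/val-results "$commit"
rm -f "$tmp_index"

# In real use, pushing this branch would trigger the Drone build:
# git push origin val-results
git log --oneline val-results
```

Embedding the tested commit hash in the commit message (or in the tree itself) is one way to give Drone the automatic link between the tester's tree and the results.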

seth127 commented 1 year ago

@kyleam this is all a very good point. I do still think that running in drone (on tag, similar to how we build and publish) is probably the best scenario. And, in that case, I think you're right that the only actually useful thing here, even in extract_r_package_info(), is parse_latest_release().

A half-baked idea... have a helper script that injects the untracked test results into a disconnected branch (similar to gh-pages) and pushes that.

I like this idea. I think we found that there are actually a non-trivial number of cases where we need to run (some subset of) the tests on Metworx. I remember a few of the plotting packages... probably mrgsolve...

Let's remember to talk about this at some point soon: basically what would be the interface for a user running the tests locally (on Metworx) and then pushing them. Some questions:

kyleam commented 1 year ago

@seth127:

basically what would be the interface for a user running the tests locally (on Metworx) and then pushing them. Some questions:

Good questions. Taking the second one first:

Could this be an mrgvalprep function? Would that be better or worse than a boilerplate .sh script that gets checked in to various package repos?

Hmm, I hadn't thought much yet about the actual location of this logic. (I think I said "helper script" just because I had subdir-to-gh-pages in my head as an example of the "import to subtree on other ref" functionality.) I'd say mrgvalprep makes sense because the test runner would already be using it for running and writing the tests results. We could extract the logic you have in validate_r_package to a "run and write tests" function, and then have a wrapper on top of that that writes the results to a Git reference and pushes (more on that next).
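Concretely, that split might look something like this. It is only a sketch: `run_and_write_tests()` and `push_results_ref()` are invented names, and the body is adapted from the build-validation-docs.R script earlier in the thread.

```r
# Hypothetical "run and write tests" helper (names invented for
# illustration), factored out of the script earlier in this thread.
run_and_write_tests <- function(product_name, out_dir) {
  test_res <- mrgvalprep::parse_testthat_list_reporter(
    devtools::test(Reporter = testthat::ListReporter),
    roll_up_ids = TRUE
  )
  write.csv(
    test_res,
    file.path(out_dir, paste0(product_name, "-tests.csv"))
  )
  Sys.setenv("COMMIT_HASH" = system("git rev-parse HEAD", intern = TRUE))
  mrgvalprep::get_sys_info(
    out_path = file.path(out_dir, paste0(product_name, "-tests.json")),
    env_vars = c("METWORX_VERSION", "COMMIT_HASH")
  )
  invisible(test_res)
}

# The wrapper would then layer the "write to a git ref and push" step on
# top; push_results_ref() stands in for the disconnected-branch idea and
# does not exist yet.
run_and_push_tests <- function(product_name, out_dir, ref = "val-results") {
  run_and_write_tests(product_name, out_dir)
  push_results_ref(out_dir, ref)
}
```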

Would the tests need to be run on the same commit hash that is being tagged/validated? That seems obnoxious and brittle, but if not... how do we know that we tested the same thing that we're validating?

Yes, I think there ought to be an automatic link between the tester's commit (or tree, really) and the results that Drone uses. Here's a sketch:

kyleam commented 1 year ago

A draft implementation of the git ref idea is at gh-52.