Open zkamvar opened 4 days ago
Yes! This is definitely a gap in the docs I've not had time to tackle! Would be great if you wanted to work on this.
I was envisioning an "Anatomy of a custom validation function" vignette that includes a discussion of how to use capture_check_cnd() and capture_check_info().
I also was thinking of building functionality for creating template custom validation functions (a bit like snippets / usethis::use_r()) to start users off down the right path. Something like use_custom_check()?
A few general suggestions:
- Use check for the object that captures whether validation passes (TRUE) or fails (FALSE) (i.e. in your example instead of res). It's always good to set a convention that we all follow, but it also means that, should custom functions written by users end up useful for the wider community, they will be easier to incorporate into the hubValidations package. In general it's worth having a look at the source code for the check_* functions and conveying any patterns and best practices in the vignette.
- Use check_*() function names. This is again not necessary for custom functions to be run, but it is informative and would again simplify incorporating user custom functions into our package at a later date, so I would probably mention it as best practice.
- I still feel src is not the right place for R functions, more for workflow scripts. These should be placed in the R directory which with a bit more tweaking (i.e. including a DESCRIPTION file) would allow for testing functions as well as potentially managing additional dependencies required for custom functions (see #22). While folks are free to use whatever location they want, I would have a very strong preference for an R/ directory to be used by default in examples and as the default location for scripts created by use_custom_check().
- If renv is used for package management in a hub, then mentioning this in relation to managing additional dependencies would also probably be a good idea.
THANK YOU! The naming conventions are definitely important to highlight!
Given that the impetus for this is https://github.com/reichlab/variant-nowcast-hub/issues/55, and it needs to be done relatively soon, I think we should restrict the guidance for the moment to only using packages that hubValidations depends on (e.g. dplyr). Dependency resolution on CI is a huge can of worms (especially if {renv} is involved).
I wonder if going through the anatomy of a function like check_tbl_value_col_ascending() would be a good exercise for an example? Looking at that gives me a good idea of how to construct the validations.
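As a rough illustration of the anatomy being discussed, here is a minimal sketch of a custom check that follows the check_*() naming and the check object convention. The function names check_tbl_value_non_negative() and has_no_negative_values() are hypothetical, and the assumption that tbl and file_path are passed to custom checks should be confirmed against the hubValidations documentation. It is not taken from the package source.

```r
# Hypothetical helper: TRUE when no value in `tbl$value` is negative.
# Kept separate so the logic can be unit tested without hubValidations.
has_no_negative_values <- function(tbl) {
  values <- suppressWarnings(as.numeric(tbl$value))
  !any(values < 0, na.rm = TRUE)
}

# Hypothetical custom check. `tbl` and `file_path` are assumed to be
# among the arguments made available to custom validation functions.
check_tbl_value_non_negative <- function(tbl, file_path) {
  # `check` captures whether validation passes (TRUE) or fails (FALSE).
  check <- has_no_negative_values(tbl)

  # Extra detail is only attached when the check fails.
  details <- if (check) NULL else "At least one negative value was found."

  # Wrap the result so it is reported like the built-in checks.
  hubValidations::capture_check_cnd(
    check = check,
    file_path = file_path,
    msg_subject = "Values in the value column",
    msg_attribute = "non-negative.",
    msg_verbs = c("are", "must be"),
    error = FALSE,
    details = details
  )
}
```

Splitting the logical test from the reporting step is a design choice made here for testability; the built-in checks may be organised differently.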
I still feel src is not the right place for R functions, more for workflow scripts. These should be placed in the R directory which with a bit more tweaking (i.e. including a DESCRIPTION file) would allow for testing functions as well as potentially managing additional dependencies required for custom functions (see https://github.com/hubverse-org/hubValidations/issues/22). While folks are free to use whatever location they want, I would have a very strong preference for an R/ directory to be used by default in examples and as the default location for scripts created by use_custom_check().
I can get behind that! (For my own reference, the initial proposal for the R/ folder is in https://github.com/orgs/hubverse-org/discussions/25#discussioncomment-10167652)
Could you clarify one thing: When you want validations to be in the R/ folder, how are you thinking about the possibility of including a DESCRIPTION file and tests?
Are you thinking a structure like (A) where DESCRIPTION would live in the root of the hub:
ROOT/
├─R/
│ └─check-thing.R
├─DESCRIPTION
├─tests/
│ └─testthat/
│   └─test-check-thing.R
└─ hub-config/
or like (B) where DESCRIPTION lives inside the R/ directory:
ROOT/
├─R/
│ ├─R/
│ │ └─check-thing.R
│ ├─DESCRIPTION
│ └─tests/
│   └─testthat/
│     └─test-check-thing.R
└─ hub-config/
In either case, would you be opposed to having validation scripts live in src/validations/R/ by default (C)?
ROOT/
├─src/
│ └─validations/
│   ├─R/
│   │ └─check-thing.R
│   ├─DESCRIPTION
│   └─tests/
│     └─testthat/
│       └─test-check-thing.R
└─ hub-config/
This way, it isolates the validation code from the workflow scripts AND allows a package to be built around it (e.g. src/validations/DESCRIPTION and src/validations/tests/testthat/)
Ooooh, interesting. I feel your suggestion of storing any functional R code neatly away in its own directory is the neatest option.
Given the src directory is something we seem to be committed to as best practice, I think I'm going to vote for:
ROOT/
├─src/
│ └─validations/
│   ├─R/
│   │ └─check-thing.R
│   ├─DESCRIPTION
│   └─tests/
│     └─testthat/
│       └─test-check-thing.R
└─ hub-config/
as the default structure.
I think this would also provide some option for installing additional dependencies required by validation functions by adding them as imports to src/validations/DESCRIPTION, using pak::local_install("src/validations") in the CI and, if using renv, via the package cellar. Obviously the renv approach is more of a mission but only affects anyone that does need additional packages AND is using renv. While it might be rare, it is something that eventually needs supporting.
In any case, I think the use of a DESCRIPTION file and pak::local_install("src/validations") as an initial recommendation does not seem too much to discuss (although I'll do it separately as part of resolving #22)
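A minimal sketch of what that CI step might look like, assuming the working directory is the hub root, that pak is already installed on the runner, and that any extra packages are listed under Imports in src/validations/DESCRIPTION. The guard and the variable name validations_pkg are illustrative only.

```r
# Path to the (assumed) validations package inside the hub.
validations_pkg <- file.path("src", "validations")

if (file.exists(file.path(validations_pkg, "DESCRIPTION"))) {
  # Installs the local package together with the dependencies
  # declared in its DESCRIPTION file.
  pak::local_install(validations_pkg)
} else {
  # Hubs without custom dependencies can skip this step entirely.
  message("No DESCRIPTION found in ", validations_pkg, "; nothing to install.")
}
```

The guard means a hub without a src/validations/DESCRIPTION file is left untouched, so the same step could be shared across hubs.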
Also, random question which may also need to be touched upon here, what are your thoughts on #20 ?
I was testing out a potential custom validation function for https://github.com/reichlab/variant-nowcast-hub using the vignette in https://hubverse-org.github.io/hubValidations/articles/custom-functions.html#managing-dependencies-of-custom-sourced-functions
I am really glad to report that once I figured out the nuances of actually calling the new function, adding a custom function seems to be pretty plug-and-play!
Here were the steps (with missing pieces labelled):
1. Create a validations.yml file (NOTE: this cannot be validations.yaml unless you want to manually modify your GitHub workflow)
2. Create a script in src/ that contains functions for validation, which contain the arguments specified in the default configuration (even if you don't use them)
3. Have each function evaluate to TRUE or FALSE and then wrap the result in hubValidations::capture_check_cnd()
4. Run hubValidations::validate_submission() (missing: in the same directory of the hub)
I tested this with the simple hub and the functions I created were:
I would be happy to tackle this and update the vignette.