Open thibautjombart opened 5 years ago
In fact, I have used this for my own analysis: https://github.com/everhartlab/sclerotinia-366/blob/master/results/data-comparison.md
Looks great indeed. Maybe still useful to build a wrapper around it? Being able to specify rules as a separate file would be cool - proved tremendously useful for dictionary-based data cleaning. Thoughts?
I'll see what I can template.
I'll see what you contemplate.
This may be implemented in other packages, so maybe just a wrapper or documentation matter. The idea is to create validation rules for a given `data.frame` "a la" `testthat`.

Specific use cases (examples):

- `xxx` is a `Date` and should be greater or less than a given date
- `xxx - yyy` must be less than a given number (e.g. delays from `yyy` to `xxx` must be less than 30 days)
- `sex` should be either `male`, `female`, or `unknown`
- `age` should be strictly positive, less than 150
- `xxx` should be of specific class

Really it seems to all boil down to:

- `xxx` must fulfill a logical condition, e.g. `xxx < whatever`, `xxx %in% something`
- `xxx` and `yyy` must fulfill a logical condition, e.g. `xxx > yyy` or `xxx - yyy > something`

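
To make the two kinds of condition concrete, here is a minimal base R sketch. The `linelist` data frame and the `check_rule()` helper are invented for illustration and do not come from any existing package; the point is only that every use case listed so far can be written as an expression evaluated against the columns of the data.

```r
# Toy data; column names and values are made up for illustration.
linelist <- data.frame(
  onset  = as.Date(c("2018-01-02", "2018-01-10", "2018-03-01")),
  report = as.Date(c("2018-01-05", "2018-01-08", "2018-04-15")),
  sex    = c("male", "female", "m"),
  age    = c(34, -1, 61),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not an existing API): evaluate a quoted
# logical condition using the columns of `x` and return the logical result.
check_rule <- function(x, rule) {
  res <- eval(rule, envir = x, enclos = parent.frame())
  stopifnot(is.logical(res))
  res
}

# Conditions on a single variable
ok_sex <- check_rule(linelist, quote(sex %in% c("male", "female", "unknown")))
ok_age <- check_rule(linelist, quote(age > 0 & age < 150))

# Conditions on two variables
ok_order <- check_rule(linelist, quote(report >= onset))
ok_delay <- check_rule(linelist, quote(as.numeric(report - onset) < 30))

# Row numbers that fail each rule
which(!ok_sex)
which(!ok_delay)
```

A class check such as `inherits(onset, "Date")` fits the same pattern, although it returns a single value for the whole column rather than one per row.
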
I suspect we can use `testthat` as a backend, with an interface similar to that of `clean_spelling`, e.g.

- `validate_variable(x, rule)`: validates a single variable
- `validate_data(x, rules = list(variable_xxx = rule_xxx, variable_yyy = rule_yyy))`: applies `validate_variable` to a bunch of variables

Ideally validation rules could be provided in a table outside R, e.g. in an Excel spreadsheet, like we did for the cleaning rules in `clean_spelling`.
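
A rough sketch of how this interface could look, to help the discussion. Only the two function names and their arguments come from the proposal above. Everything else is an assumption: that a rule is a function returning one `TRUE`/`FALSE` per value, that the rule table has `variable` and `condition` columns, and the `as_rule()` helper. It uses plain base R rather than `testthat` as a backend.

```r
# `x` is a vector; `rule` is assumed to be a function returning one
# TRUE/FALSE per element of `x`.
validate_variable <- function(x, rule) {
  ok <- rule(x)
  stopifnot(is.logical(ok), length(ok) == length(x))
  ok
}

# `x` is a data.frame; `rules` is a named list of rule functions, with
# names matching the columns to check.
validate_data <- function(x, rules) {
  missing_vars <- setdiff(names(rules), names(x))
  if (length(missing_vars) > 0) {
    stop("Variables not found in data: ",
         paste(missing_vars, collapse = ", "))
  }
  out <- lapply(names(rules), function(v) {
    validate_variable(x[[v]], rules[[v]])
  })
  names(out) <- names(rules)
  as.data.frame(out)
}

# Rules kept as a table (assumed layout). It is built inline here; in
# practice it could be read from a CSV file or a spreadsheet.
rule_table <- data.frame(
  variable  = c("sex", "age"),
  condition = c("x %in% c('male', 'female', 'unknown')",
                "x > 0 & x < 150"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn a condition written in terms of `x` into a
# rule function.
as_rule <- function(condition) {
  expr <- parse(text = condition)[[1]]
  function(x) eval(expr, list(x = x), baseenv())
}

rules <- lapply(rule_table$condition, as_rule)
names(rules) <- rule_table$variable

# Toy data, invented for illustration
dat <- data.frame(
  sex = c("male", "f", "unknown"),
  age = c(25, 40, 200),
  stringsAsFactors = FALSE
)

res <- validate_data(dat, rules)
res                          # one logical column per validated variable
colSums(!as.matrix(res))     # number of failing rows per variable
```

Conditions involving two variables (e.g. `xxx > yyy`) do not fit a per-variable `rule` as written here; they would need to be evaluated against the whole data frame instead.
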