reconhub / linelist

An R package to import, clean, and store case data
https://www.repidemicsconsortium.org/linelist

new feature: data validation rules #81

Open thibautjombart opened 5 years ago

thibautjombart commented 5 years ago

This may already be implemented in other packages, so it may just be a matter of writing a wrapper or some documentation. The idea is to create validation rules for a given data.frame, à la testthat.

Specific use cases (examples):

Really it seems to all boil down to:

I suspect we can use testthat as a backend, with an interface similar to that of clean_spelling, e.g.

Ideally, validation rules could be provided in a table outside R, e.g. in an Excel spreadsheet, like we did for the cleaning rules in clean_spelling.
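As a rough illustration of the "rules in a table" idea, here is a minimal base R sketch. It is not the linelist API and it does not use testthat: the `check_rules()` helper, the `rule` and `expression` column names, and the example data are all hypothetical. The rules table is built inline here, but it could equally be read from a CSV or spreadsheet.

```r
# Hypothetical rules table: one row per rule, with an R expression stored as text.
rules <- data.frame(
  rule = c("age_in_range", "onset_before_report", "known_gender"),
  expression = c(
    "age >= 0 & age <= 120",
    "date_onset <= date_report",
    "gender %in% c('f', 'm')"
  ),
  stringsAsFactors = FALSE
)

# Made-up case data containing one bad age and one unknown gender code.
cases <- data.frame(
  id = c("a1", "a2", "a3"),
  age = c(34, -2, 61),
  gender = c("f", "m", "x"),
  date_onset = as.Date(c("2019-03-01", "2019-03-01", "2019-03-10")),
  date_report = as.Date(c("2019-03-03", "2019-03-02", "2019-03-12")),
  stringsAsFactors = FALSE
)

# Evaluate each rule against the columns of `data` and count failing rows.
# A row fails when the expression is FALSE or NA.
check_rules <- function(data, rules) {
  n_failed <- vapply(rules$expression, function(txt) {
    ok <- eval(parse(text = txt), envir = data, enclos = baseenv())
    sum(!ok | is.na(ok))
  }, integer(1), USE.NAMES = FALSE)
  data.frame(
    rule = rules$rule,
    n_failed = n_failed,
    passed = n_failed == 0L,
    stringsAsFactors = FALSE
  )
}

report <- check_rules(cases, rules)
print(report)
```

Storing expressions as text keeps the rules editable outside R, at the cost of evaluating user-supplied code, so a real implementation would probably want to restrict what the expressions may contain.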

zkamvar commented 5 years ago

The assertr package is very good for this
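For readers unfamiliar with it, a minimal sketch of assertr-style checks follows, assuming the package is installed from CRAN. The `cases` data and the chosen bounds are made up for this example and are not part of linelist.

```r
library(assertr)  # assumption: assertr is installed

# Made-up example data, not from the linelist package.
cases <- data.frame(
  id = c("a1", "a2", "a3"),
  age = c(34, 5, 61),
  gender = c("f", "m", "f"),
  stringsAsFactors = FALSE
)

# Each check returns the data when it passes, so checks can be chained.
checked <- verify(cases, nchar(id) > 0)
checked <- assert(checked, not_na, id, age)
checked <- assert(checked, within_bounds(0, 120), age)
checked <- assert(checked, in_set("f", "m"), gender)

# By default a failed check stops with an error; catch it to inspect the outcome.
bad <- transform(cases, age = c(34, -2, 61))
outcome <- tryCatch(
  {
    assert(bad, within_bounds(0, 120), age)
    "passed"
  },
  error = function(e) "failed"
)
print(outcome)
```

Here `verify()` takes a logical expression evaluated on the data, while `assert()` applies a predicate such as `within_bounds()` or `in_set()` to the selected columns.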

zkamvar commented 5 years ago

In fact, I have used this for my own analysis: https://github.com/everhartlab/sclerotinia-366/blob/master/results/data-comparison.md

thibautjombart commented 5 years ago

Looks great indeed. Maybe it would still be useful to build a wrapper around it? Being able to specify rules in a separate file would be cool; that proved tremendously useful for dictionary-based data cleaning. Thoughts?

zkamvar commented 5 years ago

I'll see what I can template.

thibautjombart commented 5 years ago

> I'll see what I can template.

I'll see what you contemplate.