Hi folks,
I recently started a package called testdat, a play on @hadley's testthat. Most people in the sciences really just work with small, messy datasets, especially ones read back from old databases and legacy software, so I figured it might be useful to have a small test suite for tabular data that checks for things like:
Non-UTF-8 characters in some fields
Duplicate identifiers because of padding issues
Issues with dates
Issues with numeric values being read as characters
Some diagnostics on the data themselves (perhaps a simple Shiny/OpenCPU interface to highlight outliers caused by data-entry errors, and other things to be aware of)
Then people can load up data, run testdat(data) and have the local browser render a diagnostic report. It's an issue many researchers have told me about and I thought it might be a useful lightweight tool. I'd love for anyone to join in on this or offer thoughts/alternatives.
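To make a few of the checks listed above concrete, here is a rough base R sketch. The function name `check_table`, its arguments, and the toy column names are made up for illustration; this is not testdat's actual API, just one way such checks could be written.

```r
# Hypothetical helper (not part of testdat): runs three of the checks above
# on a data frame and returns a named list of findings.
check_table <- function(df, id_col) {
  chr_cols <- names(df)[vapply(df, is.character, logical(1))]

  # Character columns holding at least one string that is not valid UTF-8
  bad_utf8 <- chr_cols[vapply(chr_cols, function(col) {
    x <- df[[col]]
    any(!validUTF8(x[!is.na(x)]))
  }, logical(1))]

  # Identifiers that collide once padding (surrounding whitespace) is trimmed
  ids <- trimws(as.character(df[[id_col]]))
  dup_ids <- unique(ids[duplicated(ids)])

  # Character columns whose non-missing values all parse as numbers.
  # Columns with invalid encodings are skipped, since coercing them can fail.
  candidates <- setdiff(chr_cols, bad_utf8)
  numeric_as_chr <- candidates[vapply(candidates, function(col) {
    x <- df[[col]][!is.na(df[[col]])]
    length(x) > 0 && !anyNA(suppressWarnings(as.numeric(x)))
  }, logical(1))]

  list(bad_utf8 = bad_utf8, dup_ids = dup_ids, numeric_as_chr = numeric_as_chr)
}

# Toy data with one problem of each kind
dat <- data.frame(
  id = c("A1", "A1 ", "B2"),        # "A1 " has trailing padding
  weight = c("1.5", "2", NA),       # numbers stored as text
  note = c("ok", "caf\xe9", "fine"), # Latin-1 byte, not valid UTF-8
  stringsAsFactors = FALSE
)
res <- check_table(dat, "id")
print(res)
```

On this toy data frame the helper should flag `note` for encoding, report `A1` as a duplicated identifier, and list `weight` as a numeric column read as character. Date checks and the browser report are left out here, since they depend on design decisions that are still open.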