ropensci / ozunconf19

OzUnconf19
http://ozunconf19.ropensci.org/

Validate Tabular Data #9

orchid00 closed 4 years ago

orchid00 commented 4 years ago

I think something like this might already exist, but by asking at least I'll find out!

"goodtables is a free, open-source, hosted service for validating tabular data. goodtables checks your data for its structure, and, optionally, its adherence to a specified schema." Ref: https://frictionlessdata.io/docs/validating-data/

The idea emerged out of this: https://github.com/frictionlessdata/goodtables-py "Using the Python goodtables library. This allows you full control over the validation process but requires knowledge of Python."
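To make "checks your data for its structure" a bit more concrete, here is a minimal sketch of what structural checks on a CSV could look like. It uses only the Python standard library, not goodtables itself. The function name `check_structure` and the error labels are made up for illustration and are not the goodtables API.

```python
import csv
import io


def check_structure(csv_text):
    """Return a list of (row_number, code, message) structural problems.

    Hypothetical helper for illustration only; not part of goodtables.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [(0, "empty-table", "no rows found")]

    errors = []
    header = rows[0]

    # Header checks: every column needs a non-blank, unique name.
    seen = set()
    for col, name in enumerate(header, start=1):
        if not name.strip():
            errors.append((1, "blank-header", f"column {col} has no name"))
        elif name in seen:
            errors.append((1, "duplicate-header", f"column {col} repeats '{name}'"))
        seen.add(name)

    # Row checks: no fully blank rows, and each row matches the header width.
    for row_number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            errors.append((row_number, "blank-row", "row is empty"))
        elif len(row) != len(header):
            errors.append(
                (row_number, "ragged-row",
                 f"expected {len(header)} cells, found {len(row)}")
            )
    return errors


sample = "id,name,name\n1,Ann,x\n2,Bob\n,,\n"
for problem in check_structure(sample):
    print(problem)
```

Checking against a schema (types, required fields, allowed values) would be a separate layer on top of this; the sketch covers only the shape of the table.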

It would be cool to translate that into R, as a kind of goodtables-R.

The closest package I found is https://github.com/data-cleaning/validate, but it does not really check structure the way goodtables does.

Is anyone interested in looking into this?

orchid00 commented 4 years ago

Also related: visdat from Nick Tierney, http://visdat.njtierney.com/

stefaniebutland commented 4 years ago

@bzkrouse @wlandau "validation" made me think of pharma industry. Are you aware of any tools? 👆🏼

wlandau commented 4 years ago

Broadly speaking, there's the R Validation Hub: https://www.pharmar.org/. It was discussed at the R/Pharma conference this year.

wlandau commented 4 years ago

As for specific tools, the whole process of TFL (tables, figures, and listings) generation is still so SAS-based that I do not have specific recommendations about tools in R.

bzkrouse commented 4 years ago

Thanks for looping us in @stefaniebutland !

I'm not sure if this is exactly what you're after, but a nice tool for comparing two datasets is the comparedf function from the arsenal package.
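For anyone unfamiliar with the idea, comparing two datasets roughly means matching rows on a key and reporting what differs. Below is a rough Python sketch of that general technique. It is not arsenal's implementation and does not mirror comparedf's interface; `compare_tables` and the sample data are hypothetical.

```python
def compare_tables(left, right, key):
    """Compare two lists of dict rows matched on `key` (hypothetical helper)."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # Keys that appear in only one of the two tables.
    only_left = sorted(left_by_key.keys() - right_by_key.keys())
    only_right = sorted(right_by_key.keys() - left_by_key.keys())

    # For shared keys, report each shared column whose values differ.
    differences = []
    for k in sorted(left_by_key.keys() & right_by_key.keys()):
        a, b = left_by_key[k], right_by_key[k]
        for column in sorted(a.keys() & b.keys()):
            if a[column] != b[column]:
                differences.append((k, column, a[column], b[column]))

    return {
        "only_left": only_left,
        "only_right": only_right,
        "differences": differences,
    }


old = [{"id": 1, "dose": 10}, {"id": 2, "dose": 20}, {"id": 3, "dose": 30}]
new = [{"id": 1, "dose": 10}, {"id": 2, "dose": 25}, {"id": 4, "dose": 40}]
print(compare_tables(old, new, key="id"))
```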

This previous unconf thread also came to mind (I remembered it from some previous research I had done).