vincentarelbundock / Rdatasets

A collection of datasets originally distributed in R packages
https://vincentarelbundock.github.io/Rdatasets
Other
323 stars 435 forks

add datasets with errors #19

Closed: markvanderloo closed this issue 3 years ago

markvanderloo commented 3 years ago

I'll be a bit blunt here ;-).

The problem with all example datasets is that they are perfect: no missing values, no outliers, no errors, no inconsistencies.

That leaves no exercises in data cleaning, which is what you spend most of your time on when you work with data.

Please add the SBS2000 dataset from the 'validate' package so that this list has something truly horrible in it, and even people learning R for the first time can suffer from the terrible state that most data is in :-).

Cheers. Mark

vincentarelbundock commented 3 years ago

Thanks for the suggestion! Added here: https://github.com/vincentarelbundock/Rdatasets/commit/819ec7d12036945cca5613af730a7345e4bd52a0

BTW, I've been on the lookout for projects where I could use validate and tinytest. Very cool stuff.