Open karthik opened 10 years ago
I'm about to turn that Gapminder excerpt into a proper R package and document the (unholy) cleaning it went through. My grad course is providing serious deadlines for both the cleaning and package-ization, so I have no way to really back out or delay indefinitely on this.
I think it would be its own small data package (?).
Or are you proposing a meta-package holding multiple datasets?
> I'm about to turn that Gapminder excerpt into a proper R package and document the (unholy) cleaning it went through.
That would be fantastic.
> Or are you proposing a meta-package holding multiple datasets?
I would love for gapminder to be its own data package. What I had in mind here is compiling a bunch of different but useful datasets, each with clear documentation, that one could simply install and start working with. Many courses need this. One example is John Myles White's RDatasets (standard R datasets as a Julia package: https://github.com/johnmyleswhite/RDatasets.jl)
Jenny: If you have ideas for datasets, please suggest them here.
Note: We can easily add the gapminder data package as a dependency of this one.
This is not (yet) an R data package, but I really like this Lord of the Rings data:
https://github.com/jennybc/lotr
originally from here:
Awesome! The lotr dataset looks great.
The following came from Kyle Cranmer:
Here is a nice list: http://rs.io/100-interesting-data-sets-for-statistics/
CERN is about to release some open data related to the LHC, but the portal is not quite ready: http://opendata.cern.ch
All the best,
Kyle
The basic gapminder R package now exists:
https://github.com/jennybc/gapminder
I still want to turn the cleaning code into compiled notebooks, but the commented code is already there, along with all the intermediates.
Here is where my grad class STAT 545 has been collecting links to interesting datasets or lists thereof:
Although this isn't quite a lesson, it's often a challenge to teach with interesting data. The widely used iris dataset (packaged with R) and diamonds (packaged with ggplot2) aren't that interesting. @jennybc brought the gapminder dataset into the SWC material, and that has been fantastic. So this might be a bit meta, but some folks could spend a bit of time compiling datasets that are both interesting and fun to munge in {language-of-choice}.