vincentarelbundock / Rdatasets

A collection of datasets originally distributed in R packages
https://vincentarelbundock.github.io/Rdatasets

nycflights13 #5

Closed sebastiansauer closed 3 years ago

sebastiansauer commented 3 years ago

Hi Vincent,

nycflights13 is a widely used dataset for predictive modelling (though it is fairly large).

Here's the source: https://www.rdocumentation.org/packages/nycflights13/versions/1.0.1

Would you consider including it?

Cheers, Sebastian

vincentarelbundock commented 3 years ago

My understanding was that this package includes only minimal data (e.g., plane and airport info), and that the actually useful data is downloaded from the web via functions, because the data is too large to package and gets updated frequently. If that's correct, then I'm not sure it really fits Rdatasets, which stores everything on GitHub (with the corresponding size limits) and is thus better suited to "static" data.

Let me know what you think.

sebastiansauer commented 3 years ago

If your point is correct, then I'd completely agree. However, I think you are referring to a different data set. The nycflights13 data set only includes flights from 2013, and is thus a fait accompli. Looking at the time stamps of the package, it appears that the last change was committed in 2019: https://www.rdocumentation.org/packages/nycflights13/versions/1.0.1

The package itself appears to be hosted on GitHub: https://github.com/hadley/nycflights13

Downloading and installing the package shows a size of ~30 MB, so it seems that all the data is included in the package itself rather than being fetched on request.
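For anyone who wants to repeat this kind of size check, here is a minimal sketch in Python (not part of the package or of Rdatasets). It sums the sizes of all files under a directory and lists any single file above a threshold. The helper names `file_sizes_mb` and `summarize` are made up for illustration, and the 100 MB default reflects GitHub's per-file limit as I understand it, so treat it as an assumption.

```python
from pathlib import Path

# Assumed threshold: GitHub's per-file size limit, in MB. Adjust as needed.
GITHUB_FILE_LIMIT_MB = 100


def file_sizes_mb(root):
    """Return {relative path: size in MB} for every regular file under root."""
    root = Path(root)
    return {
        str(p.relative_to(root)): p.stat().st_size / 1_000_000
        for p in root.rglob("*")
        if p.is_file()
    }


def summarize(root, limit_mb=GITHUB_FILE_LIMIT_MB):
    """Return (total size in MB, sorted list of files larger than limit_mb)."""
    sizes = file_sizes_mb(root)
    total = sum(sizes.values())
    too_big = sorted(name for name, mb in sizes.items() if mb > limit_mb)
    return total, too_big
```

Calling `summarize(...)` on the directory of the installed package (the path depends on your R library location) gives the total on-disk size and flags any file that would be too large to store individually.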

In sum, I'd say it's safe to include.

Thanks for this awesome project!

vincentarelbundock commented 3 years ago

Thanks for the investigation!

Done.