tidymodels / modeldata

Data Sets Used by tidymodels Packages
https://modeldata.tidymodels.org/

Can't access to data sets using `::` #44

Closed jbkunst closed 2 years ago

jbkunst commented 2 years ago

Hi,

Thanks for this package and all the work put into TMWR!

I noticed that I can't access a data set using `::`. Is there a particular reason why the data sets cannot be accessed this way, as they can in other data packages?

library(modeldata)

modeldata::ames
#> Error: 'ames' is not an exported object from 'namespace:modeldata'
modeldata::credit_data
#> Error: 'credit_data' is not an exported object from 'namespace:modeldata'

# example 1
library(babynames)
babynames::babynames
#> # A tibble: 1,924,665 × 5
#>     year sex   name          n   prop
#>    <dbl> <chr> <chr>     <int>  <dbl>
#>  1  1880 F     Mary       7065 0.0724
#>  2  1880 F     Anna       2604 0.0267
#>  3  1880 F     Emma       2003 0.0205
#>  4  1880 F     Elizabeth  1939 0.0199
#>  5  1880 F     Minnie     1746 0.0179
#>  6  1880 F     Margaret   1578 0.0162
#>  7  1880 F     Ida        1472 0.0151
#>  8  1880 F     Alice      1414 0.0145
#>  9  1880 F     Bertha     1320 0.0135
#> 10  1880 F     Sarah      1288 0.0132
#> # … with 1,924,655 more rows

# example 2
library(nycflights13)
nycflights13::airlines
#> # A tibble: 16 × 2
#>    carrier name                       
#>    <chr>   <chr>                      
#>  1 9E      Endeavor Air Inc.          
#>  2 AA      American Airlines Inc.     
#>  3 AS      Alaska Airlines Inc.       
#>  4 B6      JetBlue Airways            
#>  5 DL      Delta Air Lines Inc.       
#>  6 EV      ExpressJet Airlines Inc.   
#>  7 F9      Frontier Airlines Inc.     
#>  8 FL      AirTran Airways Corporation
#>  9 HA      Hawaiian Airlines Inc.     
#> 10 MQ      Envoy Air                  
#> 11 OO      SkyWest Airlines Inc.      
#> 12 UA      United Air Lines Inc.      
#> 13 US      US Airways Inc.            
#> 14 VX      Virgin America             
#> 15 WN      Southwest Airlines Co.     
#> 16 YV      Mesa Airlines Inc.

Created on 2022-05-10 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22 ucrt)
#>  os       Windows 10 x64 (build 22000)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Spanish_Spain.utf8
#>  ctype    Spanish_Spain.utf8
#>  tz       America/Santiago
#>  date     2022-05-10
#>  pandoc   2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version date (UTC) lib source
#>  babynames    * 1.0.1   2021-04-12 [1] CRAN (R 4.2.0)
#>  cli            3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
#>  crayon         1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
#>  digest         0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
#>  ellipsis       0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate       0.15    2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi          1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap        1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs             1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  glue           1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  highr          0.9     2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools      0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
#>  knitr          1.39    2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle      1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  modeldata    * 0.1.1   2021-07-14 [1] CRAN (R 4.2.0)
#>  nycflights13 * 1.0.2   2021-04-12 [1] CRAN (R 4.2.0)
#>  pillar         1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  reprex         2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang          1.0.2   2022-03-04 [1] CRAN (R 4.2.0)
#>  rmarkdown      2.14    2022-04-25 [1] CRAN (R 4.2.0)
#>  rstudioapi     0.13    2020-11-12 [1] CRAN (R 4.2.0)
#>  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi        1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
#>  tibble         3.1.6   2021-11-07 [1] CRAN (R 4.2.0)
#>  utf8           1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs          0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
#>  withr          2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun           0.30    2022-03-02 [1] CRAN (R 4.2.0)
#>  yaml           2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
#>
#>  [1] C:/Users/jbkun/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.0/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

Thanks so much in advance. Kind regards,

juliasilge commented 2 years ago

I think this is because we have:

https://github.com/tidymodels/modeldata/blob/78d1b89fd8b8e9609ca77b56ed15bb0647688cbf/DESCRIPTION#L28

juliasilge commented 2 years ago

@topepo is there a reason why this package has LazyData: false?
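
For anyone who wants to check this setting on an installed copy, the `LazyData` field can be read from the package's DESCRIPTION. This is only a minimal sketch, assuming the package is installed; `description_field()` is a hypothetical helper name, not part of modeldata or any other package.

```r
# Hypothetical helper: read one field from an installed package's DESCRIPTION.
description_field <- function(pkg, field) {
  utils::packageDescription(pkg, fields = field)
}

# Only query modeldata if it is actually installed on this machine.
if (requireNamespace("modeldata", quietly = TRUE)) {
  print(description_field("modeldata", "LazyData"))
}
```

The value printed depends on the installed version, so no expected output is shown here.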

If I just change the package to LazyData: true then I see this:

❯ checking LazyData ... WARNING
    LazyData DB of 5.2 MB without LazyDataCompression set
    See §1.1.6 of 'Writing R Extensions'

I think the datasets would need to be resaved with compression, maybe with tools::resaveRdaFiles().

> tools::checkRdaFiles("data/")
                               size ASCII compress version
data//Chicago.rda            287604 FALSE       xz       2
data//ames.rda               109600 FALSE       xz       2
data//check_times.rda        330104 FALSE       xz       2
data//crickets.rda              413 FALSE     gzip       2
data//drinks.rda               1800 FALSE     gzip       2
data//grants.rda             199263 FALSE    bzip2       2
data//hpc_cv.rda             107155 FALSE     gzip       2
data//lending_club.rda       152820 FALSE       xz       2
data//parabolic.rda            8185 FALSE     gzip       2
data//pathology.rda             215 FALSE     gzip       2
data//pd_speech.rda          981096 FALSE       xz       2
data//penguins.rda             2488 FALSE    bzip2       2
data//solubility_test.rda      3475 FALSE     gzip       2
data//stackoverflow.rda       30751 FALSE    bzip2       2
data//tate_text.rda           70981 FALSE    bzip2       2
data//two_class_example.rda    8195 FALSE     gzip       2
data//wa_churn.rda            61324 FALSE       xz       2
data//Sacramento.RData        17492 FALSE       xz       2
data//Smithsonian.RData         821 FALSE     gzip       2
data//ad_data.RData           79260 FALSE       xz       2
data//attrition.RData         27340 FALSE       xz       2
data//biomass.RData           10272 FALSE       xz       2
data//bivariate.RData         13836 FALSE       xz       2
data//car_prices.RData         7620 FALSE     gzip       2
data//cells.RData            728926 FALSE     gzip       2
data//concrete.RData          10995 FALSE     gzip       2
data//covers.RData              719 FALSE     gzip       2
data//credit_data.RData       37408 FALSE       xz       2
data//hpc_data.RData          24468 FALSE       xz       2
data//meats.RData             69928 FALSE       xz       2
data//mlc_churn.RData         96168 FALSE       xz       2
data//oils.RData               1459 FALSE     gzip       2
data//scat.RData               3144 FALSE     gzip       2
data//small_fine_foods.RData 608955 FALSE    bzip2       2
data//two_class_dat.RData     11971 FALSE     gzip       2
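
The resaving idea can be tried safely outside the package first. The following is a rough sketch, not the change that was made to modeldata: it writes a made-up data frame (`demo_df`) to a temporary directory with gzip, then re-saves it with xz and compares the `tools::checkRdaFiles()` reports. The choice of xz is an assumption for illustration. Note also that the warning above names a `LazyDataCompression` field, which 'Writing R Extensions' describes as a DESCRIPTION field, so resaving the files may not be sufficient on its own.

```r
# Work in a throwaway directory so no real package files are touched.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# Made-up demo data, saved with gzip to mimic a mixed-compression data/ folder.
demo_df <- data.frame(x = seq_len(1000), y = rep(letters[1:10], 100))
save(demo_df, file = file.path(data_dir, "demo_df.rda"), compress = "gzip")

before <- tools::checkRdaFiles(data_dir)

# Re-save every .rda/.RData file in the directory with xz compression.
tools::resaveRdaFiles(data_dir, compress = "xz")

after <- tools::checkRdaFiles(data_dir)
print(before$compress)
print(after$compress)
```

In a real package the same two calls would be pointed at the package's own `data/` directory instead of the temporary one.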

DavisVaughan commented 2 years ago

I have a feeling we should be using LazyData: true. https://github.com/tidymodels/modeldata/issues/4

topepo commented 2 years ago

TBH I've never seen any code that uses package::data and I believe the proper idiom to use is data().

If someone wants to make a PR with LazyData: true, go ahead.
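
As a workaround while `::` is unavailable, the `data()` idiom mentioned above can be wrapped so that it returns the data set as a value. This is a minimal sketch under the assumption that the package ships the data set in its `data/` directory; `load_pkg_data()` is a hypothetical helper name, not a function from modeldata.

```r
# Hypothetical helper: load one data set by name and return it,
# instead of relying on `pkg::name` (which needs lazy-loaded data).
load_pkg_data <- function(name, package) {
  env <- new.env()
  # data() loads the object into `env` rather than the global environment.
  utils::data(list = name, package = package, envir = env)
  get(name, envir = env)
}

# Only run the modeldata example if the package is installed.
if (requireNamespace("modeldata", quietly = TRUE)) {
  ames <- load_pkg_data("ames", "modeldata")
  print(dim(ames))
}
```

Loading into a private environment avoids silently overwriting an object of the same name in the caller's workspace, which a bare `data(ames)` call would do.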

github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.