njtierney / naniar

Tidy data structures, summaries, and visualisations for missing data
http://naniar.njtierney.com/

Explore options to make `nabular` data smaller #299

Open njtierney opened 2 years ago

njtierney commented 2 years ago
``` r
library(naniar)
library(lobstr)
obj_size(riskfactors)
#> 49,232 B
obj_size(nabular(riskfactors))
#> 99,992 B
```

Created on 2022-04-06 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info 🤱🏻 🏇🏼 👲🏿 ─────────────────────────────────────────────────
#>  hash: breast-feeding: light skin tone, horse racing: medium-light skin tone, person with skullcap: dark skin tone
#>
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       macOS Big Sur 11.2.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2022-04-06
#>  pandoc   2.17.1.1 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.1.1)
#>  cli           3.2.0   2022-02-14 [1] CRAN (R 4.1.1)
#>  colorspace    2.0-3   2022-02-21 [1] CRAN (R 4.1.1)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.1.3)
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.1.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.1)
#>  dplyr         1.0.8   2022-02-08 [1] CRAN (R 4.1.1)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.1.1)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.1.1)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.1)
#>  generics      0.1.2   2022-01-31 [1] CRAN (R 4.1.1)
#>  ggplot2       3.3.5   2021-06-25 [1] CRAN (R 4.1.1)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.1)
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.1.1)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.1)
#>  knitr         1.37    2021-12-16 [1] CRAN (R 4.1.1)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.1)
#>  lobstr      * 1.1.1   2019-07-02 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.2   2022-01-26 [1] CRAN (R 4.1.1)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
#>  naniar      * 0.6.1   2021-05-14 [1] CRAN (R 4.1.1)
#>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.1.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.1.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.1)
#>  Rcpp          1.0.8   2022-01-13 [1] CRAN (R 4.1.1)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.1)
#>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.1.1)
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.1)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  scales        1.1.1   2020-05-11 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.1   2021-11-02 [1] CRAN (R 4.1.1)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.1.1)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.1)
#>  styler        1.6.2   2021-09-23 [1] CRAN (R 4.1.1)
#>  tibble        3.1.6   2021-11-07 [1] CRAN (R 4.1.1)
#>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.1.1)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
#>  visdat        0.5.3   2019-02-15 [1] CRAN (R 4.1.1)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.1)
#>  xfun          0.30    2022-03-02 [1] CRAN (R 4.1.1)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.1.1)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

It might be possible, for example, to give nabular data a print method that looks the same as it does now, while storing the references to the shadow columns in a more compact way? Then again, thinking about it, that might just delay the computation until later: some sort of lazy/JIT printing or computation.
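
To make the lazy idea a bit more concrete, here is a minimal base R sketch. It is not how naniar works: `lazy_nabular()` and `collect_shadow()` are hypothetical names made up for illustration, the built-in `airquality` data stands in for `riskfactors` so the example has no package dependencies, and the `"!NA"` / `"NA"` factor levels and `_NA` suffix are only meant to mimic what `nabular()` produces.

``` r
# Hypothetical sketch: none of these functions exist in naniar.

# Wrap a data frame without materialising any shadow columns;
# only a class attribute is added, so the object stays small.
lazy_nabular <- function(data) {
  stopifnot(is.data.frame(data))
  structure(data, class = c("lazy_nabular", class(data)))
}

# Build the shadow columns on demand.
collect_shadow <- function(x) {
  data <- x
  class(data) <- setdiff(class(data), "lazy_nabular")
  shadow <- lapply(data, function(col) {
    factor(ifelse(is.na(col), "NA", "!NA"), levels = c("!NA", "NA"))
  })
  names(shadow) <- paste0(names(data), "_NA")
  cbind(data, as.data.frame(shadow))
}

# Printing shows data and shadow side by side, but the shadow
# is recomputed on every call rather than stored.
print.lazy_nabular <- function(x, ...) {
  print(collect_shadow(x), ...)
  invisible(x)
}

lazy <- lazy_nabular(airquality)
full <- collect_shadow(lazy)

# Compare the lazy wrapper with the fully materialised version.
object.size(lazy)
object.size(full)
```

The trade-off is the one raised above: the stored object is about the size of the original data, but the shadow is rebuilt every time it is printed or collected, so the cost is deferred rather than removed.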