Closed AndreaPi closed 6 years ago
Hi Andrea!
That option exists when you customize formatting. https://github.com/ropenscilabs/skimr/blob/3605be99e6a170e57bbe75bda992c6b1c13bdcfc/R/formats.R#L12
Best wishes, Michael
Thanks a lot @michaelquinn32! However, either skim_format has some issues or I haven't understood how to use it. According to the help, this should show 4 characters in factor levels:
# Show 4-character names in factor levels
skim_format(.levels = list(nchar = 4))
However, I get an error:
library(skimr)
#> Warning: package 'skimr' was built under R version 3.4.4
foo <- structure(c(33L, 1L, 5L, 27L, 18L, 20L, 31L, 7L, 25L, 6L, 2L,
11L, 11L, 12L, 2L, 36L, 8L, 32L, 22L, 26L, 26L, 18L, 11L, 4L,
21L, 26L, 20L, 1L, 5L, 36L, 28L, 21L, 22L, 37L, 36L, 30L, 14L,
36L, 13L, 7L, 21L, 8L, 33L, 24L, 4L, 1L, 34L, 18L, 17L, 27L,
24L, 24L, 23L, 31L, 19L, 6L, 13L, 20L, 22L, 14L, 23L, 16L, 23L,
31L, 16L, 1L, 35L, 24L, 33L, 35L, 9L, 27L, 4L, 18L, 10L, 30L,
29L, 18L, 18L, 37L, 21L, 15L, 2L, 28L, 17L, 24L, 18L, 10L, 2L,
3L, 31L, 35L, 9L, 28L, 27L, 1L, 23L, 21L, 34L, 25L),
.Label = c("zb-025", "ZB-048", "zb-051", "ZB-053", "zb-060",
"zb-064", "ZB-080", "ZB-092", "ZB-101", "ZB-104",
"ZB-106", "ZB-136", "zb-147", "ZB-155", "ZB-156",
"ZB-158", "zb-175", "zb-182", "ZB-188", "ZB-198",
"zb-205", "ZB-216", "ZB-224", "Zb-228", "ZB-238",
"ZB-240", "ZB-255", "ZB-259", "ZB-262", "ZB-264",
"ZB-269", "ZB-275", "ZB-277", "ZB-282", "ZB-309",
"zb-355", "zb-361"), class = "factor")
skim_format(.levels = list(nchar = 4))
skim(foo)
#> Error in substr(names(x), 1, options$formats$.levels$max_char): invalid substring arguments
sessionInfo()
#> R version 3.4.3 (2017-11-30)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 16299)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
#> [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
#> [5] LC_TIME=Italian_Italy.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] skimr_1.0.1
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_0.12.14 assertthat_0.2.0 dplyr_0.7.4 digest_0.6.13
#> [5] rprojroot_1.2 R6_2.2.2 backports_1.1.1 magrittr_1.5
#> [9] evaluate_0.10.1 pillar_1.0.1 rlang_0.2.0.9000 stringi_1.1.6
#> [13] bindrcpp_0.2 rmarkdown_1.8 tools_3.4.3 stringr_1.2.0
#> [17] pander_0.6.1 glue_1.1.1 purrr_0.2.3 yaml_2.1.14
#> [21] compiler_3.4.3 pkgconfig_2.0.1 htmltools_0.3.6 bindr_0.1
#> [25] knitr_1.20 tidyselect_0.2.2 tibble_1.4.1
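The error message hints at a likely cause: the code reads an option named max_char, while the call above sets nchar. The following base-R sketch reproduces the same failure without skimr. It assumes that passing nchar leaves max_char unset; the `formats` list is a hypothetical stand-in inferred from the error message, not copied from skimr's source.

```r
# Hypothetical stand-in for the format options after calling
# skim_format(.levels = list(nchar = 4)). The structure is an assumption
# inferred from the error message, not skimr's actual internals.
formats <- list(.levels = list(nchar = 4))

# Two level names taken from the example factor above.
level_names <- c("zb-182", "Zb-228")

# `max_char` was never set, so this lookup yields NULL.
stop_at <- formats$.levels$max_char
is.null(stop_at)

# substr() rejects a NULL `stop` argument; capture the message.
err_msg <- tryCatch(
  substr(level_names, 1, stop_at),
  error = function(e) conditionMessage(e)
)
err_msg

# With the option name that the error message refers to, the call works.
formats <- list(.levels = list(max_char = 4))
short_names <- substr(level_names, 1, formats$.levels$max_char)
short_names
```

If that assumption holds, the help example with nchar would fail for any factor, not just this one.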
Hi Andrea!
Instead, try
skim_format(.levels = list(max_char = 4))
Here's an example:
library(skimr)
skim_format(.levels = list(max_char = 4))
skim(iris, Species)
#> Skim summary statistics
#> n obs: 150
#> n variables: 5
#>
#> Variable type: factor
#> variable missing complete n n_unique
#> Species 0 150 150 3
#> top_counts ordered
#> seto: 50, vers: 50, virg: 50, NA: 0 FALSE
Best wishes, Michael
Hi all,
Not sure if this is intended or not: I couldn't find it in the documentation, but I may have missed it. When skimming a factor, the skimmer top_counts shows only the first three characters of the levels with the top counts. This isn't helpful when the first three characters are the same for all levels, and it can lead to confusion. See:
As you can see, at first glance one could think the top counts are zb-: 7, zb-: 5, zb-: 5 and Zb-: 5. Instead, these are the first three characters of the top-count levels, followed by their counts. However, since the first three characters are the same for all levels, the skim summary doesn't really let me see which levels have the top counts.
What about adding an option that lets me choose whether level names in top_counts are shown as abbreviations or at full length? The default could be abbreviations, if you are opinionated about that, but I would be able to choose.
Of course, since level names can be arbitrarily long, this would mean one or more line breaks could appear in skim output. That doesn't seem like a problem to me. What do you think?
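To make the ambiguity concrete, here is a small base-R sketch using a few level names from the example factor. It only mimics the three-character truncation described above; `min_unique_chars` is a hypothetical helper for illustration and not part of skimr.

```r
# A few level names taken from the example factor in this thread.
lvls <- c("zb-025", "zb-182", "zb-205", "Zb-228")

# Truncating to three characters leaves labels that no longer
# identify a single level.
short <- substr(lvls, 1, 3)
short
has_duplicates <- anyDuplicated(short) > 0
has_duplicates

# Hypothetical helper (not part of skimr): the smallest prefix length
# that keeps every label distinct. Falls back to the longest label
# length if no shorter prefix works.
min_unique_chars <- function(x) {
  for (n in seq_len(max(nchar(x)))) {
    if (!anyDuplicated(substr(x, 1, n))) {
      return(n)
    }
  }
  max(nchar(x))
}

needed <- min_unique_chars(lvls)
needed
```

For the full set of 37 levels a longer prefix would be needed, since some levels differ only in their last character (for example ZB-255 and ZB-259), which is why a full-length option seems useful.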