ropensci / skimr

A frictionless, pipeable approach to dealing with summary statistics
https://docs.ropensci.org/skimr

Dots in in-line histogram #735

Closed by kivanvan 1 year ago

kivanvan commented 1 year ago

When I use skim() to check the distribution of my data, I get some "..." in the histogram. They appear only for some variables. What do the dots mean?

[image: skimr]

I ran the code in RMarkdown with R 4.2.2 and skimr_2.1.5.

elinw commented 1 year ago

Would you mind checking what the histograms look like for those variables if you do hist(df$variablename, breaks = 8)? Or could you share a small sample data set that does this? I'm puzzled by the 4th and 8th images since they don't do that.
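A minimal sketch of that check, assuming a hypothetical data frame `df` with a numeric column `variablename` (simulated here, not the reporter's data). Passing `plot = FALSE` returns the bin edges and counts so they can be read directly. Note that base R treats a single number for `breaks` as a suggestion only, so the actual number of bins can differ from 8.

```r
# Hypothetical stand-in for the reporter's data: 200 simulated values.
set.seed(42)
df <- data.frame(variablename = rnorm(200))

# Compute the histogram without drawing it, so the bins can be inspected.
# breaks = 8 is only a suggestion; hist() picks "pretty" bin edges near it.
h <- hist(df$variablename, breaks = 8, plot = FALSE)

print(h$breaks)  # bin edges actually used
print(h$counts)  # number of observations in each bin

# Draw the same histogram for a visual comparison with the skim() output.
hist(df$variablename, breaks = 8)
```

If the counts at either end are small but nonzero, the corresponding bars exist and anything missing from the skim() output is a display matter rather than a data matter.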

kivanvan commented 1 year ago

[images: a, b]

Above are the histograms of the first two variables, although the breaks argument works differently for each variable. I'm also attaching a subset of my data, including the 4th and 8th variables (named d and h), below. Interestingly, when I rerun skim() on the subset, I can see the complete histograms for all variables, while the 4th and 8th look the same as before. My original data has 80 variables, all numerical, but I don't think that counts as too big.

githubdata.csv

elinw commented 1 year ago

Are you knitting to HTML? I also see all four completely when I knit to html with your csv.
Or are you running interactively?

kivanvan commented 1 year ago

I just ran my code in R Markdown and viewed the results in the preview window (not through the viewer). I tested my csv file again, and I got the dots back in the first two histograms. This is so weird. Do you think it could be related to my R environment? I'm pasting the results from sessionInfo() below.

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] here_1.0.1 dplyr_1.1.0 phyloseq_1.42.0

loaded via a namespace (and not attached):
[1] nlme_3.1-160 bitops_1.0-7 matrixStats_0.63.0 bit64_4.0.5
[5] doParallel_1.0.17 RColorBrewer_1.1-3 httr_1.4.5 rprojroot_2.0.3
[9] GenomeInfoDb_1.34.9 repr_1.1.6 dynamicTreeCut_1.63-1 tools_4.2.2
[13] backports_1.4.1 utf8_1.2.2 R6_2.5.1 vegan_2.6-4
[17] rpart_4.1.19 Hmisc_4.8-0 DBI_1.1.3 BiocGenerics_0.44.0
[21] mgcv_1.8-41 colorspace_2.1-0 permute_0.9-7 rhdf5filters_1.10.0
[25] ade4_1.7-20 nnet_7.3-18 withr_2.5.0 tidyselect_1.2.0
[29] gridExtra_2.3 preprocessCore_1.60.2 bit_4.0.5 compiler_4.2.2
[33] WGCNA_1.72-1 cli_3.4.1 Biobase_2.58.0 htmlTable_2.4.1
[37] scales_1.2.1 checkmate_2.1.0 digest_0.6.30 stringr_1.5.0
[41] foreign_0.8-83 XVector_0.38.0 htmltools_0.5.4 base64enc_0.1-3
[45] jpeg_0.1-10 pkgconfig_2.0.3 fastmap_1.1.0 htmlwidgets_1.6.1
[49] rlang_1.0.6 impute_1.72.3 rstudioapi_0.14 RSQLite_2.3.0
[53] generics_0.1.3 jsonlite_1.8.3 RCurl_1.98-1.9 magrittr_2.0.3
[57] GO.db_3.16.0 GenomeInfoDbData_1.2.9 Formula_1.2-5 biomformat_1.26.0
[61] interp_1.1-3 Matrix_1.5-3 Rcpp_1.0.9 munsell_0.5.0
[65] S4Vectors_0.36.1 Rhdf5lib_1.20.0 fansi_1.0.3 ape_5.6-2
[69] lifecycle_1.0.3 stringi_1.7.8 MASS_7.3-58.1 zlibbioc_1.44.0
[73] rhdf5_2.42.0 plyr_1.8.8 grid_4.2.2 blob_1.2.3
[77] parallel_4.2.2 crayon_1.5.2 deldir_1.0-6 lattice_0.20-45
[81] Biostrings_2.66.0 splines_4.2.2 multtest_2.54.0 KEGGREST_1.38.0
[85] knitr_1.42 pillar_1.8.1 igraph_1.3.5 fastcluster_1.2.3
[89] reshape2_1.4.4 codetools_0.2-18 stats4_4.2.2 glue_1.6.2
[93] latticeExtra_0.6-30 data.table_1.14.6 png_0.1-8 vctrs_0.5.2
[97] foreach_1.5.2 tidyr_1.3.0 purrr_1.0.1 gtable_0.3.1
[101] cachem_1.0.7 ggplot2_3.4.1 xfun_0.36 skimr_2.1.5
[105] survival_3.4-0 tibble_3.1.8 iterators_1.0.14 AnnotationDbi_1.60.0
[109] memoise_2.0.1 IRanges_2.32.0 cluster_2.1.4

elinw commented 1 year ago

What I noticed is that when I ran things notebook style, I would sometimes get --- at the end because the window was not wide enough to accommodate the full width. Widening the window and rerunning fixed it. But I agree that this is not optimal.

kivanvan commented 1 year ago

This is good to know. I can see the end bars without rerunning if I widen my window all the way to the right. Thanks a lot!
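For anyone who lands here with the same symptom, a rough sketch of inspecting and temporarily raising R's print width with base R's `options()`. Whether the notebook preview pane honors this setting is an assumption not confirmed in this thread. The value 200 is arbitrary, and the commented-out `skim(df)` call stands in for your own data.

```r
# Look at the width R currently uses when wrapping or truncating printed output.
print(getOption("width"))

# Temporarily raise it; options() returns the previous values so they can be restored.
old <- options(width = 200)
current_width <- getOption("width")

# Print the summary here while the wider setting is active, e.g.:
# print(skimr::skim(df))  # df is a placeholder for your own data frame

# Restore the original setting.
options(old)
```

If the truncation persists, widening the output pane itself, as described above, is the fix that was confirmed to work.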