ropensci / skimr

A frictionless, pipeable approach to dealing with summary statistics
https://docs.ropensci.org/skimr

bespoke skim no longer working #738

Open blueja5 opened 12 months ago

blueja5 commented 12 months ago

I had been using the code below for some time with no issues:

my_skim <- skim_with(
  base = sfl(complete = n_complete),
  numeric = sfl(
    median = psych::interp.median,
    skew = psych::skew,
    skew.ratio = sur::skew.ratio,
    range = NULL,
    kurtosis = psych::kurtosi,
    iqr = purrr::partial(IQR, na.rm = TRUE),
    min = purrr::partial(min, na.rm = TRUE),
    max = purrr::partial(max, na.rm = TRUE),
    p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL,
    mad = NULL, empty = NULL, n_unique = NULL, p50 = NULL, hist = NULL
  )
)

Stroops %>%
  my_skim() %>%
  yank("numeric") %>%
  kbl(format = "latex", booktabs = TRUE, caption = "Inhibition Descriptives", digits = 2) %>%
  kable_styling(latex_options = c("striped", "hold_position", "scale_down"), font_size = 12)

However, it now throws an error:

Error in dplyr::summarize():
ℹ In argument: skimmed = purrr::map2(...).
Caused by error in purrr::map2():
ℹ In index: 1.
ℹ With name: numeric.
Caused by error in dplyr::summarize():
ℹ In argument: dplyr::across(variable_names, mangled_skimmers$funs).
Caused by error in across():
! .fns must be a function, a formula, or a list of functions/formulas.
Backtrace:

  1. ... %>% ...
    1. purrr::map2(...)
    2. purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
    3. skimr:::skim_by_type.data.frame(.x[[i]], .y[[i]], ...)
    4. dplyr:::summarise.data.frame(data, dplyr::across(variable_names, mangled_skimmers$funs))
    5. dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
    6. dplyr:::expand_across(dot)

I have spent some time looking at the documentation but I can't work out what is wrong with 'my_skim'. I assume there has been a change somewhere that I can't identify. Can you help? It took me a very long time to get to the descriptives table I wanted, and I'd be very grateful if you could let me know what the problem is. The error still occurs if I take out the skimmers that use purrr.

elinw commented 12 months ago

Okay, the other recent issue also seems to involve purrr. Can you please let me know what versions of purrr and dplyr you are on? It would be great if you could reproduce the problem in a simple example in which piping stops when the error occurs, using a data set like iris.

Is there any chance that you have haven labelled data?

Also can you please confirm that plain skim works with your data?

@michaelquinn32

blueja5 commented 12 months ago

Thanks for getting back to me: ‘purrr’ version 1.0.1 ‘dplyr’ version 1.1.2

I am pretty sure that the last time I ran this code I was on earlier versions but I would struggle to tell you what.

I don’t have haven labelled data (full disclosure I don’t know what that is so I suspect not).

Testing out my skim on iris:

my_skim<-skim_with(base = sfl(complete = n_complete), numeric = sfl(median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, range = NULL, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE), p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL, mad = NULL, empty = NULL, n_unique = NULL, p50 = NULL,hist = NULL))

my_skim(iris)

Error in dplyr::summarize():
ℹ In argument: skimmed = purrr::map2(...).
Caused by error in purrr::map2():
ℹ In index: 1.
ℹ With name: numeric.
Caused by error in dplyr::summarize():
ℹ In argument: dplyr::across(variable_names, mangled_skimmers$funs).
Caused by error in across():
! .fns must be a function, a formula, or a list of functions/formulas.
Backtrace:

  1. skimr (local) my_skim(iris)
    1. purrr::map2(...)
    2. purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
    3. skimr:::skim_by_type.data.frame(.x[[i]], .y[[i]], ...)
    4. dplyr:::summarise.data.frame(data, dplyr::across(variable_names, mangled_skimmers$funs))
    5. dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
    6. dplyr:::expand_across(dot)

A different bespoke skim, which still uses items from the purrr namespace:

my_skim <- skim_with(numeric = sfl(complete = n_complete, median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE)))

my_skim(iris)

works fine. But puts back a bunch of stuff I don’t want.

Session info: in case useful re conflicts

sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] reprex_2.0.2 ggpubr_0.6.0 trimr_1.1.1 ppcor_1.1 MASS_7.3-55
[6] plyr_1.8.7 car_3.1-1 carData_3.0-5 jtools_2.2.0 lmtest_0.9-40
[11] zoo_1.8-11 lm.beta_1.6-2 stargazer_5.2.3 DataExplorer_0.8.2 rmarkdown_2.18
[16] tables_0.9.10 xtable_1.8-4 knitr_1.40 splithalf_0.8.2 psych_2.2.9
[21] flextable_0.8.5 scales_1.2.1 janitor_2.2.0 rstatix_0.7.2 gtsummary_1.7.1
[26] skimr_2.1.4 rio_0.5.29 Hmisc_4.7-2 Formula_1.2-4 survival_3.2-13
[31] lattice_0.20-45 broom_1.0.1 corrplot_0.92 data.table_1.14.4 here_1.0.1
[36] forcats_1.0.0 stringr_1.5.0 purrr_1.0.1 readr_2.1.3 tidyr_1.3.0
[41] tibble_3.2.1 ggplot2_3.4.0 tidyverse_1.3.2 dplyr_1.1.2 kableExtra_1.3.4
[46] pacman_0.5.1

loaded via a namespace (and not attached):
[1] utf8_1.2.2 qpcR_1.4-1 tidyselect_1.2.0 lme4_1.1-31
[5] htmlwidgets_1.6.2 grid_4.1.3 munsell_0.5.0 interp_1.1-3
[9] withr_2.5.0 colorspace_2.0-3 highr_0.9 uuid_1.1-0
[13] rstudioapi_0.14 robustbase_0.95-0 ggsignif_0.6.4 officer_0.5.2
[17] labeling_0.4.2 repr_1.1.4 mnormt_2.1.1 bit64_4.0.5
[21] farver_2.1.1 rprojroot_2.0.3 vctrs_0.6.2 generics_0.1.3
[25] xfun_0.39 timechange_0.1.1 R6_2.5.1 cachem_1.0.6
[29] assertthat_0.2.1 promises_1.2.0.1 networkD3_0.4 vroom_1.6.0
[33] nnet_7.3-17 googlesheets4_1.0.1 gtable_0.3.1 sur_1.0.4
[37] rlang_1.1.1 systemfonts_1.0.4 splines_4.1.3 gargle_1.2.1
[41] checkmate_2.1.0 BiocManager_1.30.19 rgl_0.110.2 yaml_2.3.6
[45] abind_1.4-5 modelr_0.1.9 backports_1.4.1 httpuv_1.6.8
[49] tools_4.1.3 ellipsis_0.3.2 jquerylib_0.1.4 RColorBrewer_1.1-3
[53] Rcpp_1.0.9 base64enc_0.1-3 rpart_4.1.16 openssl_2.0.4
[57] deldir_1.0-6 cowplot_1.1.1 haven_2.5.1 cluster_2.1.2
[61] fs_1.6.2 crul_1.3 magrittr_2.0.3 openxlsx_4.2.5.2
[65] googledrive_2.0.0 hms_1.1.2 patchwork_1.1.2 mime_0.12
[69] evaluate_0.18 jpeg_0.1-9 readxl_1.4.1 gridExtra_2.3
[73] compiler_4.1.3 gt_0.9.0 crayon_1.5.2 minqa_1.2.5
[77] htmltools_0.5.4 mgcv_1.8-39 later_1.3.0 tzdb_0.3.0
[81] lubridate_1.9.0 DBI_1.1.3 dbplyr_2.2.1 broom.helpers_1.13.0
[85] boot_1.3-28 Matrix_1.5-1 cli_3.6.1 parallel_4.1.3
[89] igraph_1.4.0 pkgconfig_2.0.3 foreign_0.8-82 xml2_1.3.3
[93] svglite_2.1.0 bslib_0.4.1 webshot_0.5.4 minpack.lm_1.2-2
[97] rvest_1.0.3 snakecase_0.11.0 digest_0.6.30 httpcode_0.3.0
[101] cellranger_1.1.0 htmlTable_2.4.1 gdtools_0.3.0 curl_4.3.3
[105] shiny_1.7.4 nloptr_2.0.3 lifecycle_1.0.3 nlme_3.1-155
[109] jsonlite_1.8.3 viridisLite_0.4.1 askpass_1.1 fansi_1.0.3
[113] pillar_1.9.0 fastmap_1.1.0 httr_1.4.4 DEoptimR_1.0-11
[117] glue_1.6.2 zip_2.2.2 png_0.1-7 fortunes_1.5-4
[121] pander_0.6.5 bit_4.0.4 stringi_1.7.8 sass_0.4.6
[125] gfonts_0.2.0 latticeExtra_0.6-30 memoise_2.0.1

elinw commented 12 months ago

I have a feeling that we were on dangerous ground using some internal functions. dplyr:::expand_across(dot)

across() has had an API change that affects the use of the "..." argument: extra arguments are now meant to be supplied through an anonymous function instead. I saw other people online complaining about this.

! .fns must be a function, a formula, or a list of functions/formulas.

\(x) mean(x, na.rm = TRUE)

elinw commented 12 months ago

Hmm, your example works for me and I am on R 4.2.2. I think the native shortcut for anonymous functions was introduced in R 4.1, so you should have that. But I wonder if there was a bug fix of some kind since 4.1.3. I'm going to look at the change log.
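
For readers following along, here is a minimal sketch of the anonymous-function shortcut mentioned above. It is an illustration only, not code from skimr. The base R part needs R 4.1 or later; the dplyr part runs only if dplyr is installed, and the assumption is that it is version 1.1.0 or later.

```r
# Numeric columns of the built-in iris data
num <- iris[vapply(iris, is.numeric, logical(1))]

# Base R: \(x) is shorthand for function(x) (R >= 4.1)
base_means <- vapply(num, \(x) mean(x, na.rm = TRUE), numeric(1))
print(base_means)

# dplyr (assumed >= 1.1.0): the same lambda passed to across(), with
# na.rm set inside the function rather than forwarded through `...`
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_means <- dplyr::summarise(
    iris,
    dplyr::across(Sepal.Length:Petal.Width, \(x) mean(x, na.rm = TRUE))
  )
  print(dplyr_means)
}
```

Both calls should report the same four column means; the point is only that the extra argument lives inside the lambda.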

blueja5 commented 12 months ago

Thank you. Could there be something relating to my package order? I recently started using pacman and my scripts start with something like: pacman::p_load(kableExtra, tidyverse, here, data.table, readr, corrplot, dplyr). I am noticing that even though dplyr is last on the list, it still seems to be masked and I keep having to manually add dplyr:: to select, etc which I didn’t used to have to do. I’ve been meaning to look into this to figure out why but I’m mentioning in case this is relevant? I assume everything in the function is called in a way that this shouldn’t be an issue but just in case.
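
A quick way to look into the masking question is sketched below, using base R only. It is a diagnostic aid, not a confirmed explanation for the skimr error. One assumption worth checking: the session info earlier in this thread lists MASS among the attached packages, and MASS also exports a select() function, so a package attached after dplyr could be what masks dplyr::select.

```r
# Where does R find a given name? Entries are returned in search-path
# order, so the first one is the version that gets called.
filter_homes <- find("filter")
select_homes <- find("select")
print(filter_homes)
print(select_homes)

# Names defined in more than one attached package or environment
masked <- conflicts(detail = TRUE)
str(masked[lengths(masked) > 0], max.level = 1)

# Order in which attached packages are searched
print(search())
```

If select_homes lists another package before "package:dplyr", that package's select() wins unless the call is written as dplyr::select().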

On 21 Jul 2023, at 18:36, Elin Waring @.***> wrote:

Hmm your example works for me and I am on r 4.2.2. I think the native short cut for anonymous functions was introduced in 4.1 so you should have that. But i wonder if there was a bug fix of some kind since 4.1.3. I'm going to look at the change log.

— Reply to this email directly, view it on GitHub https://github.com/ropensci/skimr/issues/738#issuecomment-1646035060, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARM4A6UDBB3TECZHODELMTLXRK4YRANCNFSM6AAAAAA2QIBLFY. You are receiving this because you authored the thread.

elinw commented 12 months ago

Okay comparing your two bespoke skims, which both work for me without error, I notice a few issues.

First in the one that doesn't work you have range = NULL but range returns two values and thus can't be used in skimr without modification to pick one or the other. It is not in the default list of numeric skimmers for this reason. What happens if you take out just range = NULL?

Second I notice that in the custom skim that doesn't work you have base = sfl(complete = n_complete), numeric = sfl( ... but in the one that does work you have numeric = sfl(complete = n_complete ...

Could you try changing those one at a time and see if one or both of them fixes the issue?

elinw commented 12 months ago

Yes I can confirm that for me, including range causes the error. I think we should come up with a more graceful message.

elinw commented 12 months ago

See this post: https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/

There was 1 warning in dplyr::summarize().
ℹ In argument: skimmed = purrr::map2(...).
ℹ In group 2: skim_type = "numeric".
Caused by warning:
! Returning more (or less) than 1 row per summarise() group was deprecated in dplyr 1.1.0.
ℹ Please use reframe() instead.
ℹ When switching from summarise() to reframe(), remember that reframe() always returns an ungrouped data frame and adjust accordingly.

So we would need to find a way to identify functions that return multiple values.
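
One possible way to spot such functions is sketched below in base R. The helper name returns_single_value is hypothetical and is not part of skimr; it simply calls each candidate on a sample vector and checks the length of the result.

```r
# Hypothetical helper (not part of skimr): TRUE when f(x) has length 1
returns_single_value <- function(f, x) {
  length(f(x)) == 1L
}

x <- iris$Sepal.Length

candidates <- list(
  mean     = mean,
  range    = range,                       # returns c(min, max)
  spread   = function(x) diff(range(x)),  # single-value alternative
  quartile = function(x) quantile(x, 0.25)
)

ok <- vapply(candidates, returns_single_value, logical(1), x = x)
print(ok)
```

A check like this only tests one input, so a function that returns a single value here could still return more for other data.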

elinw commented 12 months ago

@michaelquinn32 Since this is actually working as documented (only functions with a return of length 1 are allowed) what about catching the error and using our own error message?

blueja5 commented 12 months ago

I’m sorry but this is still not working for me. I took ‘range’ out (I have a lot of variants on this skim as I was trying to narrow it down, so the two I sent were only a small sample) and I still can’t skim iris with this (range = NULL omitted):

my_skim<-skim_with(base = sfl(complete = n_complete), numeric = sfl(median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE), p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL, mad = NULL, empty = NULL, n_unique = NULL, p50 = NULL,hist = NULL))

I also made the other change to omit base = sfl etc.

my_skim<-skim_with(numeric = sfl(complete = n_complete, median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE), p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL, mad = NULL, empty = NULL, n_unique = NULL, p50 = NULL,hist = NULL))

which also is throwing the same error.

I started omitting bits from the end. Removing hist= NULL doesn’t make a difference.

When I remove "p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL, mad = NULL, empty = NULL, n_unique = NULL, p50 = NULL", it starts working. Am I using this incorrectly? I have been using this code for over a year with no problems. So the version below works. It breaks if I put range back in, but it sounds like that is as it should be (despite having worked before).

my_skim<-skim_with(numeric = sfl(complete = n_complete, median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE)))

However, if I put hist = NULL back in, it stops working:

my_skim<-skim_with(numeric = sfl(complete = n_complete, median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE), hist = NULL))

The histograms won’t work for me in Latex so I definitely need to get rid of them.

I can go up to the next R version if you think that's the problem, though I'm always terrified of ending up in a vortex of breaking other things which then take me hours to fix (definitely a novice user).

Thanks as ever

elinw commented 11 months ago

Okay so still, I think there is the issue that the tidyverse made an important change to handling of statistics that return multiple values. For me range "works" in the sense that I get multiple almost identical rows with the exception that they have two different values of range. But I'm not sure why it doesn't error given the tidyverse change (I did get the error one time but can't reproduce it).

So overall it really looks like something is going wrong with the NULLs. Can you try not including a NULL setting for statistics that are not part of the defaults (mad, empty and range are not in the default numeric skimmers)?

One thing is that these are the default numerics

mean sd p0 p25 p50 p75 p100 hist

I don't think you should be using NULL on anything besides them. I also am concerned about doing much of anything with the base sfl since the columns defined by that are used for duck typing skim objects.

I'm assuming that using skim() unmodified works, correct? And also skim_without_charts()?

What I think would be helpful in identifying why you are getting this error is to start from scratch, creating a skimmer with one modification at a time and seeing whether a specific one throws the error.

If none by itself is causing it, then the next question is what combination is the trigger.
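
As a rough sketch of that check, the snippet below compares the names being set to NULL against the default numeric skimmer names listed in this comment. The list of requested names is taken from the skimmer posted at the top of the thread. The call to skimr::get_default_skimmer_names() runs only if skimr is installed, and its output may differ between skimr versions.

```r
# Default numeric skimmer names as listed in this thread
numeric_defaults <- c("mean", "sd", "p0", "p25", "p50", "p75", "p100", "hist")

# Names the custom skimmer tried to switch off with `name = NULL`
requested_nulls <- c("range", "p0", "p25", "p75", "p100", "mad", "empty",
                     "n_unique", "p50", "hist")

# Names that are not numeric defaults, so there is nothing for NULL to remove
not_defaults <- setdiff(requested_nulls, numeric_defaults)
print(not_defaults)

# If skimr is installed, compare with what the installed version reports
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::get_default_skimmer_names("numeric"))
}
```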

blueja5 commented 11 months ago

Hello again,

skim_without_charts(iris) throws the same error

Starting out with my original skim:

my_skim<-skim_with(base = sfl(complete = n_complete), numeric = sfl(median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE), p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL, n_unique = NULL, p50 = NULL,hist = NULL))

-> error.

I removed terms sequentially from the end: hist = NULL, then n_unique = NULL, then p0-p75 = NULL all at once. It started working once I deleted p0-p75 = NULL.

I started putting them back. Adding either n_unique or hist brought back the error.

Going back to leaving out base and doing everything in numeric:

my_skim<-skim_with(numeric = sfl(median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE),n_unique = NULL))

-> error

my_skim<-skim_with(numeric = sfl(median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE), hist = NULL))

-> error

my_skim<-skim_with(numeric = sfl(median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE), p0 = NULL, p25 = NULL, p75 = NULL, p100 = NULL, p50 = NULL))

-> error

my_skim<-skim_with(numeric = sfl(median = psych::interp.median, skew = psych::skew, skew.ratio = sur::skew.ratio, kurtosis = psych::kurtosi, iqr = purrr::partial(IQR, na.rm = TRUE), min = purrr::partial(min, na.rm = TRUE), max = purrr::partial(max, na.rm = TRUE)))

-> works fine

I unloaded all of my installed packages so I only had: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base"

I loaded skimr.

skim_without_charts(iris) still throws an error

pattern continues as above.

So it does look like setting anything to NULL is the problem: I can add skimmers but not take them away. Happy to try anything else.

Full disclosure I can pretty much get what I need using psych (describe) and sending it to kbl but I am interested in the problem and happy to try other permutations if useful to you.

blueja5 commented 11 months ago

Running a different part of my code:

Domain_data %>% group_by(domain) %>% get_summary_stats(count, type = "mean_sd")

Error in mutate():
ℹ In argument: data = map(.data$data, .f, ...).
Caused by error in map():
ℹ In index: 1.
Caused by error in melt_dataframe():
! object '_tidyr_melt_dataframe' not found
Backtrace:

  1. Domain_data %>% group_by(domain) %>% ...
    1. tidyr:::gather.data.frame(., key = "variable", value = ".value.")
    2. tidyr:::melt_dataframe(…)

map() is also from purrr.

so maybe the underlying problem for both is somehow located here?
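
An error like "object '_tidyr_melt_dataframe' not found" can appear when a package's R code and its compiled code are out of step, for example if the package was updated while it was still loaded. Whether that is what happened here is an assumption, not something confirmed in this thread. The base R sketch below compares the version each loaded namespace recorded at load time with the version currently installed on disk.

```r
# Compare the version recorded when each namespace was loaded with the
# version currently installed; a mismatch suggests the package was updated
# during the session and that R should be restarted.
loaded <- loadedNamespaces()

version_loaded <- vapply(
  loaded, function(p) unname(getNamespaceVersion(p)), character(1)
)

version_on_disk <- vapply(
  loaded,
  function(p) {
    v <- suppressWarnings(utils::packageDescription(p, fields = "Version"))
    if (is.na(v)) NA_character_ else as.character(v)
  },
  character(1)
)

stale <- loaded[!is.na(version_on_disk) & version_loaded != version_on_disk]
print(stale)  # packages whose loaded and installed versions differ
```

If any packages are listed, restarting R (and reinstalling the package if the error persists) would be the usual next step to try.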

elinw commented 11 months ago

What would be great is if you could make a minimal reproducible example, meaning do the smallest, simplest code ... no functions from other packages, no using purrr partials . That will really help us isolate the problem.
So starting with NULL and numeric

my_skim <- skim_with(numeric = sfl(mean = NULL))
my_skim(iris)

And keep changing which statistic you set to NULL until you have been through them all or you get an error.
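
A small sketch of how that one-at-a-time search could be automated is given here. The helper which_fail is hypothetical and not part of skimr. The demonstration uses base R only; the skimr part runs only if skimr is installed and covers just three of the defaults as an example.

```r
# Hypothetical helper: run each zero-argument attempt, record whether it errors
which_fail <- function(attempts) {
  vapply(
    attempts,
    function(attempt) inherits(try(attempt(), silent = TRUE), "try-error"),
    logical(1)
  )
}

# Self-contained demonstration with base R only
demo <- list(
  fine   = function() mean(1:10),
  broken = function() stop("simulated failure")
)
print(which_fail(demo))

# With skimr installed, the same helper can test one NULL at a time
if (requireNamespace("skimr", quietly = TRUE)) {
  skim_attempts <- list(
    mean = function() skimr::skim_with(numeric = skimr::sfl(mean = NULL))(iris),
    sd   = function() skimr::skim_with(numeric = skimr::sfl(sd = NULL))(iris),
    hist = function() skimr::skim_with(numeric = skimr::sfl(hist = NULL))(iris)
  )
  print(which_fail(skim_attempts))
}
```

A TRUE next to a name means that attempt raised an error, which narrows down which NULL setting to report.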

Then if you don't trigger the error, you should start adding some base functions like MAD etc. But only ones that only return a single value (not range).

my_skim <- skim_with(numeric = sfl(mad = mad))
my_skim(iris)

blueja5 commented 11 months ago

Hi again, the very first one throws an error. In my last email I had more or less done that, albeit with my extra bits in: my usual skim works as soon as I don’t have any NULL terms, and it doesn’t seem to make a difference which one is set to NULL. skim_without_charts also does not work. I have also tested it with no other packages loaded.

elinw commented 10 months ago

Okay it's strange but I'll keep trying to reproduce.