ropensci / skimr

A frictionless, pipeable approach to dealing with summary statistics
https://docs.ropensci.org/skimr

"No common type" error #574

Closed: antondutoit closed this issue 4 years ago

antondutoit commented 4 years ago

I'm getting an error when running skimr, as follows:

Error: No common type for `..1$by_variable$numeric.p0` <double> and `..18$by_variable$numeric.p0` <Period>.

I'm guessing this has to do with data types, but I'm an R user rather than a programmer, so I don't know how to fix it. My code previously ran fine on a superset of the data that now produces this error, so its appearance is curious. I had added three new variables to the data frame, but using dplyr::select to remove them from the input to skimr had no effect. Nor did removing the one integer variable in the dataset (the rest of the numerics being doubles, obviously).

NB I have updated every package on my R install.

I can't attach original data due to confidentiality (ethics protocol), but if it's required for a solution I will see if I can generate some synthetic data which reproduces the error.

Any help would be much appreciated. Thanks.


last_error and last_trace output below:

> rlang::last_error()
<error>
message: No common type for `..1$by_variable$numeric.p0` <double> and `..18$by_variable$numeric.p0` <Period>.
class:   `vctrs_error_incompatible_type`
backtrace:
  1. skimr:::custom_skim_with(.)
23. vctrs::vec_default_ptype2(x, y, x_arg = x_arg, y_arg = y_arg)
24. vctrs::stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
25. vctrs:::stop_incompatible(...)
26. vctrs:::stop_vctrs(...)
27. skimr:::custom_skim_with(.)
Call `rlang::last_trace()` to see the full backtrace

> rlang::last_trace()
     x
  1. +-HDR_filter_df_2 %>% custom_skim_with(.)
  2. | +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  3. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  4. |   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5. |     \-`_fseq`(`_lhs`)
  6. |       \-magrittr::freduce(value, `_function_list`)
  7. |         +-base::withVisible(function_list[[k]](value))
  8. |         \-function_list[[k]](value)
  9. |           \-skimr:::custom_skim_with(.)
10. |             +-dplyr::summarize(...)
11. |             \-dplyr:::summarise.tbl_df(...)
12. |               \-dplyr:::summarise_impl(.data, dots, environment(), caller_env())
13. +-purrr::map2(...)
14. | +-skimr:::.f(.x[[1L]], .y[[1L]], ...)
15. | \-skimr:::skim_by_type.data.frame(.x[[1L]], .y[[1L]], ...)
16. |   \-skimr:::build_results(skimmed, variable_names, NULL)
17. |     +-tidyr::unnest(out, .data$by_variable)
18. |     \-tidyr:::unnest.data.frame(out, .data$by_variable)
19. |       \-tidyr::unchop(data, !!cols, keep_empty = keep_empty, ptype = ptype)
20. |         \-vctrs::vec_rbind(!!!x, .ptype = ptype)
21. +-vctrs:::vec_type2_dispatch(x = x, y = y, x_arg = x_arg, y_arg = y_arg)
22. +-vctrs::vec_ptype2.double(x = x, y = y, x_arg = x_arg, y_arg = y_arg)
23. \-vctrs:::vec_ptype2.double.default(...)
24.   \-vctrs::vec_default_ptype2(x, y, x_arg = x_arg, y_arg = y_arg)
25.     \-vctrs::stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
26.       \-vctrs:::stop_incompatible(...)
27.         \-vctrs:::stop_vctrs(...)

michaelquinn32 commented 4 years ago

I think this is tied to your data. You have a column whose class is Period and that also inherits from numeric. skim() therefore dispatches the numeric summary functions on it, but the values those functions return aren't plain numeric (double), so they can't be combined with the results from your other numeric columns.

Here's how I can reproduce your error:

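library(skimr)

# A mean() method for the toy "period" class that returns a classed
# "Period" result instead of a plain double
mean.period <- function(x, ...) {
  res <- NextMethod("mean", x, ...)
  structure(res, class = c("Period", "numeric"))
}

# One plain numeric column and one column carrying the extra "period" class
my_df <- data.frame(
  numeric = 1:3,
  period = structure(1:3, class = c("period", "numeric"))
)

# Both columns are skimmed as numeric, so their mean results get combined
skim(my_df)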
#> Error: No common type for `..1$by_variable$numeric.mean` <double>
#> and `..2$by_variable$numeric.mean` <Period>.

You can explore the types of columns in your data by calling str() on it.
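A minimal base-R sketch of that check, assuming the toy `my_df` and `mean.period` method from the reproduction (the helper `non_base_classes()` is hypothetical, not part of skimr). It lists each column's class vector, flags numeric columns that carry extra classes, and shows which column's mean() comes back as something other than a plain double.

```r
# Toy data mirroring the reproduction: a plain integer column and an integer
# column carrying the extra S3 classes c("period", "numeric").
mean.period <- function(x, ...) {
  res <- NextMethod()  # compute the ordinary mean
  structure(res, class = c("Period", "numeric"))  # then re-class the result
}
my_df <- data.frame(
  numeric = 1:3,
  period = structure(1:3, class = c("period", "numeric"))
)

# str() prints each column's class; lapply() returns the full class vectors.
str(my_df)
col_classes <- lapply(my_df, class)

# Hypothetical helper (not part of skimr): names of numeric columns whose
# class vector contains anything beyond the base "integer"/"numeric" types.
non_base_classes <- function(df) {
  is_suspect <- vapply(df, function(col) {
    is.numeric(col) && !all(class(col) %in% c("integer", "numeric"))
  }, logical(1))
  names(df)[is_suspect]
}
non_base_classes(my_df)  # "period"

# Class of the value mean() returns for each column: a plain double for
# `numeric`, but a "Period" object for `period`.
vapply(my_df, function(col) class(mean(col))[1], character(1))
```

On real data, the columns this flags are the candidates for a custom skim type.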

To fix this, you need to let skimr know that a different class exists within your data frame. This approach treats your period column as numeric, reusing the default numeric skimming functions under a new "period" skim type. You might want to look at ?skim_with or ?skimr::stats for more ideas on how to summarize this data.

# Give "period" columns their own skim type that reuses the numeric skimmers
my_skim <- skim_with(
  period = modify_default_skimmers("numeric", new_skim_type = "period")
)
my_skim(my_df)
── Data Summary ────────────────────────
                           Values
Name                       my_df 
Number of rows             3     
Number of columns          2     
_______________________          
Column type frequency:           
  numeric                  1     
  period                   1     
________________________         
Group variables            None  

── Variable type: numeric ──────────────────────────────────────────────────────
  skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75
1 numeric               0             1     2     1     1   1.5     2   2.5
   p100 hist 
1     3 ▇▁▇▁▇

── Variable type: period ───────────────────────────────────────────────────────
  skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75
1 period                0             1     2     1     1   1.5     2   2.5
   p100 hist 
1     3 ▇▁▇▁▇
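If you don't need to keep the extra class in the summary, a possible workaround (a sketch, not something skimr itself provides) is to strip it before skimming, so the column falls back to its base numeric type. This assumes the toy S3 `period` column from the reproduction; the helper `strip_extra_classes()` is hypothetical, and for a richer class such as lubridate's Period you should check what dropping the class does to the values before relying on it.

```r
# Toy data from the reproduction: "period" is an integer vector with extra
# S3 classes attached.
my_df <- data.frame(
  numeric = 1:3,
  period = structure(1:3, class = c("period", "numeric"))
)

# Hypothetical helper: remove the class attribute from every numeric column
# whose class is not a bare base type, keeping the underlying values.
strip_extra_classes <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col) && !all(class(col) %in% c("integer", "numeric"))) {
      unclass(col)
    } else {
      col
    }
  })
  df
}

plain_df <- strip_extra_classes(my_df)
vapply(plain_df, class, character(1))
# Both columns are now plain integers, so skimr::skim(plain_df) should apply
# the default numeric skimmers to both without a type conflict.
```

The trade-off against the skim_with() approach is that the period column is no longer reported under its own skim type.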