ropensci / skimr

A frictionless, pipeable approach to dealing with summary statistics
https://docs.ropensci.org/skimr

skim() doesn't work at the end of a long pipe #655

Closed pulleyps closed 3 years ago

pulleyps commented 3 years ago

I am finding that if I run the following, skim() outputs appropriately:

iris %>% dplyr::group_by(Species) %>% skim()

But when I apply it to the end of a long pipe with real data for my project at work, skim() does nothing. I can see it running, but then it displays a prompt as if nothing happened, not showing any output. If I replace the skim() function at the end with print() then the result prints nicely as expected:

data <- starting_data %>%
  select(WeekBeginning, TerminalID, CP, CPDetail, SalesCount, SalesAmount, ReturnsCount, ReturnsAmount) %>%
  filter(WeekBeginning > '2020-12-20') %>%
  mutate(TerminalID = as.character(TerminalID)) %>%
  mutate(CP_Status = case_when(
    CP == "CNP" ~ "CNP",
    CP == "CP" ~ "CP",
    CP == "CP(Keyed-ECI)" ~ "CNP",
    CP == "CP(Keyed-EMV)" ~ "CNP",
    CP == "CP(Keyed-NoEMV)" ~ "CNP",
    TRUE ~ "N/A")) %>%
  filter(CP_Status != "N/A") %>%
  group_by(WeekBeginning, TerminalID, CP_Status) %>%
  summarize(Transactions = sum(SalesCount + ReturnsCount, na.rm = TRUE),
            Volume = sum(SalesAmount + ReturnsAmount, na.rm = TRUE)) %>%
  ungroup() %>%
  right_join(outcall_sf, by = c("TerminalID" = "XWeb_Terminal_ID")) %>%
  arrange(WeekBeginning, TerminalID, desc(CP_Status)) %>%
  skim()

The output looks like this:

[image]

or: _ungroup() %>%

Again, if I use print in place of skim() then I see the output as usual and expected. But with skim(), the results are not displayed. Any ideas?

michaelquinn32 commented 3 years ago

Hi!

Sorry for the issue.

Can you first check to see that this isn't a metadata stripping issue?

Set

options(skimr_strip_metadata = FALSE)

If it's not that, then we'll have to think harder about what's going on. I'm having trouble reproducing this issue on a dev machine.

Best wishes, Michael

elinw commented 3 years ago

It's not printing because you are assigning the whole piped thread to an object.

pulleyps commented 3 years ago

> It's not printing because you are assigning the whole piped thread to an object.

I think I see. Taking out "data <-" at the very beginning seems to resolve the issue. Now I get a new error, though:

"Error: Problem with summarise() input skimmed.
x negative length vectors are not allowed
i Input skimmed is purrr::map2(...).
i The error occured in group 1: skim_type = "character"."

I know some of the values in my columns are negative numbers, but is that the problem this is referring to? Other than that I can't imagine what a negative length vector looks like.

elinw commented 3 years ago

Can you give the nrow() of the tibble you get after arrange(WeekBeginning, TerminalID, desc(CP_Status))? skimr doesn't have a problem with summarized data.

Searching for that error message suggests it is associated with a join that creates a data set too large for memory. That is probably due to duplicate values of your ID variable.
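One way to check the duplicate-key theory is to count how often each join key occurs before joining. The sketch below uses small made-up tables (`sales`, `lookup`, and their values are hypothetical, not the data from this issue) and only illustrates how repeated keys on both sides multiply rows. The thread never confirmed that duplicates were the actual cause.

```r
library(dplyr)

# Hypothetical toy tables; names and values are illustrative only
sales <- tibble(TerminalID = c("A", "A", "B"), Volume = c(10, 20, 30))
lookup <- tibble(
  XWeb_Terminal_ID = c("A", "A", "B"),
  Region = c("East", "West", "North")
)

# Keys that appear more than once in the lookup table
dup_keys <- lookup %>%
  count(XWeb_Terminal_ID) %>%
  filter(n > 1)
dup_keys

# Each duplicated key on one side is matched against every matching row
# on the other side, so the joined table can be larger than either input
joined <- sales %>%
  right_join(lookup, by = c("TerminalID" = "XWeb_Terminal_ID"))
nrow(joined)
```

Here key "A" occurs twice in both tables, so it contributes four rows to the result. With real data, comparing nrow() before and after the join is a quick way to see whether the join is inflating the table.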

pulleyps commented 3 years ago

The tibble dimensions are 19,584 rows by 12 columns.

I re-ran the script just now with the "data <- " removed and this time it ran just fine, with skim() showing output. I have no idea why the negative length vector issue came up originally (or what it means) nor why I cannot reproduce it now.

I guess this case is resolved. Thanks!

elinw commented 3 years ago

Okay, thanks for the follow-up. If you save it to data, you just have to type data to see it.
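The behavior discussed in this thread can be sketched with the iris example from the original report. At the top level, R auto-prints the value of an expression but not the result of an assignment, so a pipeline that ends in skim() and is assigned to a name shows nothing until the object is printed. The name `skimmed` below is an arbitrary choice for illustration.

```r
library(dplyr)
library(skimr)

# Assigning the result: nothing is printed at the console
skimmed <- iris %>%
  group_by(Species) %>%
  skim()

# Typing the object's name (or calling print()) displays the summary
skimmed
print(skimmed)

# Wrapping the assignment in parentheses assigns and prints in one step
(skimmed <- iris %>%
  group_by(Species) %>%
  skim())
```

Replacing skim() with print() at the end of an assigned pipeline shows output only because print() displays its argument as a side effect; the assignment itself still stays silent.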