Open DaZaM82 opened 1 year ago
Some additional info...
If I run the code up to group_by() and nothing afterwards, then that works fine. Error is thrown by the skimr function
I tried to see if any of the groupings had zero length and found that none do, so not sure what is going on:
I don't think the issue is the number of rows. It's with the classes. Is it possible to isolate which data type is causing the problem by filtering prior to skimming? Also, is the group_by() necessary to cause the error?
Hi @elinw
Thanks for that. I was able to identify the problem following your suggestion.
The error is when there are date columns that are all NA.
So now I can generate a reprex:
iris %>% mutate(date = NA_Date_) %>% group_by(Species) %>% skimr::skim()
This seems like a bug to me?
For sure a bug. Thanks for the report!
@michaelquinn32 When all values are NA and na.rm = TRUE
(which we use everywhere) the functions like min()
return Inf
with a warning. . Whereas if na.rm = FALSE
is set they return NA. mean()
and others, however, return NA without a warning. So I think we (a) should decide what should be returned when skimming such data and (b) need to think about handling the warning. I suspect the Inf is possibly the ultimate cause of the issue reported.
Okay I take it back, because there is no such thing as NA_Date_
as far as I can tell, unless it is defined in a package. So I think that issue is getting absorbed into the pipe chain and then causing the errors. You can see this if you try
iris %>% mutate(date = NA_date_)
So then the pipe ends up passing nothing along to the final steps.
Update:
So lubridate does does define NA_Date
and loading it first solves the error, though not the warnings.
library(lubridate)
iris %>% mutate(date = NA_Date_) |> skimr::skim()
Warning messages: 1: In min.default(c(NAreal, NAreal, NAreal, NAreal, NAreal, : no non-missing arguments to min; returning Inf 2: In max.default(c(NAreal, NAreal, NAreal, NAreal, NAreal, : no non-missing arguments to max; returning -Inf
I guess the question is, do we want to silence the warnings? Or would we want to handle like a numeric since I don't think min and max are defined for the date class. Consistency with Numeric would return NA for both min and max.
iris %>% mutate(date = NAdate)
@elinw NA_Date_
is a valid object even without lubridate. Your above code throws an error because you have a lower case d where it should be a capital D. If you change it from NA_date_
to NA_Date_
then it works.
For some data frames I get an error when applying group_by before skim(). Doesn’t seem to depend on the grouping variable type.
this issue doesn’t come up for iris though where it works.
I’m not sure how to provide a reprex for this. Is this a known issue? Any way to resolve it?