Closed lschneiderbauer closed 2 years ago
Since dtplyr uses lazy evaluation, the use of where() is not supported. Unfortunately there is no way to know the type of a column in a lazy workflow.
If you download the development version of dtplyr there is now an error message if you try to use where().
# devtools::install_github("tidyverse/dtplyr")
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
data <- lazy_dt(tibble(x=c(1,2)))
data %>% select(where(~all(is.numeric(.))))
#> Error in `select()`:
#> ! The use of `where()` is not supported by dtplyr.
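One possible workaround, sketched here under the assumption that the source data frame is still available in memory before it is wrapped in lazy_dt(): evaluate the predicate eagerly on that data frame, then pass the resulting column names to the lazy chain with all_of(). The names df and numeric_cols are illustrative only and do not come from dtplyr.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Hypothetical example data: one numeric and one character column
df <- tibble(x = c(1, 2), y = c("a", "b"))

# Evaluate the predicate eagerly, where column types are known
numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]

# Hand the resolved names to the lazy chain instead of where()
result <- lazy_dt(df) %>%
  select(all_of(numeric_cols)) %>%
  as_tibble()
```

This keeps the chain lazy, at the cost of resolving the column set once up front; it will not pick up columns created later in the chain.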
Hope this helps - if you have any questions let me know.
@markfairbanks thanks for the quick response; I also encountered and was surprised by the same issue today. This limitation (not evident anywhere in the documentation) makes it a bit harder to use dtplyr as a plug-in alternative to dplyr.
Perhaps an alternative would be to automatically call as.data.table() when encountering where()? This could even be off by default, only on by an option to lazy_dt()?
Also, at this point are you aware of other things that can be done in e.g. tidytable that can't be achieved with dtplyr? Thanks!
> Perhaps an alternative would be to automatically call as.data.table() when encountering where()?
Doing something like this would cause issues when users are expecting a lazy chain to continue but it suddenly evaluates. So this won't be possible to do unfortunately.
This might have been doable before https://github.com/tidyverse/dtplyr/pull/372, but we no longer automatically convert a data.table object to a lazy_dt() - it was causing too many problems (see https://github.com/tidyverse/dtplyr/issues/312).
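To illustrate what "a lazy chain continuing" means, here is a minimal sketch (the data and the variable names lazy and out are made up for illustration). Each verb only records a step; nothing is computed until the result is explicitly collected, which is why an implicit as.data.table() in the middle of a chain would be surprising.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Build a lazy chain; each verb only adds a translation step
lazy <- lazy_dt(tibble(x = c(1, 2), y = c("a", "b"))) %>%
  filter(x > 1) %>%
  select(x)

# The result is still a lazy step object, not a data frame
is_lazy <- inherits(lazy, "dtplyr_step")

# show_query() prints the data.table call that would be run
show_query(lazy)

# Evaluation happens only when the result is collected explicitly
out <- as_tibble(lazy)
```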
> Also, at this point are you aware of other things that can be done in e.g. tidytable that can't be achieved with dtplyr?
Here are a few examples. I don't think the full list is that big though.
Things that can't be implemented in dtplyr
- where()
- bind_rows()/_cols() - we can't build an S3 method since that's not how they work in dplyr.
- crossing()/expand_grid()/etc. - things made to work outside of a data frame context.

Things that can be implemented eventually once data.table translations become available (or better)
- fill() on character/factor/logical columns. In tidytable I use vctrs::vec_fill_missing() in the background for these cases. data.table::nafill() doesn't support these types.
- unnest(): data.table support for unnesting is a bit limited at the moment. In dtplyr's lazy workflow we can't identify if a list column contains data frames or vectors. tidytable doesn't have to worry about this since it uses eager evaluation.

I see, thanks a lot for the detailed explanation, as well as the other pointers!
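As a small illustration of the fill() point in the list above, this sketch calls vctrs::vec_fill_missing() directly on a character vector. It only shows the vctrs function on its own; how tidytable wires it in internally is not shown here, and the vector x is made up for the example.

```r
library(vctrs)

# Hypothetical character vector with gaps
x <- c("a", NA, NA, "b", NA)

# Carry the last non-missing value forward
filled <- vec_fill_missing(x, direction = "down")
```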
Running a select-statement on a lazy data table in combination with where does not return what I expect:
Created on 2022-09-08 by the reprex package (v2.0.1)
All cases return an empty set of data while I expect all the data to still be present, since the conditions are satisfied (all columns are numeric and no value is NA).
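For comparison, a minimal sketch of the behaviour one would expect from eager dplyr on a plain tibble. This is not the original reprex, which is not reproduced in this thread; the data and the predicate are assumptions chosen to match the description (all columns numeric, no NA values).

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: numeric columns without missing values
df <- tibble(x = c(1, 2), y = c(3, 4))

# On an eager tibble, where() can inspect each column,
# so every column that satisfies the predicate is kept
eager <- df %>%
  select(where(~ is.numeric(.x) && !anyNA(.x)))
```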