Closed yitao-li closed 2 years ago
In particular, now if I run the following with sparklyr
:
library(sparklyr)
sc <- spark_connect(master = "local")
sdf <- sdf_len(sc, 5)
print(dbplyr::remote_name(sdf))
sdf <- sdf %>% dplyr::compute()
print(dbplyr::remote_name(sdf))
I see the table had a remote name to begin with, but then after dplyr::compute()
the remote name is not there any more, which seems counter-intuitive, and I don't think this happened with an earlier version of dbplyr.
And same with the memdb_frame example too (i.e., if I simplify the example I mentioned in the issue to something matching the sparklyr example from the above):
library(dplyr)
library(dbplyr)
df <- memdb_frame(x = 1)
print(df %>% remote_name()) # has remote name
df <- df %>% compute()
print(df %>% remote_name()) # remote name becomes NULL now :/
^^ but the examples above do return a non-NULL remote name after compute() with dbplyr 2.0, and I believe there was some change in dbplyr 2.1.x that caused the NULL output
Proposed change: https://github.com/tidyverse/dbplyr/pull/649 (i.e., I think remote_name()
should ignore the "trailing" bunch of op_ungroup
operations -- let me know if it makes sense)
I think it would make more sense to make remote_name()
a generic. And then also op_group_by
should be ignored. For example:
remote_name <- function(x) {
UseMethod("remote_name")
}
#' @export
remote_name.tbl_lazy <- function(x) {
remote_name(x$ops)
}
#' @export
remote_name.op_base <- function(x) {
x$x
}
#' @export
remote_name.op_group_by <- function(x) {
remote_name(x$x)
}
#' @export
remote_name.op_ungroup <- function(x) {
remote_name(x$x)
}
#' @export
remote_name.default <- function(x) {
return()
}
@mgirlich That's a great idea! I have revised my PR accordingly.
FWIW it looks like this was introduced in 43559165cb051b3a2f1a9a1f4abfc05c32f8d020 (#616).
For example, consider
With dbplyr version 2.1.1 the output is
NULL
, but I seem to remember some earlier version of dbplyr would produce a non-NULL
output, and also based on how I understood thedbplyr
documentation, a non-NULL
output is expected, right?BTW the aforementioned
NULL
/ non-NULL
outputs also apply to Spark dataframes insparklyr
or other DB backends, not justmemdb_frame
s.