r-dbi / DBI

A database interface (DBI) definition for communication between R and RDBMSs
https://dbi.r-dbi.org
GNU Lesser General Public License v2.1
296 stars 75 forks source link

Should `DBI::dbWriteTable` support overloading `value` argument to support a `SQL` query #488

Open talegari opened 1 month ago

talegari commented 1 month ago

This is one of my typical workflow: con (say odbc) --> tbl --> some_dplyr_ops --> save (as a table). Note that data does NOT come into the R's memory. But the code does look clumpsy:

tbl_dbplyr %>% 
  dbplyr::sql_render() %>% 
  paste("create table a.b.c as", .) %>%
  DBI::dbExecute(con, .)

This could be (intuitively) replaced by:

DBI::dbWriteTable(con, DBI::Id("a", "b", "c"), value = <sql>)
# or: DBI::dbAppendTable(con, DBI::Id("a", "b", "c"), value = <sql>)

Right now, value is understood to be a data.frame in R session's memory. In some cases, some database backends allow value to be a filename too.

Q1. Would overloading DBI::dbWriteTable to cover this case add value without breaking the semantics? Q2. Having a new generic say DBI::dbWriteQuery make sense?

PS:

  1. dbplyr::compute does this job is most cases. In practice, it has hit roadblocks like "only temporary tables are supported" (based on DB backend) whereas raw DBI::dbExecute gets the job done.
  2. IMHO, support for value = <sql> should be at DBI level, not dbplyr.
krlmlr commented 1 month ago

Thanks. This is a tight balance -- how much SQL generation should DBI support. I see how it would be useful to have this "just work", at the same time, I don't want to take too much responsibility regarding SQL generation.

The specific cases where compute() doesn't work -- are there corresponding issues in the dbplyr repo?

talegari commented 1 month ago

@krlmlr

  1. About sql generation: DBI need not generate any sql, dbplyr does. Simply put, DBI needs to accept a query like it accepts a dataframe.
  2. compute() is supposed to fail (gracefully) in some cases. AFAIK, Hadley/dbplyr visualizes compute() more like a exploratory tool rather than a serious "write to database" tool. So, not ideal for production usecases.

One example is spark sql where we get a meaningful error. image

IMHO, DBI has worked like a hard button (DBI::dbExecute) and it is the last resort for what a R-user can do from within R. Whatever did not work with compute() was accomplished this way successfully:

# this works
tile_details_tbl %>% 
  sql_render() %>% 
  paste("CREATE TABLE sandbox.srikanthks.tile_data AS", .) %>% 
  DBI::dbExecute(con, .)

This brings us exactly to:

Now that DBI already does it, should it support the functionality via dbWriteTable (most intuitive place a user would for).

As an individual, I can wrap the previous chunk within a function and make my workflow clean, but I think, getting this as a part of dbWriteTable makes me feel empowered!

krlmlr commented 1 month ago

The "CREATE TABLE sandbox.srikanthks.tile_data AS" part is what I refer to as SQL generation.

The dbplyr message seems to originate here: https://github.com/tidyverse/dbplyr/pull/1379/files#diff-f099d8ec45544adf5c40196344d55c458564434dbaa5d7ad090510f33d5cf63bR115 .

@hadley: do you remember why you added the early exit for compute(temporary = FALSE) in Spark?

hadley commented 1 month ago

@krlmlr I think that original hack was due to a misunderstanding. I wouldn't replicate it.

krlmlr commented 1 month ago

@hadley: Thanks. Does this mean that you'd accept a PR that fixes compute(temporary = FALSE) on Spark?

hadley commented 1 month ago

Yes.