r-dbi / bigrquery

An interface to Google's BigQuery from R.
https://bigrquery.r-dbi.org
Other
517 stars 182 forks source link

tbl(con, sql("string query")) errors if the query contains 3 or more periods #611

Open seth-cp opened 5 months ago

seth-cp commented 5 months ago

When I run a tbl(con, sql(...)), if the hand-rolled sql query contains more than 3 periods/dots, the query is rejected because bigrquery believes it should always be able to split the query on . into catalog, schema, and table names. See: https://github.com/r-dbi/bigrquery/blob/3642c14f026d337b842aa09bc4cf17b90955d4bb/R/dbi-connection.R#L462

I recently upgraded to R version 4.4.0 and updated many packages, including bigrquery, dbplyr, and dplyr. When I was using bigrquery version 1.4.1, I did not encounter this issue. Rerunning my same old code under 1.5.1 gave me this error for the first time. Is this a documented breaking change, have I failed to update my packages correctly, or is this a bug?

This issue may be of interest to followers of #540.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

bigrquery::bq_auth()

con <- DBI::dbConnect(
  bigrquery::bigquery(),
  project = "bigquery-public-data",
  dataset = "stackoverflow",
  billing = Sys.getenv("BIGQUERY_BILLING_PROJECT")
)

# Works
tbl(con, "tags")
#> # Source:   table<`tags`> [?? x 5]
#> # Database: BigQueryConnection
#>        id tag_name         count excerpt_post_id wiki_post_id
#>     <int> <chr>            <int>           <int>        <int>
#>  1 149358 announcement         0        69362460     69362459
#>  2   2136 regsvr32           256        10051236     10051235
#>  3   8287 rpg                256         7080550      7080549
#>  4   9136 dbexpress          256         7663373      7663372
#>  5   9213 multiple-domains   256        22931444     22931443
#>  6  11107 menustrip          256        47237574     47237573
#>  7  15213 fileserver         256         9275976      9275975
#>  8  25272 bad-alloc          256        14261589     14261588
#>  9  34059 pitch              256        37258061     37258060
#> 10  37612 winrar             256        13178390     13178389
#> # ℹ more rows

# Works
tbl(con, sql("SELECT t.id FROM tags t"))
#> # Source:   SQL [?? x 1]
#> # Database: BigQueryConnection
#>        id
#>     <int>
#>  1 149358
#>  2   2136
#>  3   8287
#>  4   9136
#>  5   9213
#>  6  11107
#>  7  15213
#>  8  25272
#>  9  34059
#> 10  37612
#> # ℹ more rows

# Does not work because there are 3 periods and hence more than 3 "pieces"
tbl(con, sql("SELECT t.id, t.tag_name, t.count FROM tags t"))
#> Error in `as_bq_table()`:
#> ! `name` ("SELECT t.id, t.tag_name, t.count FROM tags t") must have 1-3
#>   components.

# By contrast, DBI doesn't care about pieces
DBI::dbGetQuery(con, sql("SELECT t.id, t.tag_name, t.count FROM tags t"), n = 10)
#> # A tibble: 10 × 3
#>        id tag_name         count
#>     <int> <chr>            <int>
#>  1 149358 announcement         0
#>  2   2136 regsvr32           256
#>  3   8287 rpg                256
#>  4   9136 dbexpress          256
#>  5   9213 multiple-domains   256
#>  6  11107 menustrip          256
#>  7  15213 fileserver         256
#>  8  25272 bad-alloc          256
#>  9  34059 pitch              256
#> 10  37612 winrar             256

Created on 2024-06-07 with reprex v2.1.0

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.4.0 (2024-04-24) #> os macOS Sonoma 14.4.1 #> system x86_64, darwin20 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz America/Chicago #> date 2024-06-07 #> pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> bigrquery 1.5.1 2024-03-14 [2] RSPM (R 4.4.0) #> bit 4.0.5 2022-11-15 [2] RSPM (R 4.4.0) #> bit64 4.0.5 2020-08-30 [2] RSPM (R 4.4.0) #> brio 1.1.5 2024-04-24 [2] RSPM (R 4.4.0) #> cli 3.6.2 2023-12-11 [2] RSPM (R 4.4.0) #> curl 5.2.1 2024-03-01 [2] RSPM (R 4.4.0) #> DBI 1.2.3 2024-06-02 [2] RSPM (R 4.4.0) #> dbplyr * 2.5.0 2024-03-19 [1] RSPM (R 4.4.0) #> digest 0.6.35 2024-03-11 [2] RSPM (R 4.4.0) #> dplyr * 1.1.4 2023-11-17 [1] RSPM (R 4.4.0) #> evaluate 0.23 2023-11-01 [2] RSPM (R 4.4.0) #> fansi 1.0.6 2023-12-08 [2] RSPM (R 4.4.0) #> fastmap 1.2.0 2024-05-15 [2] RSPM (R 4.4.0) #> fs 1.6.4 2024-04-25 [2] RSPM (R 4.4.0) #> gargle 1.5.2 2023-07-20 [2] RSPM (R 4.4.0) #> generics 0.1.3 2022-07-05 [2] RSPM (R 4.4.0) #> glue 1.7.0 2024-01-09 [2] RSPM (R 4.4.0) #> htmltools 0.5.8.1 2024-04-04 [2] RSPM (R 4.4.0) #> httr 1.4.7 2023-08-15 [2] RSPM (R 4.4.0) #> jsonlite 1.8.8 2023-12-04 [2] RSPM (R 4.4.0) #> knitr 1.47 2024-05-29 [2] RSPM (R 4.4.0) #> lifecycle 1.0.4 2023-11-07 [2] RSPM (R 4.4.0) #> magrittr 2.0.3 2022-03-30 [2] RSPM (R 4.4.0) #> pillar 1.9.0 2023-03-22 [2] RSPM (R 4.4.0) #> pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.4.0) #> purrr 1.0.2 2023-08-10 [2] RSPM (R 4.4.0) #> R6 2.5.1 2021-08-19 [2] RSPM (R 4.4.0) #> reprex 2.1.0 2024-01-11 [1] RSPM (R 4.4.0) #> rlang 1.1.4 2024-06-04 [2] RSPM (R 4.4.0) #> rmarkdown 2.27 2024-05-17 [2] RSPM (R 4.4.0) #> rstudioapi 0.16.0 2024-03-24 [2] RSPM (R 4.4.0) #> sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.4.0) #> tibble 3.2.1 2023-03-20 [2] RSPM (R 4.4.0) #> tidyselect 1.2.1 2024-03-11 [2] RSPM (R 4.4.0) #> utf8 1.2.4 2023-10-22 [2] RSPM (R 4.4.0) #> vctrs 0.6.5 2023-12-01 [2] RSPM (R 4.4.0) #> withr 3.0.0 2024-01-16 [2] RSPM (R 4.4.0) #> xfun 0.44 2024-05-15 [2] RSPM (R 4.4.0) #> yaml 2.3.8 2023-12-11 [2] RSPM (R 4.4.0) #> #> [1] /Users/seth/Library/R/x86_64/4.4/library #> [2] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library #> #> ────────────────────────────────────────────────────────────────────────────── ```
seth-cp commented 5 months ago

To be clear, I don't actually care about giving a table alias when selecting from a single table. I have a lot of hand-rolled SQL that I want to be able to stack dplyr verbs on top of, and the hand-rolled sql contains a lot of SELECT x.a, x.b, y.c FROM x LEFT JOIN y ON (x.a = y.a) and so on.