tidyverse / dbplyr

Database (DBI) backend for dplyr
https://dbplyr.tidyverse.org

rstudio autocomplete slows down from 2.3.4 -> 2.4.0 #1479

Closed. shearerpmm closed this issue 2 weeks ago.

shearerpmm commented 3 months ago

Tab-autocomplete of table column names is slower with dbplyr 2.4.0 and later than with 2.3.4. The code below reproduces the issue on my machine, which runs RStudio 2023.12.1+402. The sessionInfo() output is at the bottom.

Autocomplete using `$` is instantaneous on 2.3.4 but takes 1-2 seconds on 2.4.0. Autocomplete inside `dplyr::select()` seems unaffected.

While the delay here is only a couple of seconds, it is much worse with lazy tbls built on a JDBC connection (to my organization's Vertica database). There I see delays of 5-10 seconds, and they affect both `$` and `dplyr::select()` autocomplete. It would be hard, maybe impossible, for me to provide a reprex of that setup, so I'm hoping this one is enough to diagnose the issue.

The issue also affects the current CRAN version 2.5.0.

```r
# with dbplyr 2.3.4 --------------------------------------------------------------------------------------------

# NOTE: restart RStudio after install
devtools::install_version('dbplyr', '2.3.4')

library(dplyr)
library(dbplyr)

# prepare DB and tbl_sql
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dplyr::copy_to(con, mtcars, "mtcars")
test <- dplyr::tbl(con, "mtcars")

#' - typing `test$` and hitting tab in RStudio yields an immediate menu of autocomplete options
#' - typing `test %>% select(` does similarly

# with dbplyr 2.4.0 --------------------------------------------------------------------------------------------

# NOTE: restart RStudio after install
devtools::install_version('dbplyr', '2.4.0')

library(dplyr)
library(dbplyr)

# prepare DB and tbl_sql
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dplyr::copy_to(con, mtcars, "mtcars")
test <- dplyr::tbl(con, "mtcars")

#' - typing `test$` and hitting tab in RStudio takes 1-2 seconds to bring up the autocomplete menu
#' - typing `test %>% select(` and hitting tab in RStudio is still pretty snappy
```

sessionInfo() output:

R version 4.3.3 (2024-02-29)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Denver
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dbplyr_2.4.0 dplyr_1.1.4 

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5      cli_3.6.2        rlang_1.1.3      stringi_1.7.12   DBI_1.2.2        purrr_1.0.2      generics_0.1.3   rJava_1.0-6     
 [9] glue_1.6.2       bit_4.0.5        mmlib_0.10.4     readxl_1.4.2     fansi_1.0.4      cellranger_1.1.0 tibble_3.2.1     fastmap_1.1.1   
[17] lifecycle_1.0.4  RJDBC_0.2-10     memoise_2.0.1    stringr_1.5.0    compiler_4.3.3   RSQLite_2.3.5    blob_1.2.4       timechange_0.2.0
[25] pkgconfig_2.0.3  rstudioapi_0.14  R6_2.5.1         tidyselect_1.2.1 utf8_1.2.3       pillar_1.9.0     magrittr_2.0.3   tools_4.3.3     
[33] bit64_4.0.5      lubridate_1.9.2  cachem_1.0.8    

shearerpmm commented 2 weeks ago

In case it helps, here is the diff between dbplyr 2.3.4 and 2.4.0: https://github.com/tidyverse/dbplyr/compare/v2.3.4...v2.4.0

kevinushey commented 2 weeks ago

It looks like using `$` to access a column from one of these tables returned NULL in dbplyr 2.3.4, but signals an error in 2.4.0. Compare (2.3.4 first, then 2.4.0):

> test$mpg
NULL

and

> test$mpg
Error in `test$mpg`:
! The `$` method of <tbl_lazy> is for internal use only.
ℹ Use `dplyr::pull()` to get the values in a column.
Run `rlang::last_trace()` to see where the error occurred.
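The change can be mimicked in base R without a database. The sketch below is a simplification, not dbplyr's source: it assumes a lazy table behaves like a classed list whose elements are internal bookkeeping rather than data columns, and the helper `make_toy()`, the class names `toy_lazy_old` / `toy_lazy_new`, and the fields `src` and `query` are all hypothetical. Without a `$` method, R falls back to list extraction, which quietly returns NULL for an unknown name (the 2.3.4 behaviour above); a `$` method that rejects non-internal names reproduces the 2.4.0-style error.

```r
# Hypothetical constructor: a classed list holding internal bookkeeping
# fields (names and contents are placeholders, not dbplyr's real ones).
make_toy <- function(cls) {
  structure(list(src = "sqlite", query = "SELECT * FROM mtcars"), class = cls)
}

toy_old <- make_toy("toy_lazy_old")  # no `$` method: default list extraction
toy_new <- make_toy("toy_lazy_new")  # uses the `$` method defined below

# Mimics the 2.4.0 behaviour: internal fields stay reachable, any other
# name signals an error that points the user at pull().
`$.toy_lazy_new` <- function(x, name) {
  if (name %in% names(unclass(x))) {
    return(.subset2(x, name))
  }
  stop("The `$` method of <toy_lazy_new> is for internal use only. ",
       "Use pull() to get the values in a column.", call. = FALSE)
}

toy_old$mpg  # NULL, as with dbplyr 2.3.4
toy_new$src  # returns "sqlite": internal field still accessible
tryCatch(toy_new$mpg, error = function(e) conditionMessage(e))  # 2.4.0-style error
```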

Unfortunately, when producing completions RStudio tries to evaluate each column to show its value and type, and with 2.4.0 every one of those lookups now signals an error, which leads to this slowdown. I'm guessing we'll need to change RStudio to accommodate this change.
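RStudio's completion code is not part of this thread, so the following is only a hypothetical sketch of the pattern described above. `describe_columns()` is an illustrative name, not an RStudio or dbplyr function: it takes column names from the caller (for example from `colnames()`), evaluates `obj$<col>` for each one, and records the class of the result. Wrapping each lookup in `tryCatch()` means an object whose `$` signals an error yields an `NA` type instead of breaking the whole menu.

```r
# Hypothetical completion helper (not RStudio's real code): for each column
# name, look the column up with `$` and record the class of the result.
describe_columns <- function(obj, cols) {
  types <- vapply(cols, function(col) {
    value <- tryCatch(
      # Build and evaluate `obj$<col>`, so any custom `$` method is used.
      eval(call("$", obj, as.name(col))),
      # An erroring `$` (as on dbplyr >= 2.4.0 lazy tables) yields NULL.
      error = function(e) NULL
    )
    if (is.null(value)) NA_character_ else class(value)[[1]]
  }, character(1))
  data.frame(name = cols, type = unname(types), stringsAsFactors = FALSE)
}

describe_columns(mtcars, c("mpg", "cyl"))
```

Here both `mtcars` columns are reported as `numeric`. Catching the error keeps completions working, but each failed lookup still has to signal and unwind an error, so a helper that only needs names could skip the per-column evaluation entirely. For a lazy table, `colnames(test)` should list the columns without looking any of them up through `$`, and `dplyr::pull()` retrieves values, as the error message suggests.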

shearerpmm commented 2 weeks ago

Thank you so much! I only use this feature to discover column names on a table, and pressing Enter normally just returns NULL, so I didn't think to actually run it here.

Some context: my use of `$`-tab completion is actually a workaround for semi-broken tab completion inside select() when using dbplyr with Vertica. I haven't opened tickets for that because I have yet to figure out how to make it reproducible.

I'll close this for now since it looks like you have the fix within RStudio.