hannes / MonetDBLite

MonetDB reconfigured as a library
108 stars 11 forks source link

n() not working #181

Closed RCura closed 7 years ago

RCura commented 7 years ago

dplyr's n() seems to have a custom wrapper in MonetDBLite, but it appears to be non working :

library(tidyverse)
library(DBI)
library(MonetDBLite)

# Create example dataset
rep_data <- tibble(val = runif(n = 1E6), grp = if_else(val < .5, "A", "B"))
rep_data %>% group_by(grp) %>% summarise(N = n())
# # A tibble: 2 x 2
# grp      N
# <chr>  <int>
# 1     A 499561
# 2     B 500439

# Copy this dataset to MonetDBLite
con <- dbConnect(MonetDBLite::MonetDBLite(), "testData")
dbWriteTable(conn = con, rep_data, name = "rep_data")

# Query this base
tbl(src = con, "rep_data") %>% group_by(grp) %>% summarise(N = n())

# Error in .local(conn, statement, ...) : 
#   Unable to execute statement 'SELECT "grp", COUNT() AS "N"
# FROM "rep_data"
# GROUP BY "grp"
# LIMIT 10'.
# Server says 'syntax error, unexpected ')' in: "select "grp", count()" ' [#42000].

DBI::dbDisconnect(con)
MonetDBLite::monetdblite_shutdown()

Do you think it can be easily fixed ? I'm currently using RSQLite, and MonetDBLite would be a nice improvement considering queries performances and window functions.

Using dplyr 0.7.0 and MonetDBLite 0.4.0 - sessionInfo() R version 3.4.0 (2017-04-21) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 Matrix products: default locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] MonetDBLite_0.4.0 RSQLite_2.0 DBI_0.7 dbplyr_1.1.0 [5] dplyr_0.7.0 purrr_0.2.2.2 readr_1.1.1 tidyr_0.6.3 [9] tibble_1.3.3 ggplot2_2.2.1.9000 tidyverse_1.1.1 loaded via a namespace (and not attached): [1] Rcpp_0.12.11 compiler_3.4.0 cellranger_1.1.0 plyr_1.8.4 [5] forcats_0.2.0 tools_3.4.0 digest_0.6.12 bit_1.1-12 [9] memoise_1.1.0 jsonlite_1.5 lubridate_1.6.0 gtable_0.2.0 [13] nlme_3.1-131 lattice_0.20-35 pkgconfig_2.0.1 rlang_0.1.1 [17] psych_1.7.5 parallel_3.4.0 haven_1.0.0 xml2_1.1.1 [21] stringr_1.2.0 httr_1.2.1 hms_0.3 bit64_0.9-7 [25] grid_3.4.0 glue_1.1.0 R6_2.2.1 readxl_1.0.0 [29] foreign_0.8-67 blob_1.1.0 reshape2_1.4.2 modelr_0.1.0 [33] magrittr_1.5 codetools_0.2-15 scales_0.4.1 assertthat_0.2.0 [37] mnormt_1.5-5 rvest_0.3.2 colorspace_1.3-2 stringi_1.1.5 [41] lazyeval_0.2.0 munsell_0.4.3 broom_0.4.2
RCura commented 7 years ago

Further informations :

When running show_query, the problem appears :

tbl(src = con, "rep_data") %>% group_by(grp) %>% summarise(N = n()) %>% show_query()
<SQL>
SELECT "grp", COUNT() AS "N"
FROM "rep_data"
GROUP BY "grp"

Correcting this request by replacing COUNT() by COUNT(*) works as expected :

> dbGetQuery(con, 'select "grp", COUNT(*) AS "N" FROM rep_data GROUP BY grp')
  grp      N
1   A 499780
2   B 500220

And this should normally be handled by dplyr.R code (line 17), so, I don't understand :

sql_translate_env.MonetDBConnection <- function(con) {
  dbplyr::sql_variant(
    scalar = dbplyr::sql_translator(.parent = dbplyr::base_scalar,
      `!=` = dbplyr::sql_infix("<>")
    ),
    aggregate = dbplyr::sql_translator(.parent = dbplyr::base_agg, 
      n = function() dbplyr::sql("COUNT(*)"),
      sd =  dbplyr::sql_prefix("STDDEV_SAMP"),
      var = dbplyr::sql_prefix("VAR_SAMP"),
      median = dbplyr::sql_prefix("MEDIAN")
    ), #FIXME n_distinct
    window = dbplyr::sql_translator(.parent = dbplyr::base_win)
  )
}
hannes commented 7 years ago

I don't understand this either. But recent updates to dplyr/dbplyr broke lots of stuff

hannes commented 7 years ago

I have just pushed commits that should fix the issue.