n() not working #181

RCura commented 7 years ago

dplyr's n() seems to have a custom wrapper in MonetDBLite, but it does not appear to work:


library(dplyr)
library(DBI)

# Create example dataset
rep_data <- tibble(val = runif(n = 1E6), grp = if_else(val < .5, "A", "B"))
rep_data %>% group_by(grp) %>% summarise(N = n())
# # A tibble: 2 x 2
# grp      N
# <chr>  <int>
# 1     A 499561
# 2     B 500439

# Copy this dataset to MonetDBLite
con <- dbConnect(MonetDBLite::MonetDBLite(), "testData")
dbWriteTable(conn = con, rep_data, name = "rep_data")

# Query the table in the database
tbl(src = con, "rep_data") %>% group_by(grp) %>% summarise(N = n())

# Error in .local(conn, statement, ...) : 
#   Unable to execute statement 'SELECT "grp", COUNT() AS "N"
# FROM "rep_data"
# GROUP BY "grp"
# LIMIT 10'.
# Server says 'syntax error, unexpected ')' in: "select "grp", count()" ' [#42000].


Do you think this can be fixed easily? I'm currently using RSQLite, and MonetDBLite would be a nice improvement in terms of query performance and window functions.

RCura commented 7 years ago

Further information:

Running show_query() shows the problem:

tbl(src = con, "rep_data") %>% group_by(grp) %>% summarise(N = n()) %>% show_query()
SELECT "grp", COUNT() AS "N"
FROM "rep_data"
GROUP BY "grp"

Correcting the query by replacing COUNT() with COUNT(*) works as expected:

> dbGetQuery(con, 'select "grp", COUNT(*) AS "N" FROM rep_data GROUP BY grp')
  grp      N
1   A 499780
2   B 500220
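
A possible stopgap that stays within dplyr, sketched here and untested against MonetDBLite itself: pass the aggregate as literal SQL with sql("COUNT(*)") so that the translation of n() is bypassed. The snippet uses dbplyr's lazy_frame() with simulate_dbi() (an assumption made for illustration, standing in for the real MonetDBLite connection) so the generated SQL can be inspected without a database; the name rep_lazy and the column values are placeholders.

```r
library(dplyr)
library(dbplyr)

# Simulated table: no database is contacted, only SQL generation is exercised.
# The values are placeholders that merely define the column names.
rep_lazy <- lazy_frame(val = 0.25, grp = "A", con = simulate_dbi())

# Bypass the translation of n() by passing the aggregate as literal SQL.
query <- rep_lazy %>%
  group_by(grp) %>%
  summarise(N = sql("COUNT(*)"))

# Render the query to a string and print it for inspection.
generated_sql <- as.character(sql_render(query))
cat(generated_sql, "\n")
```

On the real connection, the same idea would presumably read tbl(src = con, "rep_data") %>% group_by(grp) %>% summarise(N = sql("COUNT(*)")). This only sidesteps the translation; it does not explain why n() is rendered as COUNT().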

This should normally be handled by the code in dplyr.R (line 17), so I don't understand why it fails:

sql_translate_env.MonetDBConnection <- function(con) {
  dbplyr::sql_variant(
    scalar = dbplyr::sql_translator(.parent = dbplyr::base_scalar,
      `!=` = dbplyr::sql_infix("<>")
    ),
    aggregate = dbplyr::sql_translator(.parent = dbplyr::base_agg,
      n = function() dbplyr::sql("COUNT(*)"),
      sd = dbplyr::sql_prefix("STDDEV_SAMP"),
      var = dbplyr::sql_prefix("VAR_SAMP"),
      median = dbplyr::sql_prefix("MEDIAN")
    ), #FIXME n_distinct
    window = dbplyr::sql_translator(.parent = dbplyr::base_win)
  )
}

hannes commented 7 years ago

I don't understand this either. But recent updates to dplyr/dbplyr broke lots of stuff.

hannes commented 7 years ago

I have just pushed commits that should fix the issue.