hannes / MonetDBLite

MonetDB reconfigured as a library
108 stars 11 forks source link

can't use AVG as window function #174

Closed shearerpmm closed 7 years ago

shearerpmm commented 7 years ago
library(MonetDBLite)
library(DBI)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
dbdir <- 'data/cache'
drv <- MonetDBLite::MonetDBLite()
con <- dbConnect(drv, dbdir)
dbWriteTable(con, "mtcars", mtcars, overwrite=T)
mt <- tbl(con, "mtcars")
dbGetQuery(con, "SELECT cyl, AVG(mpg) FROM mtcars GROUP BY cyl" )
#>   cyl       L3
#> 1   6 19.74286
#> 2   4 26.66364
#> 3   8 15.10000
mt %>% group_by(vs) %>% mutate(mean(mpg))
#> Error in .local(conn, statement, ...): Unable to execute statement 'SELECT "mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb", avg("mpg") OVER...'.
#> Server says 'ParseException:SQLparser:SELECT: function 'avg' not found'.
mt %>% group_by(vs) %>% mutate(mean(mpg)) %>% show_query()
#> <SQL>
#> SELECT "mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb", avg("mpg") OVER (PARTITION BY "vs") AS "mean(mpg)"
#> FROM "mtcars"
dbDisconnect(con,shutdown=TRUE)
hannes commented 7 years ago

More details please, whats the output of sessionInfo()?

shearerpmm commented 7 years ago
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.7.0       DBI_0.7           MonetDBLite_0.4.0

loaded via a namespace (and not attached):
 [1] compiler_3.4.0   magrittr_1.5     assertthat_0.2.0 R6_2.2.0         tools_3.4.0      glue_1.0.0      
 [7] tibble_1.3.3     Rcpp_0.12.10     codetools_0.2-15 digest_0.6.12    rlang_0.1.1      dbplyr_1.1.0    
shearerpmm commented 7 years ago

Here's a bit more useful reprex - it appears the average window function is not implemented or dplyr is invoking the wrong one (probably the latter)

library(MonetDBLite)
library(DBI)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dbdir <- 'data/cache'
drv <- MonetDBLite::MonetDBLite()
con <- dbConnect(drv, dbdir)
dbWriteTable(con, "mtcars", mtcars, overwrite=T)
mt <- tbl(con, "mtcars")

# AVG works as an aggregate...
dbGetQuery(con, "SELECT cyl, AVG(mpg) FROM mtcars GROUP BY cyl" )
#>   cyl       L3
#> 1   6 19.74286
#> 2   4 26.66364
#> 3   8 15.10000

# ...but not as a window
mt %>% group_by(vs) %>% mutate(mean(mpg))
#> Error in .local(conn, statement, ...): Unable to execute statement 'SELECT "mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb", avg("mpg") OVER...'.
#> Server says 'ParseException:SQLparser:SELECT: function 'avg' not found'.

mt %>% group_by(vs) %>% mutate(mean(mpg)) %>% show_query()
#> <SQL>
#> SELECT "mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb", avg("mpg") OVER (PARTITION BY "vs") AS "mean(mpg)"
#> FROM "mtcars"

# ... window query doesn't work in SQL either
dbGetQuery(con, 'SELECT "mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb", avg("mpg") OVER (PARTITION BY "vs") AS "mean(mpg)" FROM "mtcars"')
#> Error in .local(conn, statement, ...): Unable to execute statement 'SELECT "mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb", avg("mpg") OVER...'.
#> Server says 'ParseException:SQLparser:SELECT: function 'avg' not found'.

dbDisconnect(con,shutdown=TRUE)
shearerpmm commented 7 years ago

What window functions does MonetDB implement? Is it just the ranking functions listed here? https://www.monetdb.org/Documentation/Manuals/SQLreference/WindowFunctions

This email chain suggests AVG and other stat functions are not implemented as window functions https://www.monetdb.org/pipermail/users-list/2015-October/008484.html

I guess this is probably a "closed won't fix" type issue, but I will leave it open in case someone has a comment about future prospects of implementing these window functions (which I personally use quite a bit).

shearerpmm commented 7 years ago

My initial reprex was misleading. The first case (group_by + summarize) works because it's a simple aggregate. The second (group_by + mutate) requires the AVG window function to be implemented. (group_by + mutate can also be implemented using a join to an aggregate subquery, but the window method is more modern.)

hannes commented 7 years ago

Yes the support for window functions in MonetDB[Lite] is very limited. I have extended the dplyr SQL translator to throw errors in those cases.