tidyverse / dbplyr

Database (DBI) backend for dplyr
https://dbplyr.tidyverse.org
Other
469 stars 169 forks source link

Unable to use quantile() in mutate() in DuckDB #1487

Open blairqin opened 3 months ago

blairqin commented 3 months ago

I'm trying to calculate the 25th quantile value in DuckDB, using the mutate() function from dbplyr. I got an error message saying "ORDER BY is not implemented for window functions!". However, DuckDB's documentation indicates support for the quantile function via quantile_count(), saying "All aggregate functions can be used in a windowing context" (DuckDB Documentation on Window Functions).

Here's the reprex:

Here's the reprex:

library(tidyverse)
library(DBI)
library(dbplyr)

con <- DBI::dbConnect(duckdb::duckdb())

copy_to(con, mtcars)

tbl(con, "mtcars") |> 
  mutate(mpg_25th = quantile(mpg, 0.25, na.rm = TRUE))
# Shows Error in `collect()`:
# ! Failed to collect lazy table.
# Caused by error:
# ! {"exception_type":"Parser","exception_message":"ORDER BY is not implemented for window functions!"}
# Run `rlang::last_trace()` to see where the error occurred.

dbGetQuery(
  con,
  'SELECT *, quantile_cont(mpg, 0.25) OVER () AS mpg_25th FROM mtcars;'
)
# Works fine

Would you please to fix this issue? Thank you.

hadley commented 3 months ago

This is the SQL that we're generating for duckdb here:

SELECT
  mtcars.*,
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY mpg) OVER () AS mpg_25th
FROM mtcars
blairqin commented 3 months ago

Thank you. I still get the following error running the SQL you recommended.

Error: {"exception_type":"Parser","exception_message":"ORDER BY is not implemented for window functions!"}

Here is the code I use.

library(tidyverse)
library(DBI)
library(dbplyr)

con <- DBI::dbConnect(duckdb::duckdb())

copy_to(con, mtcars)

dbGetQuery(
  con,
  'SELECT
  mtcars.*,
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY mpg) OVER () AS mpg_25th
  FROM mtcars;'
)
# Error: {"exception_type":"Parser","exception_message":"ORDER BY is not implemented for window functions!"}

Could this be related to an issue from Duckdb?

hadley commented 3 months ago

I was just including the SQL that dbplyr generates to complete your reprex.

blairqin commented 2 months ago

I recently found that the summarise function does not encounter the error above using the following code.

library(tidyverse)
library(DBI)
library(dbplyr)

con <- DBI::dbConnect(duckdb::duckdb())

copy_to(con, mtcars)

tbl(con, "mtcars") |> 
  summarize(mpg_25th = quantile(mpg, 0.25, na.rm = TRUE))

The corresponding SQL code is

SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY mpg) AS mpg_25th
FROM mtcars

As a temporary workaround, I could use the original dataframe to perform a left join with the summarised dataframe to get the same results.