tidyverse / dbplyr

Database (DBI) backend for dplyr
https://dbplyr.tidyverse.org

Add `.order` and `.frame` arguments to `mutate()`. #1542

Open iangow opened 2 months ago

iangow commented 2 months ago

Back in 2017, @hadley suggested "there are two possible APIs" for implementing what became window_frame() and window_order() (see tidyverse/dplyr#2874; @edgararuiz-zz).

At the time, I believe there was no .by argument to mutate(), so the window_frame()/window_order() approach seemed to make the most sense. One of the two options back then was:

df %>%
  group_by(gvkey) %>%
  window(
    .order = vars(datadate),
    .frame = c(-3, 0),

    sale_ttm = sum(sale),
    cogs_ttm = sum(cogs),
    sga_ttm = sum(sga)
  )

But now this could be something like:

df |>
  mutate(
    sale_ttm = sum(sale),
    cogs_ttm = sum(cogs),
    sga_ttm = sum(sga),
    .by = gvkey,
    .order = vars(datadate),
    .frame = c(-3, 0)
  )

This would seem to have the merit of making it easier for dbplyr to infer that a window function was being sought (currently there are cases where dbplyr does not get the hint).

I am surprised that I have only one instance of window_frame() in my book. It seems like a very handy pattern (e.g., moving averages, windowed regressions).
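For comparison, here is a minimal sketch of how the trailing four-row sums are written today with the existing window_order()/window_frame() verbs. The table is a made-up lazy_frame() on a simulated Postgres connection, so no database is needed; the column names follow the example above, and na.rm = TRUE is added only to silence dbplyr's missing-value warning.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Hypothetical one-row table; only the column names and types matter here.
df <- lazy_frame(
  gvkey = "001",
  datadate = as.Date("2020-03-31"),
  sale = 1, cogs = 1, sga = 1,
  con = simulate_postgres()
)

ttm <- df |>
  group_by(gvkey) |>         # PARTITION BY
  window_order(datadate) |>  # ORDER BY inside the window
  window_frame(-3, 0) |>     # current row plus the three preceding rows
  mutate(
    sale_ttm = sum(sale, na.rm = TRUE),
    cogs_ttm = sum(cogs, na.rm = TRUE),
    sga_ttm = sum(sga, na.rm = TRUE)
  ) |>
  ungroup()

# Render the generated SQL as a plain string and print it.
sql <- as.character(sql_render(ttm))
cat(sql, "\n")
```

The rendered SQL should contain a ROWS BETWEEN 3 PRECEDING AND CURRENT ROW frame partitioned by gvkey. Under the proposal, the three verbs before mutate() would collapse into its .by, .order and .frame arguments.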

I had a comical exchange with ChatGPT about this proposal this afternoon (Australia time) (see here).

DavisVaughan commented 2 months ago

I think if this was going to be considered then it would only be an argument to the dbplyr data frame method, so I'm going to move it there and let them decide. I don't think it is super useful for the general case because we already have other means of doing windowed evaluations per column, like with {slider}.
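As a rough illustration of the local-data-frame route mentioned here, the same kind of trailing sum can be sketched with slider::slide_dbl(). The toy data are invented for this example, and the window counts rows rather than calendar periods.

```r
library(dplyr, warn.conflicts = FALSE)
library(slider)

# Invented data: five rows for firm "a", two for firm "b".
df <- tibble(
  gvkey = c("a", "a", "a", "a", "a", "b", "b"),
  datadate = as.Date("2020-03-31") + c(0, 91, 183, 275, 366, 0, 91),
  sale = c(1, 2, 3, 4, 5, 10, 20)
)

out <- df |>
  arrange(gvkey, datadate) |>
  mutate(
    # Sum the current row and up to three preceding rows within each firm;
    # incomplete windows at the start are summed over the rows available.
    sale_ttm = slide_dbl(sale, sum, .before = 3),
    .by = gvkey
  )

print(out)
```

Here the ordering comes from arrange() and the frame from .before, so each column carries its own window specification instead of sharing one set on the table.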

hadley commented 2 months ago

Yeah, this syntax looks pretty reasonable to me now.

iangow commented 3 days ago

> Yeah, this syntax looks pretty reasonable to me now.

I figure that this would also be useful in duckplyr: if implemented in that package's version of mutate(), it would give its users access to window functions.