tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

frame clause error in greenplum #290

Closed: mwillumz closed this 10 years ago

mwillumz commented 10 years ago

I understand Greenplum isn't strictly supported. However, this error makes me wonder whether there might be a slightly cleaner approach to window functions that don't need to be ordered.

When running:

  Data %.%
    group_by(variable) %.%
    mutate(avg = mean(value))

dplyr inserts a frame clause of "BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING". This isn't necessary, and on Greenplum at least (surprisingly, not on PostgreSQL) it produces an error asking for an ORDER BY clause to be supplied.

hadley commented 10 years ago

Can you point me to the greenplum docs on window functions?

mwillumz commented 10 years ago

Absolutely! Page 117.

http://bitcast-a.v1.o1.sjc1.bitgravity.com/greenplum/Greenplum_CE_Database/documentation/4.2.2/greenplum_database_4.2_administrator_guide.pdf

I verified that the dplyr-generated SQL does work if I either insert an "ORDER BY X" clause (though I'm not sure how to do this in dplyr) or drop the window frame clause entirely.

hadley commented 10 years ago

And the ORDER BY doesn't change anything, right? I.e., regardless of which variable you order by, you get the same results?

mwillumz commented 10 years ago

Right.

mwillumz commented 10 years ago

I've found another difference in Greenplum's behavior (again with window functions): I need to set parens=FALSE in the sql_vector() calls of the ORDER BY and PARTITION BY SQL generators.

Can you point me to a resource that gives some guidance on writing a complementary package? I'm assuming there's a better way to manage this than requiring that a dplyr_greenplum package with modified functions be loaded after dplyr. Right now I'm forking dplyr and pulling in a modified version via devtools.

I'm eager to fully leverage the package and contribute what little I can.

hadley commented 10 years ago

I think the best way to proceed is probably to make a Greenplum source that can be customised wherever it differs from PostgreSQL.

hadley commented 10 years ago

Discussion moved to #336