ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
5.29k stars 596 forks source link

bug: window functions not working as expected in BigQuery backend #10390

Open efcaguab opened 2 weeks ago

efcaguab commented 2 weeks ago

What happened?

When I try to use nunique with a grouup_by, or window I'm getting an error suggesting that the SQL is not allowed by BigQuery.

The same ibis expressions seem to work well using duckdb. Perhaps something to do with the row-wise arrangement of the window function?

import ibis
from ibis import _
import pandas as pd

bq_dataset = "scratch_fer"
data = pd.DataFrame({
    "foo": [1, 1, 2, 2, 3, 3],
    "bar": ["a", "b", "a", "a", "b", "b"]
})
bq = ibis.bigquery.connect()
bq.create_table("test", data, database=bq_dataset, overwrite=True)
test_table = bq.table(f"{bq_dataset}.test")

# All these commands below fail: (400 Window framing clause is not allowed if
# DISTINCT is specified at [3:60]; reason: invalidQuery, location: query,
# message: Window framing clause is not allowed if DISTINCT is specified
test_table.group_by("foo").mutate(bar=_.bar.nunique()).to_pandas()
test_table.filter(_.bar.nunique().over(ibis.window(group_by="foo")) > 1).to_pandas()

# However these (which I think will be the equivalent of above) work
test_table.sql(f"SELECT foo, COUNT(DISTINCT bar) OVER (PARTITION BY foo) AS bar FROM {bq_dataset}.test").to_pandas()
test_table.sql(f"SELECT * FROM {bq_dataset}.test QUALIFY COUNT(DISTINCT bar) OVER (PARTITION BY foo) > 1").to_pandas()

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

bigquery

Relevant log output

(400 Window framing clause is not allowed if
# DISTINCT is specified at [3:60]; reason: invalidQuery, location: query,
# message: Window framing clause is not allowed if DISTINCT is specified

Code of Conduct

cpcloud commented 1 week ago

This looks like an issue with what we're generating as the default window frame.

Thanks for the report!

cpcloud commented 1 day ago

I've started on this, but it's hairy and probably requires changing the behavior of window functions with Ibis default values.

I should be able to get to this for 10.0 which would be the ideal release to ship it in since it would be a big change.