ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
5.31k stars 598 forks source link

bug: collect with order_by arg not working as expected in BigQuery backend #10499

Open efcaguab opened 2 days ago

efcaguab commented 2 days ago

What happened?

When I try to create an array per group and I want the order the array by another column, I get an error indicating that there is something wrong with the generated SQL.

The same operations work well if I don't need to keep the array ordered, when write the SQL myself, or with other backends (e.g duckdb).

import ibis
from ibis import _
import pandas as pd

bq_dataset = "scratch_fer"
data = pd.DataFrame({
    "foo": [1, 2, 3, 4, 5, 6],
    "bar": ["a", "b", "a", "a", "b", "b"],
    "order": [1, 2, 3, 4, 5, 6],
})
bq = ibis.bigquery.connect()
test_table = bq.create_table("test", data, database=bq_dataset, overwrite=True)

# ↓ Works 
test_table.group_by("bar").agg(foo_array=_.foo.collect()).to_pandas()
test_table.alias("t").sql("SELECT bar, ARRAY_AGG(foo ORDER BY order) AS foo_array FROM t GROUP BY bar").to_pandas()

# ↓ Doesn't work
test_table.group_by("bar").agg(foo_array=_.foo.collect(order_by=_.order)).to_pandas()

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

BigQuery

Relevant log output

BadRequest: 400 NULLS LAST not supported with ascending sort order in aggregate functions.; reason: invalidQuery, location: query, message: NULLS LAST not supported with ascending sort order in aggregate functions.

Code of Conduct

cpcloud commented 2 days ago

Hm, this looks like it might be a SQLGlot issue with generating default NULLS LAST syntax.