Open greg-offerfit opened 6 hours ago
TO REPRODUCE:
Download the below json file and save it somewhere accessible by Python/Ibis prompt. The json file is just a single row that generates a GBQ table with the appropriate schema and values
Run the below code (or equivalent to you):
import ibis
con = ibis.bigquery.connect() # INSERT ANY PROJECT/DATASET ARGS YOU NEED HERE
table = con.read_json(PATH_TO_DOWNLOADED_JSON_FILE)
dedupe = table.distinct(on=["id"]).execute() # ERRORS HERE
The generated query is:
SELECT `t1`.`int_ids`, `t1`.`id` FROM (SELECT `t0`.`id`, ARRAY_AGG(`t0`.`int_ids` IGNORE NULLS LIMIT 1)[safe_offset(0)] AS `int_ids` FROM `TABLE` AS `t0` GROUP BY 1) AS `t1`
Which GBQ marks as invalid with the following error:
The argument to ARRAY_AGG must not be an array type but was ARRAY<INT64> at [1:58]
What happened?
Given a GBQ table with the following schema:
Calling
table.distinct(on=["id"]).execute()
fails with the following error:google.api_core.exceptions.BadRequest: 400 The argument to ARRAY_AGG must not be an array type but was ARRAY at [7:5]; reason: invalidQuery, location: query, message: The argument to ARRAY_AGG must not be an array type but was ARRAY at [7:5]
I expected this to generate a valid query and return a DataFrame.
See below comment for minimal reproduction.
What version of ibis are you using?
9.5.0
What backend(s) are you using, if any?
BigQuery
Relevant log output
Code of Conduct