Open Ezibenroc opened 3 months ago
Thanks for the issue.
It seems like we're generating incorrect code (technically sqlglot is).
It also seems like it's not possible to tell Trino not to flatten structs, which is incredibly annoying because it means sqlglot can't do the UNNEST rewrite it's doing without precise column information.
I really don't want to introduce a new API here just to allow Trino to do everything the other UNNEST-supporting backends support. Instead, I'd like to try and work with the various upstream projects to see if we can't get this addressed in some other way.
First, I'm going to explore whether giving sqlglot more information helps with its transformation, and then report a feature request/bug upstream if that doesn't help.
I just found this issue. I was not aware of this syntax, but apparently the struct flattening can be done without knowing the names of the struct fields.
I am not sure if it helps, since the output would still be different from other backends: with Trino you would have one column per struct field, while with other backends you would have a single column containing the structs.
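To make the difference in output shape concrete, here is a minimal pure-Python sketch. It does not call Trino or ibis; the `rows` data and both helper functions are hypothetical, written only to mirror the two shapes described above.

```python
# Hypothetical in-memory stand-in for a table with an array-of-structs column.
rows = [
    {"some_col": 1, "my_nested_column": [{"foo": 10, "bar": 20}, {"foo": 11, "bar": 21}]},
    {"some_col": 2, "my_nested_column": [{"foo": 12, "bar": 22}]},
]


def unnest_keep_struct(rows, column):
    """One output row per array element; the struct stays in a single column."""
    out = []
    for row in rows:
        for item in row[column]:
            new_row = dict(row)
            new_row[column] = item  # the element replaces the array
            out.append(new_row)
    return out


def unnest_flatten_struct(rows, column):
    """One output row per array element; each struct field becomes its own column."""
    out = []
    for row in rows:
        others = {k: v for k, v in row.items() if k != column}
        for item in row[column]:
            out.append({**others, **item})  # struct fields spread into columns
    return out


kept = unnest_keep_struct(rows, "my_nested_column")
flat = unnest_flatten_struct(rows, "my_nested_column")
```

Both results have three rows, but `kept` has the columns `some_col` and `my_nested_column`, while `flat` has `some_col`, `foo`, and `bar`.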
I think this is probably workable, but there's another issue that would default
Seems like the additional column added by WITH ORDINALITY doesn't get picked up when not specifying columns 😮‍💨
BUT there's a wretched workaround:
select *
from array_of_structs t0
cross join unnest(
  t0.my_nested_column,
  transform(
    sequence(0, cardinality(t0.my_nested_column) - 1),
    x -> cast(row(x) as row(idx int))
  )
) things
which produces
some_col | my_nested_column | foo | bar | idx
----------+--------------------------------------+-----+-----+-----
1 | [{foo=10, bar=20}, {foo=11, bar=21}] | 10 | 20 | 0
1 | [{foo=10, bar=20}, {foo=11, bar=21}] | 11 | 21 | 1
2 | [{foo=12, bar=22}] | 12 | 22 | 0
(3 rows)
on the trino CLI.
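For readers less familiar with multi-argument UNNEST, the following pure-Python sketch imitates what the query above does row by row. It is only an illustration of the logic, not Trino's implementation: the function name and the `rows` data are hypothetical, and the sketch assumes every array is non-empty, so it does not model what Trino does for empty arrays.

```python
# Hypothetical in-memory stand-in for the array_of_structs table.
rows = [
    {"some_col": 1, "my_nested_column": [{"foo": 10, "bar": 20}, {"foo": 11, "bar": 21}]},
    {"some_col": 2, "my_nested_column": [{"foo": 12, "bar": 22}]},
]


def cross_join_unnest_with_index(rows, column):
    """Imitate `cross join unnest(arr, transform(sequence(...), ...))`."""
    out = []
    for row in rows:
        structs = row[column]
        # sequence(0, cardinality(arr) - 1): indices 0 .. len - 1 inclusive
        indices = range(0, len(structs))
        # transform(..., x -> cast(row(x) as row(idx int))): wrap each index in a one-field struct
        idx_structs = [{"idx": x} for x in indices]
        # unnest over two arrays pairs elements by position and flattens both structs
        for struct, idx_struct in zip(structs, idx_structs):
            out.append({**row, **struct, **idx_struct})
    return out


result = cross_join_unnest_with_index(rows, "my_nested_column")
```

Each entry of `result` carries the original columns plus `foo`, `bar`, and `idx`, matching the column layout of the CLI output above.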
I will get the discussion started over on the sqlglot side later today or tomorrow.
The sequence function is also kneecapped at 10k elements, so there's really no free lunch here.
Is your feature request related to a problem?
Disclaimer: I am quite new to ibis, so perhaps I missed something.
Let's say I have a table with a column that contains arrays of structs. For example, a struct with two fields:
I have this SQL query that works well on my Trino DB to unnest this column:
I am trying to do the same with ibis with this code:
Unfortunately, I get the following exception:
My guess is that this is because the code produced by the Trino backend from ibis does not list the fields of the struct. Calling ibis.to_sql(tmp) produces this query:
I found this related SO question; writing out all the fields of the struct explicitly seems to be needed.
Note that I tried similar code on another column that has simple lists of integers instead of lists of structs and it works well, so the structs seem to be the issue.
What is the motivation behind your request?
No response
Describe the solution you'd like
I guess a possible solution would be for the user to give the list of the fields in the unnest function. For instance:
But I am not sure what the other back-ends should do when this argument is provided. Perhaps just ignore it?
What version of ibis are you running?
9.3.0
What backend(s) are you using, if any?
Trino
Code of Conduct