Open jitingxu1 opened 5 months ago
Thanks for opening this. Issue 1 should be fixed in #8535.
Issue 2 is due to the to_pyarrow conversion path in sqlite (and a few other backends) going dbapi row -> pandas -> pyarrow. When returning pandas dataframes from to_pandas we currently map a UUID column to an object dtype series of uuid.UUID objects (and these objects fail when converting to pyarrow). In contrast, for to_pyarrow we return a string column with the same data.
The easiest (and I think most consistent) fix would be to stop returning uuid columns in to_pandas as uuid.UUID values and instead treat them as strings. This matches what we do for both polars and pyarrow outputs. It's also more efficient for the user since they don't have an object dtype series in the output series.
cc @cpcloud for a :+1: / :-1: before I implement this fix.
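A minimal sketch of the "treat them as strings" idea, applied at the dbapi-row stage before anything reaches pandas or pyarrow. This is not ibis's actual implementation: the helper name stringify_uuids and the sample rows are hypothetical and only illustrate the conversion.

```python
import uuid
from typing import Any, Iterable


def stringify_uuids(rows: Iterable[tuple[Any, ...]]) -> list[tuple[Any, ...]]:
    """Replace every uuid.UUID value in the rows with its canonical string form.

    Hypothetical helper for illustration only; other values (including None)
    pass through unchanged.
    """
    return [
        tuple(str(value) if isinstance(value, uuid.UUID) else value for value in row)
        for row in rows
    ]


# Sample rows shaped like dbapi results (made up for this sketch).
rows = [
    ("Adelie", uuid.UUID("f3102c7e-167c-4854-af20-d3729580e2cc")),
    ("Chinstrap", None),
]
cleaned = stringify_uuids(rows)
```

After this step every uuid cell is a plain str, so downstream conversions only ever see strings and nulls in that column.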
Eventually we should simplify the pandas output to just .to_pyarrow().to_pandas(), offloading all the conversion duties to arrow. So it is a +1 from me.
Seems fine. I don't like that we have to do this but the alternative of implementing a custom pyarrow type seems less desirable than converting to strings.
The repeated UUID issue has been addressed:
In [4]: import ibis
...: ibis.options.interactive = True
...: from ibis.expr.api import row_number, uuid, now, pi
...:
...: ibis.set_backend("sqlite")
...: t = ibis.examples.penguins.fetch()
...: t.mutate(uuid=ibis.uuid()).to_pandas()
Out[4]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year uuid
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 male 2007 f3102c7e-167c-4854-af20-d3729580e2cc
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 female 2007 86afae37-f0e5-48d7-ba13-aa701374d4cd
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 female 2007 665cbe36-d7b7-4a7e-bd5f-ebf0c043c72f
3 Adelie Torgersen NaN NaN NaN NaN None 2007 a740b304-0a13-4f89-bdb4-fa9475f2daa4
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 female 2007 a8263a30-9cbb-4175-94e9-1429a6fdb0fa
.. ... ... ... ... ... ... ... ... ...
339 Chinstrap Dream 55.8 19.8 207.0 4000.0 male 2009 112e7fcc-bf14-4177-96ee-526d9343c368
340 Chinstrap Dream 43.5 18.1 202.0 3400.0 female 2009 c5964727-4dc0-42dd-8039-1527bd37b673
341 Chinstrap Dream 49.6 18.2 193.0 3775.0 male 2009 a3d1a137-5847-4309-90c4-59d0f8fe35f9
342 Chinstrap Dream 50.8 19.0 210.0 4100.0 male 2009 33da8b0b-d368-442c-ba05-44daa037b1e0
343 Chinstrap Dream 50.2 18.7 198.0 3775.0 female 2009 de5b1031-de4a-4c13-85a6-920e4741f922
What happened?
Issue 1: duckdb produces a different uuid for each row, but sqlite generates the same uuid for every row. Other backends may have the same issue.
Issue 2: I get an ArrowTypeError when displaying the data:
Got the following error:
It works well with to_pandas().
What version of ibis are you using?
8.0.0
What backend(s) are you using, if any?
duckdb, sqlite
Relevant log output
No response