pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.9k stars 18.03k forks

[backport 2.3.x] String dtype: enable in SQL IO + resolve all xfails (#60255) #60315

Closed jorisvandenbossche closed 2 weeks ago

jorisvandenbossche commented 2 weeks ago

(cherry picked from commit ba4d1cfdda14bf521ff91d6ad432b21095c417fd)

Backport of https://github.com/pandas-dev/pandas/pull/60255

WillAyd commented 2 weeks ago

Thanks for opening. I'll take a look at the failures

WillAyd commented 2 weeks ago

As far as I can tell, this issue might exist at a deeper level on the backport branch. On main, I see this behavior:

>>> import pandas as pd
>>> import pandas._libs.lib as lib
>>> import numpy as np

>>> arr = np.array(["a", "b", None])
>>> lib.maybe_convert_objects(arr, convert_to_nullable_dtype=True, convert_non_numeric=True)
<StringArray>
['a', 'b', <NA>]
Length: 3, dtype: string

That same call on the backport branch yields np.nan as the missing-value sentinel:

>>> lib.maybe_convert_objects(arr, convert_to_nullable_dtype=True, convert_non_numeric=True)
<StringArrayNumpySemantics>
['a', 'b', nan]
Length: 3, dtype: str

That is unexpected, right?
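For anyone who wants to check which missing-value sentinel an array carries without going through pandas internals, here is a minimal sketch using only public API. It assumes pandas and NumPy are installed; `missing_sentinel` is a hypothetical helper written for illustration, not part of pandas, and it does not exercise `lib.maybe_convert_objects` itself.

```python
import numpy as np
import pandas as pd


def missing_sentinel(values):
    """Name the first missing-value sentinel found in an iterable.

    Hypothetical helper for illustration only; not a pandas function.
    """
    for v in values:
        if v is pd.NA:
            return "pd.NA"
        if v is None:
            return "None"
        if isinstance(v, float) and np.isnan(v):
            return "np.nan"
    return "no missing values"


# Nullable string dtype requested explicitly through the public constructor:
# missing entries are expected to be stored as pd.NA.
nullable = pd.array(["a", "b", None], dtype="string")
print(nullable.dtype, missing_sentinel(nullable))

# Plain object ndarray: None is kept as-is, no conversion happens.
obj = np.array(["a", "b", None], dtype=object)
print(obj.dtype, missing_sentinel(obj))
```

Running the same helper over the result of the internal call above would be one way to compare main and the backport branch: "pd.NA" matches the output on main, "np.nan" matches what the backport branch returned.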

WillAyd commented 2 weeks ago

I think the backport branch was missing the change from https://github.com/pandas-dev/pandas/pull/59487 (shown in the second commit here); adding it back in locally gets the tests to pass.

Not sure where that got removed, but I assume it happened accidentally during the setup of the backport branch.

jorisvandenbossche commented 2 weeks ago

Thanks for figuring that out!

There is still an issue related to the datetime64 resolution; I will take a look at that.

WillAyd commented 2 weeks ago

I think the datetime issues were caused by a very subtle backport issue. Hopefully they are resolved by the latest commit.

WillAyd commented 2 weeks ago

Is there a way to restart the failed pre-commit job? I think that failure is spurious.

jorisvandenbossche commented 2 weeks ago

Thanks for the fix!

> Is there a way to restart the failed pre-commit job? I think that failure is spurious.

No idea. I typically just ignore it if it times out.