ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

bug: DuckDB table to pandas with dates #8925

Closed koaning closed 7 months ago

koaning commented 7 months ago

What happened?

I am working with a table of steam games that looks like this:

┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ appid ┃ name                           ┃ release_date ┃ english ┃ developer        ┃ publisher ┃ platforms         ┃ required_age ┃ categories                                                                       ┃ genres ┃ steamspy_tags                ┃ achievements ┃ positive_ratings ┃ negative_ratings ┃ average_playtime ┃ median_playtime ┃ owners            ┃ price   ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ string                         │ date         │ int64   │ string           │ string    │ string            │ int64        │ string                                                                           │ string │ string                       │ int64        │ int64            │ int64            │ int64            │ int64           │ string            │ float64 │
├───────┼────────────────────────────────┼──────────────┼─────────┼──────────────────┼───────────┼───────────────────┼──────────────┼──────────────────────────────────────────────────────────────────────────────────┼────────┼──────────────────────────────┼──────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────┼───────────────────┼─────────┤
│    10 │ Counter-Strike                 │ 2000-11-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ Multi-player;Online Multi-Player;Local Multi-Player;Valve Anti-Cheat enabled     │ Action │ Action;FPS;Multiplayer       │            0 │           124534 │             3339 │            17612 │             317 │ 10000000-20000000 │    7.19 │
│    20 │ Team Fortress Classic          │ 1999-04-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ Multi-player;Online Multi-Player;Local Multi-Player;Valve Anti-Cheat enabled     │ Action │ Action;FPS;Multiplayer       │            0 │             3318 │              633 │              277 │              62 │ 5000000-10000000  │    3.99 │
│    30 │ Day of Defeat                  │ 2003-05-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ Multi-player;Valve Anti-Cheat enabled                                            │ Action │ FPS;World War II;Multiplayer │            0 │             3416 │              398 │              187 │              34 │ 5000000-10000000  │    3.99 │
│    40 │ Deathmatch Classic             │ 2001-06-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ Multi-player;Online Multi-Player;Local Multi-Player;Valve Anti-Cheat enabled     │ Action │ Action;FPS;Multiplayer       │            0 │             1273 │              267 │              258 │             184 │ 5000000-10000000  │    3.99 │
│    50 │ Half-Life: Opposing Force      │ 1999-11-01   │       1 │ Gearbox Software │ Valve     │ windows;mac;linux │            0 │ Single-player;Multi-player;Valve Anti-Cheat enabled                              │ Action │ FPS;Action;Sci-fi            │            0 │             5250 │              288 │              624 │             415 │ 5000000-10000000  │    3.99 │
│    60 │ Ricochet                       │ 2000-11-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ Multi-player;Online Multi-Player;Valve Anti-Cheat enabled                        │ Action │ Action;FPS;Multiplayer       │            0 │             2758 │              684 │              175 │              10 │ 5000000-10000000  │    3.99 │
│    70 │ Half-Life                      │ 1998-11-08   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ Single-player;Multi-player;Online Multi-Player;Steam Cloud;Valve Anti-Cheat ena… │ Action │ FPS;Classic;Action           │            0 │            27755 │             1100 │             1300 │              83 │ 5000000-10000000  │    7.19 │
│    80 │ Counter-Strike: Condition Zero │ 2004-03-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ Single-player;Multi-player;Valve Anti-Cheat enabled                              │ Action │ Action;FPS;Multiplayer       │            0 │            12120 │             1439 │              427 │              43 │ 10000000-20000000 │    7.19 │
│   130 │ Half-Life: Blue Shift          │ 2001-06-01   │       1 │ Gearbox Software │ Valve     │ windows;mac;linux │            0 │ Single-player                                                                    │ Action │ FPS;Action;Sci-fi            │            0 │             3822 │              420 │              361 │             205 │ 5000000-10000000  │    3.99 │
│   220 │ Half-Life 2                    │ 2004-11-16   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ Single-player;Steam Achievements;Steam Trading Cards;Captions available;Partial… │ Action │ FPS;Action;Sci-fi            │           33 │            67902 │             2419 │              691 │             402 │ 10000000-20000000 │    7.19 │
│     … │ …                              │ …            │       … │ …                │ …         │ …                 │            … │ …                                                                                │ …      │ …                            │            … │                … │                … │                … │               … │ …                 │       … │
└───────┴────────────────────────────────┴──────────────┴─────────┴──────────────────┴───────────┴───────────────────┴──────────────┴──────────────────────────────────────────────────────────────────────────────────┴────────┴──────────────────────────────┴──────────────┴──────────────────┴──────────────────┴──────────────────┴─────────────────┴───────────────────┴─────────┘

Note that I am reading a CSV here, and DuckDB correctly picks up the release_date column as a date. pandas does not do that automatically, so it's nice to see. However, if I now do this:

t.to_pandas()

Then I get this error:

ValueError: Unexpected value for 'dtype': 'datetime64[D]'. Must be 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', 'datetime64[ns]' or DatetimeTZDtype'.

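This feels like a type-translation issue: ibis asks pandas to cast the date column to datetime64[D] (day resolution), and it seems pandas 2.x (released in 2023) no longer accepts that dtype. As the error message says, only the s, ms, us and ns resolutions are supported.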

What version of ibis are you using?

8.0.0

What backend(s) are you using, if any?

DuckDB

Relevant log output

Here's the full traceback.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[76], line 1
----> 1 t_duck.to_pandas()

File ~/Development/probabl/venv/lib/python3.11/site-packages/ibis/expr/types/relations.py:3256, in Table.to_pandas(self, **kwargs)
   3248 def to_pandas(self, **kwargs) -> pd.DataFrame:
   3249     """Convert a table expression to a pandas DataFrame.
   3250 
   3251     Parameters
   (...)
   3254         Same as keyword arguments to [`execute`](./expression-generic.qmd#ibis.expr.types.core.Expr.execute)
   3255     """
-> 3256     return self.execute(**kwargs)

File ~/Development/probabl/venv/lib/python3.11/site-packages/ibis/expr/types/core.py:324, in Expr.execute(self, limit, timecontext, params, **kwargs)
    297 def execute(
    298     self,
    299     limit: int | str | None = "default",
   (...)
    302     **kwargs: Any,
    303 ):
    304     """Execute an expression against its backend if one exists.
    305 
    306     Parameters
   (...)
    322         Keyword arguments
    323     """
--> 324     return self._find_backend(use_default=True).execute(
    325         self, limit=limit, timecontext=timecontext, params=params, **kwargs
    326     )

File ~/Development/probabl/venv/lib/python3.11/site-packages/ibis/backends/base/sql/__init__.py:343, in BaseSQLBackend.execute(self, expr, params, limit, **kwargs)
    340 schema = expr.as_table().schema()
    342 with self._safe_raw_sql(sql, **kwargs) as cursor:
--> 343     result = self.fetch_from_cursor(cursor, schema)
    345 return expr.__pandas_result__(result)

File ~/Development/probabl/venv/lib/python3.11/site-packages/ibis/backends/duckdb/__init__.py:1201, in Backend.fetch_from_cursor(self, cursor, schema)
   1183 table = cursor.cursor.fetch_arrow_table()
   1185 df = pd.DataFrame(
   1186     {
   1187         name: (
   (...)
   1199     }
   1200 )
-> 1201 df = PandasData.convert_table(df, schema)
   1202 if not df.empty and geospatial_supported:
   1203     return self._to_geodataframe(df, schema)

File ~/Development/probabl/venv/lib/python3.11/site-packages/ibis/formats/pandas.py:118, in PandasData.convert_table(cls, df, schema)
    113     raise ValueError(
    114         "schema column count does not match input data column count"
    115     )
    117 for (name, series), dtype in zip(df.items(), schema.types):
--> 118     df[name] = cls.convert_column(series, dtype)
    120 # return data with the schema's columns which may be different than the
    121 # input columns
    122 df.columns = schema.names

File ~/Development/probabl/venv/lib/python3.11/site-packages/ibis/formats/pandas.py:135, in PandasData.convert_column(cls, obj, dtype)
    132 method_name = f"convert_{dtype.__class__.__name__}"
    133 convert_method = getattr(cls, method_name, cls.convert_default)
--> 135 result = convert_method(obj, dtype, pandas_type)
    136 assert not isinstance(result, np.ndarray), f"{convert_method} -> {type(result)}"
    137 return result

File ~/Development/probabl/venv/lib/python3.11/site-packages/ibis/formats/pandas.py:201, in PandasData.convert_Date(cls, s, dtype, pandas_type)
    199     s = s.dt.tz_convert("UTC").dt.tz_localize(None)
    200 try:
--> 201     return s.astype(pandas_type).dt.date
    202 except (TypeError, pd._libs.tslibs.OutOfBoundsDatetime):
    204     def try_date(v):

File ~/Development/probabl/venv/lib/python3.11/site-packages/pandas/core/generic.py:6637, in NDFrame.astype(self, dtype, copy, errors)
   6631     results = [
   6632         ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
   6633     ]
   6635 else:
   6636     # else, only a single dtype is given
-> 6637     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6638     res = self._constructor_from_mgr(new_data, axes=new_data.axes)
   6639     return res.__finalize__(self, method="astype")

File ~/Development/probabl/venv/lib/python3.11/site-packages/pandas/core/internals/managers.py:431, in BaseBlockManager.astype(self, dtype, copy, errors)
    428 elif using_copy_on_write():
    429     copy = False
--> 431 return self.apply(
    432     "astype",
    433     dtype=dtype,
    434     copy=copy,
    435     errors=errors,
    436     using_cow=using_copy_on_write(),
    437 )

File ~/Development/probabl/venv/lib/python3.11/site-packages/pandas/core/internals/managers.py:364, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    362         applied = b.apply(f, **kwargs)
    363     else:
--> 364         applied = getattr(b, f)(**kwargs)
    365     result_blocks = extend_blocks(applied, result_blocks)
    367 out = type(self).from_blocks(result_blocks, self.axes)

File ~/Development/probabl/venv/lib/python3.11/site-packages/pandas/core/internals/blocks.py:758, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
    755         raise ValueError("Can not squeeze with more than one column.")
    756     values = values[0, :]  # type: ignore[call-overload]
--> 758 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    760 new_values = maybe_coerce_values(new_values)
    762 refs = None

File ~/Development/probabl/venv/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:237, in astype_array_safe(values, dtype, copy, errors)
    234     dtype = dtype.numpy_dtype
    236 try:
--> 237     new_values = astype_array(values, dtype, copy=copy)
    238 except (ValueError, TypeError):
    239     # e.g. _astype_nansafe can fail on object-dtype of strings
    240     #  trying to convert to float
    241     if errors == "ignore":

File ~/Development/probabl/venv/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:182, in astype_array(values, dtype, copy)
    179     values = values.astype(dtype, copy=copy)
    181 else:
--> 182     values = _astype_nansafe(values, dtype, copy=copy)
    184 # in pandas we don't store numpy str dtypes, so convert to object
    185 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File ~/Development/probabl/venv/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:110, in _astype_nansafe(arr, dtype, copy, skipna)
    107 if lib.is_np_dtype(dtype, "M"):
    108     from pandas.core.arrays import DatetimeArray
--> 110     dta = DatetimeArray._from_sequence(arr, dtype=dtype)
    111     return dta._ndarray
    113 elif lib.is_np_dtype(dtype, "m"):

File ~/Development/probabl/venv/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:327, in DatetimeArray._from_sequence(cls, scalars, dtype, copy)
    325 @classmethod
    326 def _from_sequence(cls, scalars, *, dtype=None, copy: bool = False):
--> 327     return cls._from_sequence_not_strict(scalars, dtype=dtype, copy=copy)

File ~/Development/probabl/venv/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:354, in DatetimeArray._from_sequence_not_strict(cls, data, dtype, copy, tz, freq, dayfirst, yearfirst, ambiguous)
    351 else:
    352     tz = timezones.maybe_get_tz(tz)
--> 354 dtype = _validate_dt64_dtype(dtype)
    355 # if dtype has an embedded tz, capture it
    356 tz = _validate_tz_from_dtype(dtype, tz, explicit_tz_none)

File ~/Development/probabl/venv/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:2550, in _validate_dt64_dtype(dtype)
   2544     raise ValueError(msg)
   2546 if (
   2547     isinstance(dtype, np.dtype)
   2548     and (dtype.kind != "M" or not is_supported_dtype(dtype))
   2549 ) or not isinstance(dtype, (np.dtype, DatetimeTZDtype)):
-> 2550     raise ValueError(
   2551         f"Unexpected value for 'dtype': '{dtype}'. "
   2552         "Must be 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', "
   2553         "'datetime64[ns]' or DatetimeTZDtype'."
   2554     )
   2556 if getattr(dtype, "tz", None):
   2557     # https://github.com/pandas-dev/pandas/issues/18595
   2558     # Ensure that we have a standard timezone for pytz objects.
   2559     # Without this, things like adding an array of timedeltas and
   2560     # a  tz-aware Timestamp (with a tz specific to its datetime) will
   2561     # be incorrect(ish?) for the array as a whole
   2562     dtype = cast(DatetimeTZDtype, dtype)

ValueError: Unexpected value for 'dtype': 'datetime64[D]'. Must be 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', 'datetime64[ns]' or DatetimeTZDtype'.


### Code of Conduct

- [X] I agree to follow this project's Code of Conduct
lostmygithubaccount commented 7 months ago

Hi @koaning, thanks for reporting! Any chance the CSV file is public or something you could share to make reproduction a bit easier?

lostmygithubaccount commented 7 months ago

I was able to reproduce this on 8.0.0, but not on main. We will be releasing 9.0.0 fairly soon; in the meantime, upgrading to one of the prereleases (pip install --pre 'ibis-framework[duckdb]') or installing directly from main (pip install git+https://github.com/ibis-project/ibis) might fix this for you.

koaning commented 7 months ago

It's this dataset, but I'll try to whip up a more reproducible example in a bit. I just learned my shiny new expensive keyboard arrived at the depot and I gotta pick that up first 😅

lostmygithubaccount commented 7 months ago

8.0.0:

IPython session

```python
[ins] In [1]: import ibis

[ins] In [2]: ibis.options.interactive = True

[ins] In [3]: t = ibis.read_csv("steam/steam.csv")

[ins] In [4]: t
Out[4]:
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━┓
┃ appid ┃ name                           ┃ release_date ┃ english ┃ developer        ┃ publisher ┃ platforms         ┃ required_age ┃ … ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━┩
│ int64 │ string                         │ date         │ int64   │ string           │ string    │ string            │ int64        │ … │
├───────┼────────────────────────────────┼──────────────┼─────────┼──────────────────┼───────────┼───────────────────┼──────────────┼───┤
│    10 │ Counter-Strike                 │ 2000-11-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    20 │ Team Fortress Classic          │ 1999-04-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    30 │ Day of Defeat                  │ 2003-05-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    40 │ Deathmatch Classic             │ 2001-06-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    50 │ Half-Life: Opposing Force      │ 1999-11-01   │       1 │ Gearbox Software │ Valve     │ windows;mac;linux │            0 │ … │
│    60 │ Ricochet                       │ 2000-11-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    70 │ Half-Life                      │ 1998-11-08   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    80 │ Counter-Strike: Condition Zero │ 2004-03-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│   130 │ Half-Life: Blue Shift          │ 2001-06-01   │       1 │ Gearbox Software │ Valve     │ windows;mac;linux │            0 │ … │
│   220 │ Half-Life 2                    │ 2004-11-16   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│     … │ …                              │ …            │       … │ …                │ …         │ …                 │            … │ … │
└───────┴────────────────────────────────┴──────────────┴─────────┴──────────────────┴───────────┴───────────────────┴──────────────┴───┘

[ins] In [5]: t.to_pandas()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 t.to_pandas()

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/ibis/expr/types/relations.py:3256, in Table.to_pandas(self, **kwargs)
   3248 def to_pandas(self, **kwargs) -> pd.DataFrame:
   3249     """Convert a table expression to a pandas DataFrame.
   3250 
   3251     Parameters
   (...)
   3254         Same as keyword arguments to [`execute`](./expression-generic.qmd#ibis.expr.types.core.Expr.execute)
   3255     """
-> 3256     return self.execute(**kwargs)

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/ibis/expr/types/core.py:324, in Expr.execute(self, limit, timecontext, params, **kwargs)
    297 def execute(
    298     self,
    299     limit: int | str | None = "default",
   (...)
    302     **kwargs: Any,
    303 ):
    304     """Execute an expression against its backend if one exists.
    305 
    306     Parameters
   (...)
    322         Keyword arguments
    323     """
--> 324     return self._find_backend(use_default=True).execute(
    325         self, limit=limit, timecontext=timecontext, params=params, **kwargs
    326     )

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/ibis/backends/base/sql/__init__.py:343, in BaseSQLBackend.execute(self, expr, params, limit, **kwargs)
    340 schema = expr.as_table().schema()
    342 with self._safe_raw_sql(sql, **kwargs) as cursor:
--> 343     result = self.fetch_from_cursor(cursor, schema)
    345 return expr.__pandas_result__(result)

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/ibis/backends/duckdb/__init__.py:1201, in Backend.fetch_from_cursor(self, cursor, schema)
   1183 table = cursor.cursor.fetch_arrow_table()
   1185 df = pd.DataFrame(
   1186     {
   1187         name: (
   (...)
   1199     }
   1200 )
-> 1201 df = PandasData.convert_table(df, schema)
   1202 if not df.empty and geospatial_supported:
   1203     return self._to_geodataframe(df, schema)

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/ibis/formats/pandas.py:118, in PandasData.convert_table(cls, df, schema)
    113     raise ValueError(
    114         "schema column count does not match input data column count"
    115     )
    117 for (name, series), dtype in zip(df.items(), schema.types):
--> 118     df[name] = cls.convert_column(series, dtype)
    120 # return data with the schema's columns which may be different than the
    121 # input columns
    122 df.columns = schema.names

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/ibis/formats/pandas.py:135, in PandasData.convert_column(cls, obj, dtype)
    132 method_name = f"convert_{dtype.__class__.__name__}"
    133 convert_method = getattr(cls, method_name, cls.convert_default)
--> 135 result = convert_method(obj, dtype, pandas_type)
    136 assert not isinstance(result, np.ndarray), f"{convert_method} -> {type(result)}"
    137 return result

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/ibis/formats/pandas.py:201, in PandasData.convert_Date(cls, s, dtype, pandas_type)
    199     s = s.dt.tz_convert("UTC").dt.tz_localize(None)
    200 try:
--> 201     return s.astype(pandas_type).dt.date
    202 except (TypeError, pd._libs.tslibs.OutOfBoundsDatetime):
    204     def try_date(v):

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/pandas/core/generic.py:6640, in NDFrame.astype(self, dtype, copy, errors)
   6634     results = [
   6635         ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
   6636     ]
   6638 else:
   6639     # else, only a single dtype is given
-> 6640     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6641     res = self._constructor_from_mgr(new_data, axes=new_data.axes)
   6642     return res.__finalize__(self, method="astype")

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/pandas/core/internals/managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
    427 elif using_copy_on_write():
    428     copy = False
--> 430 return self.apply(
    431     "astype",
    432     dtype=dtype,
    433     copy=copy,
    434     errors=errors,
    435     using_cow=using_copy_on_write(),
    436 )

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/pandas/core/internals/managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    361         applied = b.apply(f, **kwargs)
    362     else:
--> 363         applied = getattr(b, f)(**kwargs)
    364     result_blocks = extend_blocks(applied, result_blocks)
    366 out = type(self).from_blocks(result_blocks, self.axes)

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/pandas/core/internals/blocks.py:758, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
    755         raise ValueError("Can not squeeze with more than one column.")
    756     values = values[0, :]  # type: ignore[call-overload]
--> 758 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    760 new_values = maybe_coerce_values(new_values)
    762 refs = None

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:237, in astype_array_safe(values, dtype, copy, errors)
    234     dtype = dtype.numpy_dtype
    236 try:
--> 237     new_values = astype_array(values, dtype, copy=copy)
    238 except (ValueError, TypeError):
    239     # e.g. _astype_nansafe can fail on object-dtype of strings
    240     #  trying to convert to float
    241     if errors == "ignore":

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:182, in astype_array(values, dtype, copy)
    179     values = values.astype(dtype, copy=copy)
    181 else:
--> 182     values = _astype_nansafe(values, dtype, copy=copy)
    184 # in pandas we don't store numpy str dtypes, so convert to object
    185 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:110, in _astype_nansafe(arr, dtype, copy, skipna)
    107 if lib.is_np_dtype(dtype, "M"):
    108     from pandas.core.arrays import DatetimeArray
--> 110     dta = DatetimeArray._from_sequence(arr, dtype=dtype)
    111     return dta._ndarray
    113 elif lib.is_np_dtype(dtype, "m"):

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:327, in DatetimeArray._from_sequence(cls, scalars, dtype, copy)
    325 @classmethod
    326 def _from_sequence(cls, scalars, *, dtype=None, copy: bool = False):
--> 327     return cls._from_sequence_not_strict(scalars, dtype=dtype, copy=copy)

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:354, in DatetimeArray._from_sequence_not_strict(cls, data, dtype, copy, tz, freq, dayfirst, yearfirst, ambiguous)
    351 else:
    352     tz = timezones.maybe_get_tz(tz)
--> 354 dtype = _validate_dt64_dtype(dtype)
    355 # if dtype has an embedded tz, capture it
    356 tz = _validate_tz_from_dtype(dtype, tz, explicit_tz_none)

File ~/repos/ibis/temp/venv/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:2550, in _validate_dt64_dtype(dtype)
   2544     raise ValueError(msg)
   2546 if (
   2547     isinstance(dtype, np.dtype)
   2548     and (dtype.kind != "M" or not is_supported_dtype(dtype))
   2549 ) or not isinstance(dtype, (np.dtype, DatetimeTZDtype)):
-> 2550     raise ValueError(
   2551         f"Unexpected value for 'dtype': '{dtype}'. "
   2552         "Must be 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', "
   2553         "'datetime64[ns]' or DatetimeTZDtype'."
   2554     )
   2556 if getattr(dtype, "tz", None):
   2557     # https://github.com/pandas-dev/pandas/issues/18595
   2558     # Ensure that we have a standard timezone for pytz objects.
   2559     # Without this, things like adding an array of timedeltas and
   2560     # a  tz-aware Timestamp (with a tz specific to its datetime) will
   2561     # be incorrect(ish?) for the array as a whole
   2562     dtype = cast(DatetimeTZDtype, dtype)

ValueError: Unexpected value for 'dtype': 'datetime64[D]'. Must be 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', 'datetime64[ns]' or DatetimeTZDtype'.
```

main:

[ins] In [1]: import ibis

[ins] In [2]: ibis.options.interactive = True

[ins] In [3]: t = ibis.read_csv("temp/steam/steam.csv")

[ins] In [4]: t
Out[4]:
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━┓
┃ appid ┃ name                           ┃ release_date ┃ english ┃ developer        ┃ publisher ┃ platforms         ┃ required_age ┃ … ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━┩
│ int64 │ string                         │ date         │ int64   │ string           │ string    │ string            │ int64        │ … │
├───────┼────────────────────────────────┼──────────────┼─────────┼──────────────────┼───────────┼───────────────────┼──────────────┼───┤
│    10 │ Counter-Strike                 │ 2000-11-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    20 │ Team Fortress Classic          │ 1999-04-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    30 │ Day of Defeat                  │ 2003-05-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    40 │ Deathmatch Classic             │ 2001-06-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    50 │ Half-Life: Opposing Force      │ 1999-11-01   │       1 │ Gearbox Software │ Valve     │ windows;mac;linux │            0 │ … │
│    60 │ Ricochet                       │ 2000-11-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    70 │ Half-Life                      │ 1998-11-08   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│    80 │ Counter-Strike: Condition Zero │ 2004-03-01   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│   130 │ Half-Life: Blue Shift          │ 2001-06-01   │       1 │ Gearbox Software │ Valve     │ windows;mac;linux │            0 │ … │
│   220 │ Half-Life 2                    │ 2004-11-16   │       1 │ Valve            │ Valve     │ windows;mac;linux │            0 │ … │
│     … │ …                              │ …            │       … │ …                │ …         │ …                 │            … │ … │
└───────┴────────────────────────────────┴──────────────┴─────────┴──────────────────┴───────────┴───────────────────┴──────────────┴───┘

[ins] In [5]: t.to_pandas()
Out[5]:
         appid                        name release_date  english  ... average_playtime median_playtime             owners  price
0           10              Counter-Strike   2000-11-01        1  ...            17612             317  10000000-20000000   7.19
1           20       Team Fortress Classic   1999-04-01        1  ...              277              62   5000000-10000000   3.99
2           30               Day of Defeat   2003-05-01        1  ...              187              34   5000000-10000000   3.99
3           40          Deathmatch Classic   2001-06-01        1  ...              258             184   5000000-10000000   3.99
4           50   Half-Life: Opposing Force   1999-11-01        1  ...              624             415   5000000-10000000   3.99
...        ...                         ...          ...      ...  ...              ...             ...                ...    ...
27070  1065230             Room of Pandora   2019-04-24        1  ...                0               0            0-20000   2.09
27071  1065570                   Cyber Gun   2019-04-23        1  ...                0               0            0-20000   1.69
27072  1065650            Super Star Blast   2019-04-24        1  ...                0               0            0-20000   3.99
27073  1066700  New Yankee 7: Deer Hunters   2019-04-17        1  ...                0               0            0-20000   5.19
27074  1069460                   Rune Lord   2019-04-24        1  ...                0               0            0-20000   5.19

[27075 rows x 18 columns]

There was a very large refactor of the internals after 8.0.0 was released, and it seems like this specific issue (along with many others) was fixed.

koaning commented 7 months ago

Given that somebody beat me to confirming it, I feel confident this can be closed. Thanks all!