pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DISC: Make all user-facing dtypes ExtensionDtype #53778

Open: jbrockmendel opened this issue 1 year ago

jbrockmendel commented 1 year ago

A lot of code could be simplified internally if we were always working with ExtensionArrays (EAs) instead of having to check for EA-vs-ndarray everywhere. In particular I'm thinking of Block.values, ArrayManager.arrays, Series._values, Index._data, Index._values, and maybe frame._values.
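To make the EA-vs-ndarray split concrete from the user-facing side, here is a minimal sketch using only public pandas APIs (assuming a recent pandas). `storage_kind` is a hypothetical helper, not pandas internals. It shows that `.dtype` can be either a NumPy dtype or an ExtensionDtype, so code has to branch, while `Series.array` already always hands back an ExtensionArray.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def storage_kind(ser: pd.Series) -> str:
    """Hypothetical helper: report which kind of dtype a Series exposes today."""
    # .dtype may be a plain numpy dtype or an ExtensionDtype, so callers branch.
    if isinstance(ser.dtype, np.dtype):
        return "numpy"
    return "extension"


numpy_backed = pd.Series([1, 2, 3])                 # int64, numpy dtype
ea_backed = pd.Series([1, 2, None], dtype="Int64")  # nullable integer EA

print(storage_kind(numpy_backed))  # numpy
print(storage_kind(ea_backed))     # extension

# Series.array, by contrast, is always an ExtensionArray: numpy data is
# wrapped in pandas' numpy-backed EA (PandasArray / NumpyExtensionArray).
print(isinstance(numpy_backed.array, ExtensionArray))  # True
print(isinstance(ea_backed.array, ExtensionArray))     # True
```

If every internal container held an EA, the `isinstance(..., np.dtype)` branch above would have no internal counterpart to mirror.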

The other big upside is that essentially all numpy-specific logic could eventually be migrated into PandasArray methods.
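A rough illustration of why routing numpy data through the EA interface helps: a function written only against ExtensionArray methods (`unique`, `isna`) works unchanged for numpy-backed, masked, and categorical data. `count_distinct_non_missing` is a hypothetical helper for illustration, not an existing pandas function.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def count_distinct_non_missing(arr: ExtensionArray) -> int:
    """Hypothetical helper written only against the ExtensionArray interface."""
    uniques = arr.unique()               # every EA implements unique()
    return int((~uniques.isna()).sum())  # isna() returns a boolean ndarray


# numpy-backed data reaches the same code path via Series.array
print(count_distinct_non_missing(pd.Series([1.0, np.nan, 1.0]).array))   # 1
print(count_distinct_non_missing(pd.array([1, None, 3], dtype="Int64")))  # 2
print(count_distinct_non_missing(pd.Categorical(["a", "b", None])))      # 2
```

No branch on ndarray-vs-EA is needed; the numpy-specific work happens inside the numpy-backed EA's own methods.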

We could make that change without changing the .dtype/.dtypes properties, but on the margin I think the resulting inconsistency (the stored values would always be EAs while .dtype/.dtypes still report np.dtype objects) would be a footgun.
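A small version of that inconsistency can already be observed through public APIs. For numpy-backed data, `Series.dtype` is a `np.dtype`, while the `Series.array` wrapper carries pandas' numpy-wrapping ExtensionDtype (`PandasDtype` at the time of this issue; pandas 2.1 renamed it `NumpyEADtype`). A sketch, assuming a recent pandas:

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype

ser = pd.Series([1, 2, 3])

outer = ser.dtype        # what users see via .dtype
inner = ser.array.dtype  # dtype of the wrapping ExtensionArray

print(isinstance(outer, np.dtype))        # True
print(isinstance(inner, ExtensionDtype))  # True
print(isinstance(inner, np.dtype))        # False
# The wrapper dtype exposes the numpy dtype it wraps.
print(inner.numpy_dtype == outer)         # True
```

Two objects describing the same data with different dtype classes is the kind of mismatch that would become pervasive if only the internals switched to EAs.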

This might be a hassle for users who rely on obj.dtype/dtypes being np.dtype objects, or on the ndarray attributes listed above being ndarrays.
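As a sketch of the kind of user code that would be affected, compare a check that assumes `.dtype` is a `np.dtype` with one that goes through `pandas.api.types`, which already accepts both numpy and extension dtypes. Both helper names are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer_dtype


def is_int_fragile(ser: pd.Series) -> bool:
    # Assumes .dtype is always a numpy dtype; silently wrong for EA dtypes.
    return isinstance(ser.dtype, np.dtype) and ser.dtype.kind == "i"


def is_int_robust(ser: pd.Series) -> bool:
    # pandas' dtype-introspection helpers accept numpy and extension dtypes.
    return is_integer_dtype(ser.dtype)


plain = pd.Series([1, 2, 3])
nullable = pd.Series([1, 2, 3], dtype="Int64")

print(is_int_fragile(plain), is_int_fragile(nullable))  # True False
print(is_int_robust(plain), is_int_robust(nullable))    # True True
```

Code written like `is_int_fragile` is what would break if all user-facing dtypes became ExtensionDtypes; code written like `is_int_robust` would not notice.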

If we go down this road, I think it'd be important to rename PandasDtype -> NumpyExtensionDtype or something like it (xref #53694). Maybe even rename ExtensionDtype -> Dtype.

xref #24662 about implementing a dtype for tz-naive dt64 and td64. xref #40021, which would be "fixed" because the monkeypatching in the PandasArray tests would no longer be necessary. xref #24877, which would be much more appealing if we didn't need to unwrap PandasArray every time we see it.
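The gap behind #24662 is visible with public APIs: tz-naive and tz-aware datetimes are both stored in a `DatetimeArray`, but only the tz-aware one reports an ExtensionDtype; the naive one reports a plain numpy `datetime64[ns]` dtype. A minimal sketch, assuming a recent pandas:

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype

naive = pd.Series(pd.date_range("2024-01-01", periods=2))
aware = pd.Series(pd.date_range("2024-01-01", periods=2, tz="UTC"))

# Both are backed by the same ExtensionArray class...
print(type(naive.array).__name__, type(aware.array).__name__)  # DatetimeArray DatetimeArray

# ...but only the tz-aware one has an ExtensionDtype.
print(isinstance(naive.array.dtype, np.dtype))        # True
print(isinstance(aware.array.dtype, ExtensionDtype))  # True
print(isinstance(aware.dtype, pd.DatetimeTZDtype))    # True
```

Under the proposal, tz-naive dt64 (and td64) would need their own ExtensionDtype so that this case stops being an exception.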

BTW I'm not "+1" on this ATM, just thinking about it.

jbrockmendel commented 1 year ago

cc @jorisvandenbossche