pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.26k stars 17.8k forks source link

ExtensionDtype.construct_array_type is not optional #24860

Closed jorisvandenbossche closed 3 years ago

jorisvandenbossche commented 5 years ago

Currently, the ExtensionDtype docstring says:

Optionally one can override construct_array_type for construction
with the name of this dtype via the Registry. See
:meth:`pandas.api.extensions.register_extension_dtype`.

* construct_array_type

However, it seems that in practice this is used for more than just supporting string dtypes. Eg I also need to implement it to have groupby working.

Eg cyberpandas does not yet implement it, and there df.groupby('ipaddresses').size() fails with a NotImplementedError.

I assume we are fine with requiring this (assuming a 1-1 mapping of dtype and array), and a doc update is enough?

jreback commented 5 years ago

yeah this should probably be a required element of an EA Dtype

TomAugspurger commented 5 years ago

Yeah, OK to require it.

On Mon, Jan 21, 2019 at 8:19 AM Joris Van den Bossche < notifications@github.com> wrote:

Currently, the ExtensionDtype docstring says:

Optionally one can override construct_array_type for construction with the name of this dtype via the Registry. See :meth:pandas.api.extensions.register_extension_dtype.

  • construct_array_type

However, it seems that in practice this is used for more than just supporting string dtypes. Eg I also need to implement it to have groupby working.

Eg cyberpandas does not yet implement it, and there df.groupby('ipaddresses').size() fails with a NotImplementedError.

I assume we are fine with requiring this (assuming a 1-1 mapping of dtype and array), and a doc update is enough?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/24860, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQHIhOz-O2u_Dzc1kl1Y3PoA_tpxEdsks5vFcxngaJpZM4aK6pH .

jbrockmendel commented 5 years ago

I see the usefulness here, but there is a non-trivial downside that this prevents core.dtypes from being self-contained (w/r/t the rest of core).

jbrockmendel commented 3 years ago

I'm now fully on board with this. Is it just a matter of updating the docstring?

jorisvandenbossche commented 3 years ago

And I suppose changing the implementation from NotImplementedError to AbstractMethodError ?