Open jbrockmendel opened 11 months ago
In my mind, _from_sequence already is the "constructor for flavor-preserving inference".
I understand there are multiple use cases, but that can be served by a single method depending on whether a dtype is passed or not? That feels quite clear to me: when a dtype is passed, it is honored, and otherwise the dtype is inferred from the data (with the constraint that it has to be a dtype supported by the calling class).
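To make the "single method" reading concrete, here is a minimal sketch using IntegerArray. It calls the private _from_sequence directly, which is an assumption made purely for illustration: user code would normally go through pd.array, and the exact behavior may differ between pandas versions.

```python
import pandas as pd
from pandas.arrays import IntegerArray

data = [1, 2, None]

# No dtype passed: the dtype is inferred from the data, within what
# IntegerArray supports.
inferred = IntegerArray._from_sequence(data)

# dtype passed: the requested dtype is honored.
explicit = IntegerArray._from_sequence(data, dtype=pd.Int8Dtype())

print(inferred.dtype)  # Int64
print(explicit.dtype)  # Int8
```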
The main examples that come to mind are maybe_cast_pointwise_result, MaskedArray._maybe_mask_result.
In MaskedArray._maybe_mask_result, we actually don't use _from_sequence, but the main Array class constructors (but also without specifying a dtype)
In MaskedArray._maybe_mask_result, we actually don't use _from_sequence, but the main Array class constructors (but also without specifying a dtype)
Correct. My point is that MaskedArray subclasses use a different pattern to achieve the same result. The datetimelike EAs have their own special-casing. If it is feasible (which I'm not ready to claim), then it would be preferable to have a single shared pattern for these.
I understand there are multiple use cases, but that can be served by a single method depending on whether a dtype is passed or not?
Certainly possible. On the margin I'd prefer the cases where we intentionally want dtype inference to be more explicit. I'm spending some time this week tracking down just where those cases are.
I've spent some time tracking down the places where we don't pass a dtype to _from_sequence:
other in e.g. assert dtype is not None; looks like both in DatetimeArray._from_sequence.
Also tracking down the various patterns we use for flavor-preserving-partial-inference:
Other places where we have special-casing for Masked/Arrow dtypes related to flavor-retention:
I expect there are more that I have missed, will update here as I find them.
In my mind, _from_sequence already is the "constructor for flavor-preserving inference".
Re-reading, I think I missed an important point: a big part of the relevant use case is having a BooleanArray method that returns a FloatingArray/IntegerArray etc. (this example could also be addressed by condensing these classes down to just MaskedArray). xref #58258
TLDR: we should make dtype required in EA._from_sequence and implement a new EA constructor for flavor-preserving inference.
ATM dtype is not required in EA._from_sequence. The behavior, and more importantly the usage, when it is not specified is not standardized. In many cases it does some kind of inference, but how much inference varies.
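A rough sketch of the proposed split, using a toy class rather than a real ExtensionArray. Everything here is hypothetical: ToyMaskedArray, its SUPPORTED tuple, and the name _from_sequence_infer are made up for illustration and are not pandas API.

```python
class ToyMaskedArray:
    """Toy stand-in for an EA family; not a pandas class."""

    SUPPORTED = ("boolean", "Int64", "Float64")

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype

    @classmethod
    def _from_sequence(cls, scalars, *, dtype):
        # dtype is keyword-only and required: no inference happens here.
        if dtype not in cls.SUPPORTED:
            raise TypeError(f"unsupported dtype: {dtype!r}")
        return cls(scalars, dtype)

    @classmethod
    def _from_sequence_infer(cls, scalars):
        # Hypothetical constructor: choose a dtype within this "flavor"
        # based on the non-missing values, then delegate.
        present = [x for x in scalars if x is not None]
        if all(isinstance(x, bool) for x in present):
            dtype = "boolean"
        elif all(isinstance(x, int) for x in present):
            dtype = "Int64"
        elif all(isinstance(x, (int, float)) for x in present):
            dtype = "Float64"
        else:
            raise TypeError("values do not fit any supported dtype")
        return cls._from_sequence(scalars, dtype=dtype)


print(ToyMaskedArray._from_sequence_infer([1, 2, None]).dtype)  # Int64
print(ToyMaskedArray._from_sequence([1, 2], dtype="Float64").dtype)  # Float64
```

With this shape, a call site that wants inference has to say so by calling the second constructor, and forgetting dtype in _from_sequence fails with a TypeError instead of silently inferring.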
Most of the places where we don't pass a dtype are aimed at some type of dtype-flavor-retention. e.g. we did some type of operation starting with a pyarrow/masked/sparse dtype and we want the result.dtype to still be pyarrow/masked/sparse, but not necessarily the same exact dtype. The main examples that come to mind are maybe_cast_pointwise_result, MaskedArray._maybe_mask_result.
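For anyone less familiar with what flavor retention looks like from the outside, a small example with masked dtypes (assuming a pandas version that has Float64, i.e. 1.2 or later). The result stays masked, but the exact dtype changes with the operation:

```python
import pandas as pd

ints = pd.array([1, 2, None], dtype="Int64")

halves = ints / 2  # true division cannot stay integer
flags = ints > 1   # comparison yields booleans

print(halves.dtype)  # Float64
print(flags.dtype)   # boolean
```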
The main other place where we call _from_sequence without a dtype is pd.array. With a little bit of effort I'm pretty sure we can start passing dtypes there.
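For reference, this is the user-facing inference in pd.array when no dtype is given, next to the explicit form (again assuming pandas 1.2 or later for the Float64 case). It only shows the public behavior, not how pd.array dispatches internally:

```python
import pandas as pd

# No dtype: pd.array infers a nullable dtype from the values.
print(pd.array([1, 2, None]).dtype)   # Int64
print(pd.array([1.5, None]).dtype)    # Float64
print(pd.array([True, None]).dtype)   # boolean

# Explicit dtype: the request wins over what inference would pick.
explicit = pd.array([1, 2, None], dtype="Float64")
print(explicit.dtype)  # Float64
```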
cc @jorisvandenbossche