Open jbrockmendel opened 5 years ago
One example is for nested data. In this case we need something like scalar_for_dtype(value, dtype), since the ndim of a "scalar" for a nested data type would be > 0.
An alternative to registering: could there be a method on the dtype/array that checks whether a value is a valid scalar?
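A minimal sketch of what such a dtype-level check could look like, in plain Python. NestedListDtype, Int64LikeDtype, is_valid_scalar, and the body of scalar_for_dtype are all hypothetical names for illustration, not pandas API; the only point is that the dtype decides what counts as one element.

```python
class NestedListDtype:
    """Hypothetical dtype whose elements are lists of ints (not a real pandas dtype)."""

    def is_valid_scalar(self, value) -> bool:
        # One element of this dtype is a list, so a "scalar" here has ndim 1.
        return isinstance(value, list) and all(isinstance(v, int) for v in value)


class Int64LikeDtype:
    """Hypothetical flat dtype, for contrast."""

    def is_valid_scalar(self, value) -> bool:
        # bool is a subclass of int; exclude it to keep the example strict.
        return isinstance(value, int) and not isinstance(value, bool)


def scalar_for_dtype(value, dtype) -> bool:
    # Hypothetical helper: defer the question to the dtype itself.
    return dtype.is_valid_scalar(value)


print(scalar_for_dtype([1, 2, 3], NestedListDtype()))  # a list is one element here
print(scalar_for_dtype([1, 2, 3], Int64LikeDtype()))   # but not for a flat dtype
```

The same value gets a different answer depending on the dtype, which is what a dtype-agnostic is_scalar cannot express.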
Hi! I think I've run into this issue in my own attempt at building an ExtensionArray and I was curious if there'd been any changes on this or if it was something I could potentially contribute on.
I've been working on an extension array where the na_value I want to return for the ExtensionDtype is not recognized as a scalar by is_scalar. That seems to cause issues with some methods that aren't part of the ExtensionArray interface that I can't figure out how to fix (e.g. Series.where).
Is there another workaround for this that I haven't found yet? Thanks!
> Is there another workaround for this that I haven't found yet?
Only thought that comes to mind is trying to replace is_scalar checks with not is_list_like checks. Last time I checked (worth double-checking since this was a while ago) is_list_like was faster than is_scalar anyway, and should be more robust to this problem.
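To illustrate why the swap helps, here is a small sketch using the public pandas.api.types functions. MyNA is a hypothetical stand-in for a custom na_value; the real object in an extension array may behave differently, for example if it defines __iter__.

```python
from pandas.api.types import is_list_like, is_scalar


class MyNA:
    """Hypothetical NA sentinel for a custom ExtensionDtype."""


na = MyNA()

# is_scalar only recognizes a fixed set of known scalar types,
# so an arbitrary Python object is rejected.
print("is_scalar:", is_scalar(na))

# The sentinel is not iterable, so it is not list-like either,
# and a `not is_list_like(...)` check treats it as a single value.
print("not is_list_like:", not is_list_like(na))
```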
Thanks very much! It looks like that change has already been made in a number of places in the most recent versions of Pandas (I was testing on 1.3).
Thanks for your help and sorry to bother you!
Now that is_list_like interprets scalars correctly, https://github.com/pandas-dev/pandas/pull/44626, this is now the main issue holding back pint-pandas.
A few different approaches have been suggested in this issue since it was created. What's the recommended way to fix this at the moment?
edit: I was able to get all tests in pint-pandas passing without this, so it may not be needed.
I looked at this in April and writing up my conclusions fell through the cracks.
Many of the places where we use is_scalar (also is_list_like) are either
1) as a preliminary check if we can use this as a scalar in __setitem__
2) to see whether we should treat it as a single label vs sequence of labels for indexing.
In the latter case, is_scalar is behaving like a faster is_hashable (58ns vs 506ns on []).
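A rough way to compare the two checks for the label case, using the public pandas.api.types functions. The timings depend on machine and pandas version, so no numbers are claimed here. The tuple case is included as a reminder that the two functions are not interchangeable everywhere.

```python
import timeit

from pandas.api.types import is_hashable, is_scalar

# Both reject a list, which can never be a single label.
print(is_scalar([]), is_hashable([]))

# They disagree on tuples: hashable (usable as a label) but not a scalar.
print(is_scalar((1, 2)), is_hashable((1, 2)))

# Rough per-call cost in nanoseconds on the empty list.
n = 100_000
for func in (is_scalar, is_hashable):
    per_call_ns = timeit.timeit(lambda: func([]), number=n) / n * 1e9
    print(f"{func.__name__}: {per_call_ns:.0f} ns")
```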
In the former, we should be able to use an EA-specific method to check if the item is a scalar that is valid for the specific array at hand. We already have something like this for most of our internal EAs: DTA, TDA, PeriodArray, Categorical, PandasArray, IntervalArray, and MaskedArray all have _validate_setitem_value, and ArrowExtensionArray has _maybe_convert_setitem_value.
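As a sketch of the idea, here is a toy container that validates values in __setitem__ through its own method instead of a global is_scalar check. ToyListArray is hypothetical and is not an ExtensionArray subclass; the method name simply mirrors the private one mentioned above, and the real pandas implementations differ per array type.

```python
class ToyListArray:
    """Toy container whose elements are lists of ints (not a real ExtensionArray)."""

    def __init__(self, values):
        self._data = [self._validate_setitem_value(v) for v in values]

    def _validate_setitem_value(self, value):
        # The array, not a global is_scalar, decides what one element looks like.
        if isinstance(value, list) and all(isinstance(v, int) for v in value):
            return list(value)
        raise TypeError(f"invalid value for ToyListArray: {value!r}")

    def __setitem__(self, key, value):
        self._data[key] = self._validate_setitem_value(value)

    def __getitem__(self, key):
        return self._data[key]


arr = ToyListArray([[1, 2], [3]])
arr[0] = [7, 8, 9]  # accepted: a list is a single element for this array

try:
    arr[1] = 5  # rejected: an int is not a valid element here
except TypeError as err:
    print(err)
```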
https://github.com/pandas-dev/pandas/pull/27461#discussion_r305168936
Before we move on this, I think we need to clarify in which situations we care about lib.is_scalar(x) vs the simpler np.ndim(x) == 0
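A short sketch of where the two checks disagree. It uses pandas.api.types.is_scalar, which is the public name for the same function as lib.is_scalar; the candidate values are chosen only for illustration.

```python
import numpy as np
from pandas.api.types import is_scalar

candidates = {
    "int": 1,
    "0-dim ndarray": np.array(1),
    "dict": {},
    "plain object": object(),
    "list": [1, 2],
}

# np.ndim falls back to np.asarray(x).ndim, so anything that NumPy wraps
# as a 0-dim object array counts as ndim 0, even if is_scalar rejects it.
for name, x in candidates.items():
    print(f"{name:14} is_scalar={is_scalar(x)!s:5} ndim==0: {np.ndim(x) == 0}")
```

The checks agree on the int and the list; they differ on the 0-dim array, the dict, and the plain object, which are the cases the question above needs to settle.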