Open randolf-scholz opened 5 months ago
Thanks for the report. This does indeed appear to me to be an issue, but I wonder if this is wide-spread throughout pandas and what the ramifications of trying to fix this systematically would be. E.g.
import numpy as np
from pandas._libs import lib
print(lib.is_scalar(np.array(0)))
# False
Further investigations are welcome!
Lib.itemfromzerodim
Edit [rhshadrach]: lib.item_from_zerodim
I think there are two ways to handle it:
Regarding the latter, any element of a 1-dimensional vector space can be considered a scalar, since in this case the vector space and its base field are isomorphic. To this end, numpy and many other libraries offer the .item() method, which returns a scalar if the array contains exactly one element (although it doesn't seem to be part of the Python Array API currently).
pandas._libs.lib.is_scalar seems to be in line here with numpy.isscalar, which also returns False for np.array(0), as technically this is considered a 0-dimensional array and hence not a scalar.
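The distinction drawn here can be checked with a few public NumPy calls. This is only an illustrative sketch; the variable names are made up for the example.

```python
import numpy as np

zero_dim = np.array(0)     # 0-dimensional array ("scalar array")
one_elem = np.array([0])   # 1-dimensional array holding a single element

# np.isscalar accepts true scalar types only, not 0-dimensional arrays.
is_scalar_zero_dim = np.isscalar(zero_dim)
is_scalar_np_int = np.isscalar(np.int64(0))

# A 0-dimensional array has an empty shape, so len() is undefined for it.
try:
    len(zero_dim)
    has_len = True
except TypeError:
    has_len = False

# .item() unwraps any array with exactly one element into a Python scalar.
unwrapped_zero_dim = zero_dim.item()
unwrapped_one_elem = one_elem.item()
```

Here `is_scalar_zero_dim` is falsy and `has_len` is `False`, while both `.item()` calls return a plain Python `int`.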
If (1) is preferred by the maintainers, this issue can probably be closed. However, numpy.clip does support passing 0-dimensional arrays, and so does Series.where, which can be used to implement Series.clip:
import numpy as np
import pandas as pd
s = pd.Series([-1,2,3])
s_clipped = s.where(s>np.array(0), np.array(0))
pd.testing.assert_series_equal(s_clipped, s.clip(lower=0)) # ✅
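For comparison, a minimal sketch of the numpy.clip behaviour mentioned above, assuming a NumPy version in which one of the two bounds may be passed as None:

```python
import numpy as np

values = np.array([-1, 2, 3])
lower = np.array(0)  # 0-dimensional lower bound

# np.clip broadcasts its bounds, so a 0-dimensional array acts like a scalar.
clipped_with_array = np.clip(values, lower, None)
clipped_with_scalar = np.clip(values, 0, None)

assert np.array_equal(clipped_with_array, clipped_with_scalar)
```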
Whether one wants to go with option ① or ② is probably just a matter of taste/design, but using this choice consistently throughout the API seems desirable.
> but using this choice consistently throughout the API seems desirable.
Right - I'm not sure how well this is supported throughout pandas. You mentioned clip, but I think there are a number of other methods that take scalars like this. It seems to me the next steps are to determine which methods support this, and from that we can find a reasonable way to achieve consistency.
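One possible way to start such a survey is sketched below. The list of methods is a hypothetical selection, not an exhaustive inventory, and the outcome depends on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

s = pd.Series([-1, 2, 3])
zero_dim = np.array(0)

# Hypothetical selection of Series methods that accept a scalar argument.
candidates = {
    "clip": lambda: s.clip(lower=zero_dim),
    "where": lambda: s.where(s > zero_dim, zero_dim),
    "mask": lambda: s.mask(s < zero_dim, zero_dim),
    "add": lambda: s.add(zero_dim),
    "fillna": lambda: s.fillna(zero_dim),
}


def survey(calls):
    """Run each call and record 'ok' or the name of the raised exception."""
    results = {}
    for name, call in calls.items():
        try:
            call()
            results[name] = "ok"
        except Exception as exc:  # broad on purpose: only the outcome is classified
            results[name] = type(exc).__name__
    return results


report = survey(candidates)
for name, outcome in report.items():
    print(f"{name}: {outcome}")
```

Note that this only detects exceptions. A method that accepts a 0-dimensional array but returns a wrong result would still be reported as "ok", so the results would need to be compared against the plain-scalar call as a second step.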
take
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Results in TypeError: len() of unsized object.
Issue Description
The following line tries to compute len(other), but scalar arrays have no len.
https://github.com/pandas-dev/pandas/blob/c46fb76afaf98153b9eef97fc9bbe9077229e7cd/pandas/core/series.py#L5892-L5894
If we remove these two lines, the above example produces the expected result, and still errors as expected if e.g. a list of incorrect size is passed.
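The linked pandas source is not reproduced here. As a rough illustration of the failure mode only, the following uses a hypothetical, simplified stand-in for an unconditional length check (`check_length_naive`) and, as one possible alternative to removing the check, a variant that skips 0-dimensional arrays (`check_length_guarded`). Neither function is taken from pandas.

```python
import numpy as np


def check_length_naive(other, expected_len):
    """Hypothetical stand-in: calls len() on every array-like argument."""
    if isinstance(other, (np.ndarray, list, tuple)):
        if len(other) != expected_len:  # TypeError for 0-dimensional arrays
            raise ValueError("Lengths must be equal")
    return other


def check_length_guarded(other, expected_len):
    """Hypothetical variant: 0-dimensional arrays bypass the length check."""
    if isinstance(other, np.ndarray) and other.ndim == 0:
        return other
    return check_length_naive(other, expected_len)
```

With these definitions, `check_length_naive(np.array(0), 3)` raises `TypeError: len() of unsized object`, while `check_length_guarded` passes the 0-dimensional array through and still raises `ValueError` for a list of the wrong size.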
Expected Behavior
Scalar arrays should be treated like scalars.
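Until that is the case, a user-side workaround might look like the sketch below. `unwrap_zero_dim` is a hypothetical helper, not a pandas function; it converts 0-dimensional arrays to Python scalars before they reach Series.clip.

```python
import numpy as np
import pandas as pd


def unwrap_zero_dim(value):
    """Hypothetical helper: turn 0-dimensional arrays into Python scalars."""
    if isinstance(value, np.ndarray) and value.ndim == 0:
        return value.item()
    return value


s = pd.Series([-1, 2, 3])

# Unwrap the bound first, so clip only ever sees a plain scalar.
clipped = s.clip(lower=unwrap_zero_dim(np.array(0)))
expected = s.clip(lower=0)
pd.testing.assert_series_equal(clipped, expected)
```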
Installed Versions