Open alexowens90 opened 1 month ago
Update: `assert (s == rhs).iloc[0]` passes.
Worth noting that 1.401298464324817071e-45 is the smallest positive (subnormal) value representable by a 32-bit float, i.e. 4 times the value from my example.
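To make the arithmetic concrete, here is a minimal sketch using plain NumPy. It assumes the scalar from the example is the `3.503246160812043 * (10**-46)` value used elsewhere in this thread; the variable names are only illustrative.

```python
import numpy as np

# Smallest positive float32: the next representable value after 0.0.
tiny32 = np.nextafter(np.float32(0), np.float32(1))

# Assumed to be the float64 value from the example in this thread.
scalar = np.float64(3.503246160812043 * (10**-46))

# The float64 scalar is roughly a quarter of the smallest float32 step.
print(float(tiny32) / float(scalar))

# Casting it down to float32 rounds to the nearest representable value, which is 0.0.
as32 = np.float32(scalar)
print(as32 == 0.0)  # True
```

So once the scalar is coerced to float32 it is indistinguishable from zero, which would be consistent with `==` passing while `<` fails.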
When we do `Series - scalar` ops, it appears we default to the Series dtype.
import numpy as np
import pandas as pd
ser = pd.Series(np.float32(0))
scalar = np.float64(3.503246160812043 * (10**-46))
print((ser - scalar).dtype)
# float32
print((scalar - ser).dtype)
# float32
Given this, it seems consistent to coerce `ser < scalar` to the Series dtype as well.
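For anyone who needs a version-independent result in the meantime, the two possible interpretations can be spelled out explicitly. This is only a sketch of a user-side workaround, not a description of what pandas does internally; `in_series_dtype` and `in_wider_dtype` are illustrative names.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.float32(0))
scalar = np.float64(3.503246160812043 * (10**-46))

# Option A: coerce the scalar to the Series dtype first.
# The scalar rounds to float32 0.0, so 0.0 < 0.0 is False.
in_series_dtype = ser < ser.dtype.type(scalar)

# Option B: upcast the Series to the scalar's dtype first.
# The comparison then happens in float64, so 0.0 < 3.5e-46 is True.
in_wider_dtype = ser.astype(np.float64) < scalar

print(in_series_dtype.iloc[0], in_wider_dtype.iloc[0])  # False True
```

Making the cast explicit removes any dependence on how a given NumPy or pandas version promotes mixed-width operands.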
There is some inconsistency with integers:
ser = pd.Series(np.uint8(0))
scalar = np.int8(1)
print((ser - scalar).dtype)
# uint8
print((scalar - ser).dtype)
# uint8
print(ser - scalar)
# 0 255
# dtype: uint8
scalar = np.int8(-1)
print(ser + scalar)
# 0 -1
# dtype: int16
It's not clear to me if this is inconsistent or a special rule for dealing with negative integers. What would a proposal be?
Yes, my assumption was that there was an `if scalar < 0` condition somewhere in the type promotion rules.
It is odd to me that
pd.Series(np.uint8(0)) - np.int8(1) != pd.Series(np.uint8(0)) + np.int8(-1)
It would make more sense if both gave the same result (that of the RHS).
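As an illustration of what "the same result" could look like, here is a sketch that promotes both operands to a common dtype before the operation, using `np.promote_types`, which depends only on the dtypes and not on the scalar's value. This is one possible user-side approach under that assumption, not a proposal for how pandas should implement it.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.uint8(0))

# Dtype-only promotion: uint8 and int8 both fit in int16.
common = np.promote_types(ser.dtype, np.int8)

wide = ser.astype(common)
sub = wide - common.type(1)    # uint8(0) - int8(1), computed in int16
add = wide + common.type(-1)   # uint8(0) + int8(-1), computed in int16

print(sub.iloc[0], add.iloc[0], sub.dtype)  # -1 -1 int16
```

With the cast done up front, subtraction of 1 and addition of -1 agree, at the cost of always widening the result.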
For context, I work on similar processing operations in ArcticDB, and I was using hypothesis+Pandas to declaratively test ArcticDB when I noticed this.
No disagreement that this is thorny - as mentioned above I think we would need a concrete proposal on how to handle promotion across operations and dtypes to move forward.
Fair. As the ArcticDB code I linked shows, you can do this manually for the standard numeric types for a small number of supported binary operations, but you need to work out this logic for every operation you support, which I assume would be a bit of a mammoth task with Pandas.
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Comparing the two floating point values (of different widths) using `<` correctly returns `True`. Placing one of the floating point values into a Pandas Series and then running the same comparison incorrectly returns `False`, i.e. the second assertion fails.
Note that the installed versions below use numpy 1.26.4. The issue is not reproducible with numpy 2.X.
Expected Behavior
Both assertions should pass
Installed Versions
Released version:
Dev version: