Closed YerdnY closed 1 month ago
take
take
I found that this seems to be a floating-point issue when an array of floats is passed to Index(). These are the bins computed in qcut (quantile, bin edge):
0.0    1.0
0.1   19.0
0.2   37.0
0.3   55.0
0.4   73.0
0.5   91.0
0.6  109.0
0.7  127.0
0.8  145.0
0.9  163.0
1.0  181.0
However, when they are converted with Index(bins) in the argument to _bins_to_cut, the values are: Index([1.0, 19.0, 37.0, 55.00000000000001, 73.0, 91.0, 109.00000000000001, 126.99999999999999, 145.0, 163.0, 181.0]). The problem is the edge at 126.99999999999999. Any suggestions for how to resolve this?
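To see why one drifted edge is enough to change the counts, here is a minimal pure-Python sketch. It assumes the input is the integers 1 to 181 (inferred from the edges and counts in this issue, since the original example is not shown here) and that bins are right-closed with the lowest edge included. The helper `count_per_bin` is hypothetical and is not part of pandas.

```python
from bisect import bisect_left


def count_per_bin(values, edges):
    """Count values in right-closed bins (a, b], with the lowest edge included."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # bisect_left returns the position of the first edge >= v,
        # so v belongs to the bin that ends at that edge.
        idx = max(bisect_left(edges, v) - 1, 0)
        counts[idx] += 1
    return counts


values = range(1, 182)  # assumed input: the integers 1..181

# Edges as reported from Index(bins) in this thread.
drifted_edges = [1.0, 19.0, 37.0, 55.00000000000001, 73.0, 91.0,
                 109.00000000000001, 126.99999999999999, 145.0, 163.0, 181.0]

# Edges without floating-point drift: 1, 19, 37, ..., 181.
exact_edges = [float(e) for e in range(1, 182, 18)]

drifted_counts = count_per_bin(values, drifted_edges)
exact_counts = count_per_bin(values, exact_edges)

print(drifted_counts)
print(exact_counts)
```

Under these assumptions, the value 127 is greater than 126.99999999999999, so it moves from bin 6 to bin 7. Bin 6 ends up with 17 items and bin 7 with 19. The edges that drift upward (55.00000000000001 and 109.00000000000001) do not move any integer across a boundary.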
https://github.com/pandas-dev/pandas/blob/bfe5be01fef4eaecf4ab033e74139b0a3cac4a39/pandas/core/reshape/tile.py#L339-L341 It is caused by floating-point error in the np.linspace call in qcut.
import numpy as np

# Print the quantile probabilities at full precision to expose rounding error
quantiles = np.linspace(0, 1, 11)
with np.printoptions(precision=20):
    print(quantiles)
Note that the output is actually:
[0. 0.1 0.2
0.30000000000000004 0.4 0.5
0.6000000000000001 0.7000000000000001 0.8
0.9 1. ]
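The same rounding pattern can be reproduced without NumPy. The sketch below assumes the points come from multiplying an index by a rounded step of 0.1, which matches the output above but is an assumption about how the values arise. It compares that with dividing each index by 10, which rounds only once per point. This only illustrates the arithmetic and is not necessarily the fix adopted in pandas.

```python
n = 10
step = 1 / n  # 0.1 is not exactly representable in binary floating point

# Assumed to mirror the values above: index times a rounded step.
by_multiplication = [i * step for i in range(n + 1)]

# Alternative: one correctly rounded division per point.
by_division = [i / n for i in range(n + 1)]

# Indices where the two constructions disagree.
mismatches = [i for i in range(n + 1) if by_multiplication[i] != by_division[i]]

print(by_multiplication)
print(by_division)
print(mismatches)
```

The division form yields the floats nearest to 0.0, 0.1, ..., 1.0, while the multiplication form is off at indices 3, 6, and 7, the same positions that are off in the np.linspace output.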
Thanks to @rob-sil, it seems this issue has been solved in #59409 @mroeschke
Pandas version checks
[x] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The code above produces the following output:
val
0    19
7    19
1    18
2    18
3    18
4    18
5    18
8    18
9    18
6    17
Name: count, dtype: int64
The issue is that there are three unique counts of items per bin: 17, 18, and 19. I expect no more than two unique counts. Ideally there would be one, but that is only possible if the input size is divisible by the number of bins.
Expected Behavior
The same code produces this correct output in pandas 2.1.4:
val
0    19
1    18
2    18
3    18
4    18
5    18
6    18
7    18
8    18
9    18
Name: count, dtype: int64
Installed Versions