Open MacroBull opened 3 years ago
This is an edge case.
There are far more practical uses for an Indexer with Boolean values than trying to index a True/False indexed DataFrame. I am not convinced the codebase warrants a check of this kind, though possibly it does if the check is very simple and doesn't interfere with a lot of logic.
A workaround for a dynamic workflow is:
pd.Series([True, True, False]).value_counts().sort_index().iloc[0]
> There are far more practical uses for an Indexer with Boolean values than trying to index True/False indexed DataFrame.
Hmm, this does make the result inconsistent with, say, pandas.Series([1, 2, 3])[[0]]
A workflow could potentially have the index generated from unknown data, e.g. as in the OP, from value_counts.
I think we should mark this as a bug, pending further investigation into how feasible it is to distinguish between a fancy indexer and a boolean indexer.
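For readers who want to see the inconsistency concretely, here is a minimal sketch. The issue's own code sample did not survive in this page, so the Series below is an assumed reconstruction built directly with a boolean index (the shape value_counts would give for boolean data); the variable names are illustrative only.

```python
import pandas as pd

# Counts keyed by boolean labels, similar to what value_counts() returns for bool data.
counts = pd.Series([2, 1], index=[True, False])

# With an integer index, a list key is looked up by label.
ints = pd.Series([1, 2, 3])
by_label = ints[[0]]

# With an all-bool list, the key is taken as a mask, even though the index
# itself holds boolean labels: only position 1 is kept here.
as_mask = counts[[False, True]]

# An explicit comparison against the index is unambiguous: it always builds a mask.
only_false = counts[counts.index == False]  # noqa: E712
```

If the list were read as labels, `counts[[False, True]]` would return both rows in the requested order; read as a mask it returns only the row labelled False.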
Hi! I am new to open source and wanted to contribute. I wanted to know how to proceed and whether this issue is open to a PR?
Thanks @Rudransh24. Probably best to look for an issue labelled good first issue. https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22
This one probably requires more discussion due to potential ambiguity that could arise.
I am aware of this edge case. As far as I can tell, a list consisting only of True and False values is treated as a boolean mask (not as index labels).
In this case, I would do
x = pandas.Series([True, True, False]).value_counts()
x[ x.index == False]
or you can slip in some value other than True or False
pandas.Series([True, True, False]).value_counts().append(pandas.Series([0], index = ['None']), verify_integrity=False)[[True, 'None']]
I think there should be something like .ibool for boolean indexing, for completeness and consistency, even though there is little chance that things like this happen.
I come from R. From a design perspective, Python, including pandas, has so many exceptions. For example, even importing a module has edge cases, such as import time never importing a local time.py. I think it's a design problem.
Yes, I encountered this issue when I was working on some generic code, where I cannot know in advance exactly what the input type will be.
The workaround I'm currently using is to invoke get_indexer manually and combine it with iloc, so a bool indexer is never considered.
locs = df.index.get_indexer(index)
values = df.iloc[locs]
This is almost the same as what direct indexing does.
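As a sketch of how this workaround could be wrapped for generic code: `select_by_labels` is a hypothetical helper name, not a pandas API. One caveat worth handling is that get_indexer returns -1 for labels it cannot find, and iloc would silently read -1 as "last row", so the sketch raises instead. It also assumes a unique index, since get_indexer does not accept a non-unique one.

```python
import pandas as pd


def select_by_labels(obj, labels):
    """Select rows strictly by label (hypothetical helper, assumes a unique index)."""
    # Translate labels to integer positions; missing labels come back as -1.
    locs = obj.index.get_indexer(labels)
    missing = [label for label, loc in zip(labels, locs) if loc == -1]
    if missing:
        # Without this check, iloc would treat -1 as the last row.
        raise KeyError(f"labels not found in index: {missing}")
    return obj.iloc[locs]


counts = pd.Series([2, 1], index=[True, False])
picked = select_by_labels(counts, [False, True])  # both rows, in the requested order
```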
The workarounds provided by @kwhkim, either forcing bool indexing or appending a 'None' index entry, could be performance-impacting in some way, I think.
In my opinion, when an ambiguous index is given, the direct __getitem__ should give a warning about the potential bool indexing or even raise an exception.
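Until something like that exists in pandas itself, the check can be approximated in user code. The following is only a sketch of the idea: `getitem_strict` is a hypothetical wrapper, and the choice of ValueError and the exact ambiguity rule (an all-bool list key against a boolean-valued index) are assumptions, not pandas behaviour.

```python
import numpy as np
import pandas as pd


def getitem_strict(obj, key):
    """Index obj[key], refusing keys that could be either a mask or labels (hypothetical)."""
    key_is_all_bool = (
        isinstance(key, list)
        and len(key) > 0
        and all(isinstance(k, (bool, np.bool_)) for k in key)
    )
    # inferred_type is "boolean" when the index values are all booleans.
    index_is_bool = obj.index.inferred_type == "boolean"
    if key_is_all_bool and index_is_bool:
        raise ValueError(
            "ambiguous key: an all-bool list on a boolean index is read as a mask; "
            "use an explicit mask or get_indexer + iloc instead"
        )
    return obj[key]


counts = pd.Series([2, 1], index=[True, False])
ints = pd.Series([1, 2, 3])
unambiguous = getitem_strict(ints, [0])  # plain label lookup, passes through
```

Calling `getitem_strict(counts, [False, True])` raises, which forces the caller to say which interpretation is intended.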
[x] I have checked that this issue has not already been reported.
[ ] I have confirmed this bug exists on the latest version of pandas.
[ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
It seems that Series.__getitem__ and check_bool_indexer do not work well for a bool-typed index.
Expected Output
Output of pd.show_versions()