pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Bitwise operations have inconsistent behavior, different from numpy #23191

Open cyrusmaher opened 6 years ago

cyrusmaher commented 6 years ago

Code Sample

import numpy as np
import pandas as pd

# succeeds
pd.Series([False]) & pd.Series([6.])

# Example: order matters
# fails: ufunc 'bitwise_and' not supported for the input types
pd.Series([6.]) & pd.Series([False])

# Example: behavior is different from numpy
# fails: ufunc 'bitwise_and' not supported for the input types
np.array([False]) & np.array([6.])

Problem description

Bitwise operations between float and bool arrays raise a TypeError in numpy, regardless of operand order. In pandas they also raise when the first operand is a float Series, but not when the first operand is a bool Series.
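The numpy side of this can be checked directly. The sketch below assumes a reasonably recent numpy/pandas; `numpy_raises` and `strict_and` are hypothetical helpers for illustration, not pandas API. It confirms that `np.bitwise_and` rejects the bool/float pair in both orders, and shows one way to get order-independent behavior in pandas today: validate dtypes up front and cast explicitly with `astype(bool)`. It deliberately does not assert what pandas itself returns for the mixed case, since that is what this issue is about and may differ between versions.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype, is_integer_dtype


def numpy_raises(a, b):
    """Return True if np.bitwise_and rejects the pair (a, b) with a TypeError."""
    try:
        np.bitwise_and(a, b)
    except TypeError:
        return True
    return False


bools = np.array([False])
floats = np.array([6.0])

# numpy rejects bool/float in either order: operand order does not matter.
print(numpy_raises(bools, floats), numpy_raises(floats, bools))  # True True


def strict_and(left, right):
    """Hypothetical helper: element-wise '&' on two Series that, like numpy,
    refuses anything other than bool/integer dtypes, in either operand order."""
    for s in (left, right):
        if not (is_bool_dtype(s.dtype) or is_integer_dtype(s.dtype)):
            raise TypeError(f"'&' not supported for dtype {s.dtype}")
    return left & right


# Casting explicitly makes the intent unambiguous (6.0 -> True).
left = pd.Series([6.0]).astype(bool)
right = pd.Series([False])
print(strict_and(left, right).tolist())  # [False]
```

Validating dtypes before the operator keeps the error independent of which operand happens to be on the left, which is the consistency the report asks for.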

Expected Output

All three operations should raise the same error as numpy:

ufunc 'bitwise_and' not supported for the input types

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 36.5.0.post20170921
Cython: 0.28.2
numpy: 1.15.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.5
fastparquet: None
pandas_gbq: None
pandas_datareader: None

mfenner1 commented 4 years ago

In a similar vein:

>>> df = pd.DataFrame({'likes_hockey' : [True, True, False],
...                    'likes_soccer' : [True, False, False]})

>>> df.likes_hockey & df.likes_soccer # works for series
0     True
1    False
2    False
dtype: bool

>>> # df.bitwise_and(df.likes_soccer, axis='rows') # desirable, but not implemented

>>> df.mul(df.likes_soccer, axis='rows') # implemented, but complains and recommends '&'
/Users/mfenner/anaconda3/lib/python3.7/site-packages/pandas/core/computation/expressions.py:178: UserWarning: evaluating in Python space because the '*' operator is not supported by numexpr for the bool dtype, use '&' instead
  f"evaluating in Python space because the {repr(op_str)} "

   likes_hockey  likes_soccer
0          True          True
1         False         False
2         False         False

>>> # but neither of these aligns on the row index as wanted (which is expected,
>>> # since DataFrame-Series operators align the Series index with the columns):
>>> # df & df.likes_soccer  # fails
>>> # df.likes_soccer & df  # fails
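Until something like a `bitwise_and` method exists, the row-wise AND can be written out explicitly. This is a minimal sketch, assuming every column is boolean; `and_rows` is a hypothetical helper, not a pandas method. It aligns the mask on the row index and then uses numpy broadcasting, and an `apply`-based one-liner is shown for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'likes_hockey': [True, True, False],
                   'likes_soccer': [True, False, False]})


def and_rows(frame, mask):
    """Hypothetical helper: AND every column of `frame` with the boolean Series
    `mask`, aligning `mask` on the row index (like df.mul(mask, axis='rows'))."""
    # Align on row labels; rows missing from `mask` are treated as False.
    aligned = mask.reindex(frame.index, fill_value=False).to_numpy(dtype=bool)
    # Broadcast the (n,) mask across columns as an (n, 1) column vector.
    values = frame.to_numpy(dtype=bool) & aligned[:, None]
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)


result = and_rows(df, df.likes_soccer)

# Equivalent without a helper: apply '&' column by column (aligned on the index).
via_apply = df.apply(lambda col: col & df.likes_soccer)

print(result)
```

The numpy version operates on raw arrays, so it does not go through the pandas expression path that emits the numexpr warning for `df.mul` above; the `apply` version is shorter but runs one Series operation per column.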

Much of this may also apply to np.logical_and and to the other bitwise_* and logical_* operators and functions.
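For comparison, numpy's `logical_and` does not share `bitwise_and`'s float/bool restriction: numpy registers floating-point loops for it, so nonzero floats are treated as True in either operand order. The sketch below (assuming a recent numpy/pandas; the data reuses the example frame above) shows that, and that row-wise broadcasting against a DataFrame still has to be done on the underlying arrays.

```python
import numpy as np
import pandas as pd

bools = np.array([False, True])
floats = np.array([6.0, 6.0])

# Unlike bitwise_and, logical_and accepts floats: nonzero counts as True.
print(np.logical_and(bools, floats))  # [False  True]
print(np.logical_and(floats, bools))  # [False  True]

df = pd.DataFrame({'likes_hockey': [True, True, False],
                   'likes_soccer': [True, False, False]})

# On two boolean Series, np.logical_and returns a boolean Series.
both = np.logical_and(df.likes_hockey, df.likes_soccer)
print(both.tolist())  # [True, False, False]

# Row-wise broadcasting against the whole frame, done on the raw arrays.
rowwise = np.logical_and(df.to_numpy(), df.likes_soccer.to_numpy()[:, None])
print(rowwise.tolist())  # [[True, True], [False, False], [False, False]]
```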