pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Incorrect logical operation between pandas dataframe and series #60204

Open · jialuoo opened this issue 3 weeks ago

jialuoo commented 3 weeks ago

Pandas version checks

Reproducible Example

Here is an example:

import pandas as pd
df = pd.DataFrame({
    'A': [5, 15, 10, 8],
    'B': [20, 3, 7, 12]
})
result = (df >= 10) | (df['A'] >= 10)
print(result)

The output:

       A      B      0      1      2      3
0  False   True  False  False  False  False
1   True  False  False  False  False  False
2   True  False  False  False  False  False
3  False   True  False  False  False  False
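A minimal sketch, assuming pandas 2.2.x as listed under Installed Versions, of where the extra columns come from. When a DataFrame is combined with a Series through an operator such as |, pandas matches the Series index against the DataFrame's column labels, not its row labels. The labels 0 to 3 of df['A'] >= 10 are not column names, so the result gets the union of both label sets. The second half shows the contrasting case: a Series indexed by column labels is broadcast down every row. The names row_mask, misaligned, col_flags and by_column are illustrative only, and the exact handling of the missing values that produce all-False entries in columns 0 to 3 is not examined here.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 15, 10, 8], 'B': [20, 3, 7, 12]})

# Series indexed by the row labels 0..3
row_mask = df['A'] >= 10

# DataFrame | Series matches the Series labels against df.columns, not df.index.
# None of 0..3 is a column label, so the result has the union of both label sets.
misaligned = (df >= 10) | row_mask
print(list(misaligned.columns))

# A Series indexed by *column* labels is broadcast down every row instead.
col_flags = pd.Series({'A': False, 'B': True})
by_column = (df >= 10) | col_flags
print(by_column)
# Column A keeps df['A'] >= 10 unchanged; column B becomes True in every row.
```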

Issue Description

  1. I would expect columns 1 and 2 of the result to be True (the Series df['A'] >= 10 is True at labels 1 and 2), since this is an | (OR) operation between a DataFrame and a Series.
  2. Could you point me to the relevant part of the user guide? I couldn't find a section that explains logical operations between a pandas DataFrame and a Series.

Thanks a lot!

Expected Behavior

I would expect columns 1 and 2 of the result to be True (the Series df['A'] >= 10 is True at labels 1 and 2), since this is an | (OR) operation between a DataFrame and a Series.
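If the goal is to OR every row of df >= 10 with the matching row of df['A'] >= 10, one way to get that, sketched under the same pandas 2.2.x assumption, is to apply the row mask column by column: each column is a Series sharing df.index, so Series | Series aligns row to row. An equivalent option drops to NumPy and broadcasts the mask as an (n_rows, 1) column before rebuilding the labels. The names row_mask, rowwise and rowwise_np are illustrative, and this is a workaround sketch rather than an official pandas recipe.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 15, 10, 8], 'B': [20, 3, 7, 12]})
row_mask = df['A'] >= 10  # one boolean per row label

# Option 1: each column is a Series sharing df.index, so col | row_mask
# aligns on row labels, which is the behavior the report expected.
rowwise = (df >= 10).apply(lambda col: col | row_mask)

# Option 2: broadcast with NumPy by reshaping the mask to (n_rows, 1),
# then rebuild a DataFrame with the original row and column labels.
rowwise_np = pd.DataFrame(
    (df >= 10).to_numpy() | row_mask.to_numpy()[:, None],
    index=df.index,
    columns=df.columns,
)

print(rowwise)
# Columns stay ['A', 'B']. Column B is True in every row: rows 0 and 3
# already have B >= 10, and rows 1 and 2 satisfy df['A'] >= 10.
```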

Installed Versions

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.10.15
python-bits           : 64
OS                    : Linux
OS-release            : 6.9.10-1rodete5-amd64
Version               : #1 SMP PREEMPT_DYNAMIC Debian 6.9.10-1rodete5 (2024-09-04)
machine               : x86_64
processor             :
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.3
numpy                 : 2.1.1
pytz                  : 2024.2
dateutil              : 2.9.0.post0
pip                   : 24.2
Cython                : None
sphinx                : None
IPython               : 8.28.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.12.3
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2024.9.0
html5lib              : None
hypothesis            : None
gcsfs                 : 2024.9.0post1
jinja2                : 3.1.4
lxml.etree            : None
matplotlib            : 3.9.2
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : 0.24.0
psycopg2              : None
pymysql               : None
pyarrow               : 17.0.0
pyreadstat            : None
pytest                : 8.3.3
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.14.1
sqlalchemy            : 2.0.36
tables                : None
tabulate              : 0.9.0
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2024.2
qtpy                  : None
pyqt5                 : None

ojmel commented 3 weeks ago

What do you hope to use the corrected table for?

jialuoo commented 3 weeks ago

Thanks for the quick response.

I just want to clarify whether this behavior is expected or a bug. What are the rules for logical operations between pandas objects (e.g., DataFrame | DataFrame, DataFrame | Series)? Is there any user guide section or documentation that explains them?

At the moment, I don't have a specific goal in mind. I noticed this behavior while experimenting with the DataFrame.where method, which accepts the result of such logical operations as its condition, for example df.where((df >= 10) | (df['A'] >= 10)). The result of the logical operation therefore directly determines what DataFrame.where returns.
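For the DataFrame.where use case, a sketch under the same pandas 2.2.x assumption that builds the condition row-wise first, so it carries exactly df's row and column labels and each row is filtered by the row-wise OR. The names row_mask, cond and kept are illustrative. DataFrame.where keeps values where the condition is True and fills the rest with NaN by default.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 15, 10, 8], 'B': [20, 3, 7, 12]})
row_mask = df['A'] >= 10

# Row-wise condition: same index and columns as df, no extra labels.
cond = (df >= 10).apply(lambda col: col | row_mask)

# where() keeps values where cond is True and puts NaN elsewhere.
kept = df.where(cond)
print(kept)
# Column A keeps only rows 1 and 2 (15 and 10); column B is kept in full.
```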

Lavishgangwani commented 2 weeks ago

take