pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: combine_first reorders columns #60427

Open davetapley opened 2 days ago

davetapley commented 2 days ago

Pandas version checks

Reproducible Example

import pandas as pd

df = pd.DataFrame({"B": [1, 2, 3], "A": [4, 5, 6]}, index=["a", "b", "c"])

print(df)  # B first, then A
print()

df_ = pd.DataFrame({"A": [7]}, index=["b"])

print(df.combine_first(df_))  # A first, then B
print()

print(df_.combine_first(df))  # A first, then B
print()

print(df_.combine_first(df)[df.columns])  # Workaround: reselect columns in df's original order (B, A)

Issue Description

I wouldn't expect combine_first to reorder the columns alphabetically, but it does.

Bug might be a stretch, but it's certainly unexpected and awkward.
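A plausible cause, assumed here rather than confirmed in this thread, is that combine_first builds its result columns as the union of both frames' columns via Index.union. According to the pandas docs, Index.union with its default sort=None sorts the result unless the two indexes are equal, one of them is empty, or the labels cannot be compared. The sketch below only demonstrates that documented Index.union behavior on the column labels from the reproducible example. It does not trace pandas' internal code path.

```python
import pandas as pd

# Column labels copied from the reproducible example: df has B, A; df_ has A.
left_cols = pd.Index(["B", "A"])
right_cols = pd.Index(["A"])

# Default sort=None: the union is sorted because the indexes differ
# and their labels (strings) are comparable.
sorted_union = left_cols.union(right_cols)

# sort=False: keep left's order and append any labels found only in right.
ordered_union = left_cols.union(right_cols, sort=False)

print(list(sorted_union))   # ['A', 'B']
print(list(ordered_union))  # ['B', 'A']
```

The sort=False result matches the order the issue expects, so if the assumption above holds, a fix would likely involve how that column union is computed.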

Expected Behavior

Preserve the original column order, as shown by the # Workaround line.
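Until the behavior changes, the workaround can be wrapped in a small helper. This is a minimal sketch: combine_first_keep_order is a hypothetical name, not a pandas API, and it assumes the desired order is the calling frame's columns followed by any columns that exist only in the other frame. Unlike the # Workaround line, which always reselects df.columns, it follows whichever frame is passed first.

```python
import pandas as pd


def combine_first_keep_order(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): combine_first without column reordering.

    Columns of `left` keep their original order; columns that exist only in
    `right` are appended after them, in the order they appear in `right`.
    """
    combined = left.combine_first(right)
    order = list(left.columns) + [c for c in right.columns if c not in left.columns]
    return combined[order]


df = pd.DataFrame({"B": [1, 2, 3], "A": [4, 5, 6]}, index=["a", "b", "c"])
df_ = pd.DataFrame({"A": [7]}, index=["b"])

print(combine_first_keep_order(df, df_))  # columns: B, A (df's order)
print(combine_first_keep_order(df_, df))  # columns: A, B (df_'s order, then B)
```

If a fixed target order is wanted regardless of which frame calls combine_first, reselecting a known column list, as the workaround does with df.columns, is simpler.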

Installed Versions

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.13.0+
python-bits           : 64
OS                    : Darwin
OS-release            : 21.6.0
Version               : Darwin Kernel Version 21.6.0: Wed Oct 4 23:55:28 PDT 2023; root:xnu-8020.240.18.704.15~1/RELEASE_X86_64
machine               : x86_64
processor             : i386
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.3
numpy                 : 2.1.3
pytz                  : 2024.2
dateutil              : 2.9.0.post0
pip                   : 24.2
Cython                : None
sphinx                : None
IPython               : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.12.3
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2024.2
qtpy                  : None
pyqt5                 : None
U-S-jun commented 1 day ago

take