BUG: - Githubissues

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

a = pd.DataFrame({"A": ["toto", "tata", "tutu"]}, dtype="category")

print("pd.concat with a single df:")
print(pd.concat([a]).dtypes)  # category

print("pd.concat with two identical df:")
print(pd.concat([a, a.copy()]).dtypes)  # category

print("pd.concat with two df containing the same values of the category:")
b = pd.DataFrame({"A": ["tata", "tutu"]}, dtype="category")
print(pd.concat([a, b]).dtypes)  # object

print("pd.concat with two df containing different values of the category:")
c = pd.DataFrame({"A": ["titi"]}, dtype="category")
print(pd.concat([a, c]).dtypes)  # object

Issue Description

When concatenating DataFrames including categorical columns, the dtype of the column in the new DataFrame is inconsistent:

When concatenating a single DataFrame, the output column is categorical
When concatenating a single DataFrame with a copy of itself, the output column is categorical
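When concatenating two DataFrames whose categorical columns do not have the same categories (whether the values overlap or not), the output column is not categorical, but object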

Not sure if this is a bug, or if it is by design / if I am missing something here.
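
For what it's worth, the difference seems to come down to whether the two columns share the exact same categorical dtype. A minimal sketch, assuming the categories are inferred from the values of each DataFrame (the variable names are only for illustration):

```python
import pandas as pd

a = pd.DataFrame({"A": ["toto", "tata", "tutu"]}, dtype="category")
b = pd.DataFrame({"A": ["tata", "tutu"]}, dtype="category")

# Categories are inferred separately for each DataFrame.
cats_a = list(a["A"].cat.categories)  # ['tata', 'toto', 'tutu']
cats_b = list(b["A"].cat.categories)  # ['tata', 'tutu']

# A copy keeps the same categorical dtype, so the dtypes compare equal.
same_dtype_copy = a["A"].dtype == a.copy()["A"].dtype

# a and b have different sets of categories, so their dtypes differ.
same_dtype_ab = a["A"].dtype == b["A"].dtype

print(cats_a, cats_b)
print(same_dtype_copy, same_dtype_ab)
```

So the copy case and the two-DataFrame case are not treated the same way because the dtypes are only equal in the first one.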

Expected Behavior

Concatenating two DataFrames that include a similar categorical column:

import pandas as pd

a = pd.DataFrame({"A": ["toto", "tata", "tutu"]}, dtype="category")
b = pd.DataFrame({"A": ["tata", "tutu"]}, dtype="category")
print(pd.concat([a, b]).dtypes)

Should output a categorical column:

A    category
dtype: object
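
In the meantime, a possible workaround is to cast both columns to one shared categorical dtype before concatenating. This is only a sketch: building the shared dtype from the union of both category sets is my assumption about what a reasonable result looks like, not something pd.concat does by itself.

```python
import pandas as pd
from pandas.api.types import union_categoricals

a = pd.DataFrame({"A": ["toto", "tata", "tutu"]}, dtype="category")
b = pd.DataFrame({"A": ["tata", "tutu"]}, dtype="category")

# Build one dtype covering the categories of both columns.
all_categories = a["A"].cat.categories.union(b["A"].cat.categories)
shared = pd.CategoricalDtype(categories=all_categories)

# Cast both columns to the shared dtype, then concatenate.
result = pd.concat([a.astype({"A": shared}), b.astype({"A": shared})])
print(result.dtypes)

# For a single column, union_categoricals combines the values directly
# and returns a Categorical holding the union of the categories.
combined = union_categoricals([a["A"], b["A"]])
print(combined.categories)
```

With both inputs sharing the same dtype, the concatenated column stays categorical.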

Alternative solution: Concatenating a single DataFrame

import pandas as pd

a = pd.DataFrame({"A": ["toto", "tata", "tutu"]}, dtype="category")
print(pd.concat([a]).dtypes)

Should output an object dtype column, to be consistent with the "real" concatenation.

Installed Versions

INSTALLED VERSIONS
------------------
commit            : 8d16504de035280a93fac8cd62040fcfb6e87dea
python            : 3.10.4.final.0
python-bits       : 64
OS                : Windows
OS-release        : 10
Version           : 10.0.22000
machine           : AMD64
processor         : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder         : little
LC_ALL            : None
LANG              : None
LOCALE            : English_United States.utf8

pandas            : 0+untagged.29862.g8d16504
numpy             : 1.23.1
pytz              : 2022.1
dateutil          : 2.8.2
setuptools        : 61.2.0
pip               : 22.1.2
Cython            : 0.29.32
pytest            : None
hypothesis        : None
sphinx            : None
blosc             : None
feather           : None
xlsxwriter        : None
lxml.etree        : None
html5lib          : None
pymysql           : None
psycopg2          : None
jinja2            : None
IPython           : None
pandas_datareader : None
bs4               : None
bottleneck        : None
brotli            : None
fastparquet       : None
fsspec            : None
gcsfs             : None
matplotlib        : None
numba             : None
numexpr           : None
odfpy             : None
openpyxl          : None
pandas_gbq        : None
pyarrow           : None
pyreadstat        : None
pyxlsb            : None
s3fs              : None
scipy             : None
snappy            : None
sqlalchemy        : None
tables            : None
tabulate          : None
xarray            : None
xlrd              : None
xlwt              : None
zstandard         : None

pandas-dev / pandas

BUG: #47920
