pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: inconsistent behavior of pandas.api.types.pandas_dtype #58643

Open aaravind100 opened 1 month ago

aaravind100 commented 1 month ago

Pandas version checks

Reproducible Example

import pandas as pd
import pyarrow  # needed for pyarrow.string() below

print(pd.api.types.pandas_dtype(pd.ArrowDtype(pyarrow.string())))  # <- correct
# output: string[pyarrow]
print(pd.api.types.pandas_dtype("string"), pd.api.types.pandas_dtype("string[pyarrow]"))  # <- incorrect
# output: string string

Issue Description

pd.api.types.pandas_dtype behaves inconsistently when called with the str string[pyarrow]: it returns type string instead of type string[pyarrow], whereas other types return type <type>[pyarrow].

Expected Behavior

pd.api.types.pandas_dtype should return type string[pyarrow] for the input str string[pyarrow].

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.9.final.0
python-bits : 64
OS : Linux
OS-release : 6.8.0-31-generic
Version : #31-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr 20 00:40:06 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.5.1
pip : 24.0
Cython : 3.0.10
pytest : 8.1.1
hypothesis : 6.100.1
sphinx : 7.3.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.23.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.3.1
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : 2.0.29
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

rajat315315 commented 1 month ago

It seems this code stops it from inferring the correct dtype.

aaravind100 commented 1 month ago

I was wondering whether this is for backwards compatibility, as per this comment:

https://github.com/unionai-oss/pandera/pull/1628#issuecomment-2101100675