pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: df.loc astype type change on specific row not working #59732

Open ganbaaelmer opened 2 months ago

ganbaaelmer commented 2 months ago

Pandas version checks

Reproducible Example

df.loc[index, 'column_name'].astype(str)

Issue Description

Calling astype(str) on a df.loc selection does not change the dtype.

Expected Behavior

The dtype should change from object to str, but it is still object.

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.12.4.final.0
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 151 Stepping 5, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.5.1
pip : 24.0
Cython : None
pytest : 7.4.4
hypothesis : None
sphinx : 7.3.7
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.2.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.25.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.3.1
gcsfs : None
matplotlib : 3.8.4
numba : 0.59.1
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 14.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2024.3.1
scipy : 1.13.1
sqlalchemy : 2.0.30
tables : 3.9.2
tabulate : 0.9.0
xarray : 2023.6.0
xlrd : None
zstandard : 0.22.0
tzdata : 2023.3
qtpy : 2.4.1
pyqt5 : None

rhshadrach commented 2 months ago

Thanks for the report. Currently, str maps to the NumPy object dtype. This is an area of pandas that is under active development and will change in the future; see https://pandas.pydata.org/pdeps/0014-string-dtype.html for more details. You can opt in to the new string dtype by setting pd.options.future.infer_string = True:

import pandas as pd

pd.options.future.infer_string = True
ser = pd.Series(["a", "b", None])
print(ser.astype("str").dtype)
# str

@jorisvandenbossche - I'm seeing the following:

pd.options.future.infer_string = True
ser = pd.Series(["a", "b", None])
print(ser.astype(str).dtype)
# object

Is that expected to be object dtype?
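For comparison, a minimal sketch of requesting the existing nullable string dtype explicitly through the "string" alias, which does not depend on the infer_string option. The variable names are illustrative, and the dtype printed for astype(str) is deliberately not annotated because it may differ between pandas versions:

```python
import pandas as pd

ser = pd.Series(["a", "b", None])

# Converting with the builtin str; the resulting dtype depends on the
# pandas version and on pd.options.future.infer_string.
as_builtin = ser.astype(str)
print(as_builtin.dtype)

# Converting with the "string" alias requests pandas' dedicated
# StringDtype, which keeps the missing value as missing.
as_string = ser.astype("string")
print(as_string.dtype)
print(as_string.isna().sum())
```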

ganbaaelmer commented 2 months ago

I think this one is working:

pd.options.future.infer_string = True
df[column_A] = df[column_A].astype('str')
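One likely reason the original example appeared to do nothing is that astype returns a new object and never modifies the DataFrame in place, so the result has to be assigned back. The sketch below illustrates this under some assumptions: the frame, the column name, and the row labels are made up, and the rows are selected with a list of labels so that .loc returns a Series rather than a scalar. A dtype belongs to the whole column, so converting a few rows changes the stored values but not the column's dtype.

```python
import pandas as pd

# Hypothetical data: a mixed-type column, stored with object dtype.
df = pd.DataFrame({"column_name": [1, "a", 2.5]})
rows = [0, 1]  # illustrative row labels

# astype returns a new Series; df itself is left untouched.
converted = df.loc[rows, "column_name"].astype(str)
print(type(df.loc[0, "column_name"]))  # still the original Python type

# Assign the result back to change the stored values for those rows.
df.loc[rows, "column_name"] = converted
print(type(df.loc[0, "column_name"]))
print(df["column_name"].dtype)  # the column dtype is shared by all rows

# Converting the whole column, as in the comment above, also needs
# the assignment.
df["column_name"] = df["column_name"].astype(str)
```

After the final assignment, every value in the column is a string, including the row that was skipped in the partial conversion.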