pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.94k stars, 18.04k forks

BUG: String methods has no method "isascii()" #59091

Open ujex256 opened 5 months ago

ujex256 commented 5 months ago

Reproducible Example

import pandas as pd
series = pd.Series(["a", "b", "c", "あ", ""])

series.str.isalnum()
"""
0     True
1     True
2     True
3     True
4    False
dtype: bool
"""

series.str.isascii()
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# AttributeError: 'StringMethods' object has no attribute 'isascii'

Issue Description

pd.Series.str does not support the isascii() method, although the accessor already exposes similar predicates such as isalnum().

Expected Behavior

Calling isascii() on the series above should return a boolean Series, like this:

series.str.isascii()
"""
0     True
1     True
2     True
3    False
4     True
dtype: bool
"""
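
For reference, the expected values match what Python's built-in str.isascii() returns element by element. The snippet below is only a plain-Python sketch with no pandas involved; the names values, isalnum_flags and isascii_flags are made up for illustration. It shows why the empty string differs between the two checks: str.isascii() returns True for an empty string, while str.isalnum() returns False.

```python
values = ["a", "b", "c", "あ", ""]

# Built-in str methods applied one element at a time (no pandas involved).
isalnum_flags = [v.isalnum() for v in values]
isascii_flags = [v.isascii() for v in values]

print(isalnum_flags)  # "あ" is alphanumeric, the empty string is not
print(isascii_flags)  # "あ" is not ASCII, the empty string counts as ASCII
```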

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : Japanese_Japan.932

pandas : 2.2.2
numpy : 2.0.0
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.5.1
pip : 24.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.2.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.25.0
pandas_datareader : None
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

krishsharma0413 commented 5 months ago

Can a contributor look into this? If needed, I will work on the PR :P, unless @ujex256 or someone else is already working on it.

ujex256 commented 5 months ago

@krishsharma0413 Thank you for your help! I was already trying to get the PR out. I appreciate your concern.

Siddharth-Latthe-07 commented 5 months ago

@ujex256 The pd.Series.str accessor in pandas does not have an isascii method by default. However, you can achieve the desired functionality by using a custom function combined with the apply method.

import pandas as pd

def is_ascii(s):
    return all(ord(c) < 128 for c in s)

series = pd.Series(["a", "b", "c", "あ", ""])

series_isascii = series.apply(is_ascii)
print(series_isascii)
"""
0     True
1     True
2     True
3    False
4     True
dtype: bool
"""

Let me know if it works. Thanks!

yuanx749 commented 4 months ago

I think this is more of a feature request than a bug.

A temporary workaround could be series.apply(lambda x: x.isascii()).
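
One caveat with the lambda form is that it raises AttributeError as soon as the series contains a missing value, since None and NaN have no isascii() method. A minimal sketch of a variant that skips missing values is shown below; it assumes Python 3.7+ (where str.isascii exists) and uses Series.map with na_action="ignore". The sample series with a trailing None is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: the series from the report plus one missing value.
series = pd.Series(["a", "b", "c", "あ", "", None])

# Pass the built-in str.isascii as the mapper; na_action="ignore" leaves
# missing values untouched instead of calling the function on them.
result = series.map(str.isascii, na_action="ignore")
print(result)
```

The missing entry stays missing in the result, so the output is not a pure bool Series; whether that is acceptable depends on how the values are used afterwards.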