pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: `Series.map` to tuple values fails for category dtype #41669

Open mwaskom opened 3 years ago

mwaskom commented 3 years ago


Code Sample, a copy-pastable example

import pandas as pd
s = pd.Series(["a", "a", "b"]).astype("category")
s.map({"a": (1, 2), "b": (3, 4)})

Problem description

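Calling `map` with tuple values on the categorical Series raises `NotImplementedError: isna is not defined for MultiIndex`, while the same call on an object-dtype Series works (see Expected Output).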
Full traceback

```python-traceback
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
in
      1 import pandas as pd
      2 s = pd.Series(["a", "a", "b"]).astype("category")
----> 3 s.map({"a": (1, 2), "b": (3, 4)})

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/series.py in map(self, arg, na_action)
   3907         dtype: object
   3908         """
-> 3909         new_values = super()._map_values(arg, na_action=na_action)
   3910         return self._constructor(new_values, index=self.index).__finalize__(
   3911             self, method="map"

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/base.py in _map_values(self, mapper, na_action)
    901             # "Union[ExtensionArray, Any]" has no attribute "map"
    902             # [union-attr]
--> 903             return self._values.map(mapper)  # type: ignore[union-attr]
    904 
    905         values = self._values

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/arrays/categorical.py in map(self, mapper)
   1196         new_categories = self.categories.map(mapper)
   1197         try:
-> 1198             return self.from_codes(
   1199                 self._codes.copy(), categories=new_categories, ordered=self.ordered
   1200             )

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/arrays/categorical.py in from_codes(cls, codes, categories, ordered, dtype)
    567         Categories (2, object): ['a' < 'b']
    568         """
--> 569         dtype = CategoricalDtype._from_values_or_dtype(
    570             categories=categories, ordered=ordered, dtype=dtype
    571         )

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py in _from_values_or_dtype(cls, values, categories, ordered, dtype)
    271             # Note: This could potentially have categories=None and
    272             # ordered=None.
--> 273             dtype = CategoricalDtype(categories, ordered)
    274 
    275         return dtype

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py in __init__(self, categories, ordered)
    158 
    159     def __init__(self, categories=None, ordered: Ordered = False):
--> 160         self._finalize(categories, ordered, fastpath=False)
    161 
    162     @classmethod

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py in _finalize(self, categories, ordered, fastpath)
    312 
    313         if categories is not None:
--> 314             categories = self.validate_categories(categories, fastpath=fastpath)
    315 
    316         self._categories = categories

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py in validate_categories(categories, fastpath)
    505         if not fastpath:
    506 
--> 507             if categories.hasnans:
    508                 raise ValueError("Categorical categories cannot be null")
    509 

pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/indexes/base.py in hasnans(self)
   2193         """
   2194         if self._can_hold_na:
-> 2195             return bool(self._isnan.any())
   2196         else:
   2197             return False

pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/indexes/base.py in _isnan(self)
   2172         """
   2173         if self._can_hold_na:
-> 2174             return isna(self)
   2175         else:
   2176             # shouldn't reach to this condition by checking hasnans beforehand

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/dtypes/missing.py in isna(obj)
    125     Name: 1, dtype: bool
    126     """
--> 127     return _isna(obj)
    128 
    129 

~/miniconda3/envs/seaborn-py38-latest/lib/python3.8/site-packages/pandas/core/dtypes/missing.py in _isna(obj, inf_as_na)
    154     # hack (for now) because MI registers as ndarray
    155     elif isinstance(obj, ABCMultiIndex):
--> 156         raise NotImplementedError("isna is not defined for MultiIndex")
    157     elif isinstance(obj, type):
    158         return False

NotImplementedError: isna is not defined for MultiIndex
```
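The traceback points at the likely mechanism: `Categorical.map` first maps the categories Index (`new_categories = self.categories.map(mapper)`), and when the mapped values are tuples, `Index.map` returns a `MultiIndex`. Building the new `CategoricalDtype` then calls `hasnans` on that `MultiIndex`, which raised in pandas 1.2.4. The sketch below reproduces only that intermediate step; the variable names are illustrative, and it does not assert whether later pandas versions still raise.

```python
import pandas as pd

s = pd.Series(["a", "a", "b"]).astype("category")
mapper = {"a": (1, 2), "b": (3, 4)}

# Categorical.map maps the (small) Index of categories first, as in
# categorical.py line 1196 of the traceback. Index.map converts
# tuple-valued results into a MultiIndex.
new_categories = s.cat.categories.map(mapper)

print(type(new_categories).__name__)  # MultiIndex
print(list(new_categories))           # [(1, 2), (3, 4)]
```

Mapping to scalar values (for example `{"a": 1, "b": 2}`) would give an ordinary `Index` at this step, which is consistent with the failure being specific to tuple values.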

Expected Output

With an object-typed Series:

import pandas as pd
s = pd.Series(["a", "a", "b"])
s.map({"a": (1, 2), "b": (3, 4)})
0    (1, 2)
1    (1, 2)
2    (3, 4)
dtype: object

Output of pd.show_versions()

```
INSTALLED VERSIONS
------------------
commit           : 2cb96529396d93b46abab7bbc73a208e708c642e
python           : 3.8.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.2.4
numpy            : 1.20.2
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.3.1
setuptools       : 49.6.0.post20210108
Cython           : None
pytest           : 5.4.3
hypothesis       : None
sphinx           : 3.3.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.15.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.6.3
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None
```
ghost commented 3 years ago

Try this instead:

import pandas as pd
s = pd.Series(["a", "a", "b"]).map({"a": (1, 2), "b": (3, 4)}).astype("category")
print(s)

Output:

0    (1, 2)
1    (1, 2)
2    (3, 4)
dtype: category
Categories (2, object): [(1, 2), (3, 4)]
mwaskom commented 3 years ago

@sanskarbtw Thanks, but that is not helpful when operating on an input that may or may not be categorical.
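One possible workaround that does not depend on the input dtype is to cast categorical input to `object` before mapping, so that `Series.map` follows the same path as in the object-dtype example. This is a sketch, not a fix for the bug: the helper name `map_values` is hypothetical, and the result loses the categorical dtype (call `.astype("category")` afterward if it is needed).

```python
import pandas as pd


def map_values(s: pd.Series, mapping: dict) -> pd.Series:
    """Hypothetical helper: map values whether or not `s` is categorical."""
    if isinstance(s.dtype, pd.CategoricalDtype):
        # Sidestep Categorical.map, which raised NotImplementedError for
        # tuple-valued mappings in pandas 1.2.4 (see the traceback above).
        s = s.astype(object)
    return s.map(mapping)


mapping = {"a": (1, 2), "b": (3, 4)}
cat = pd.Series(["a", "a", "b"]).astype("category")
obj = pd.Series(["a", "a", "b"])

print(map_values(cat, mapping).tolist())  # [(1, 2), (1, 2), (3, 4)]
print(map_values(obj, mapping).tolist())  # [(1, 2), (1, 2), (3, 4)]
```

The cast costs one pass over the values, which gives up the per-category efficiency that `Categorical.map` normally provides, but it behaves the same for categorical and non-categorical input.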