pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Categorical needs a searchsorted implementation #8420

Closed jankatins closed 9 years ago

jankatins commented 9 years ago

Currently, Categorical.searchsorted() raises NotImplementedError.

jreback commented 9 years ago

something like:

def searchsorted(self, value, side='left'):
    if not self.ordered:
        raise ValueError("searchsorted requires an ordered Categorical")
    index = self.values.searchsorted(value, side=side)
    return index

?

jankatins commented 9 years ago

Nope, more along these lines:

if not self.ordered:
    raise ValueError("searchsorted requires an ordered Categorical")
values_as_codes = _get_codes_for_values(value, self.categories)
# or ... = self.categories.get_indexer(value)
index = np.searchsorted(self.codes, values_as_codes, side=side)
return index
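
For readers who want to try the idea outside the class, here is a minimal standalone sketch of the approach above, using only public pandas and NumPy calls. The function name `searchsorted_by_codes` and the choice to raise `KeyError` for values that are not categories are assumptions made for illustration; this is not the code that was merged into pandas.

```python
import numpy as np
import pandas as pd


def searchsorted_by_codes(cat, value, side="left"):
    """Hypothetical helper: search an ordered Categorical via its integer codes."""
    if not cat.ordered:
        raise ValueError("searchsorted requires an ordered Categorical")
    # Accept a scalar or a list-like; keep the labels as Python objects.
    values = np.atleast_1d(np.asarray(value, dtype=object))
    # Translate each label into its category code (-1 means "not a category").
    codes = cat.categories.get_indexer(values)
    if (codes == -1).any():
        # Assumption: reject unknown labels rather than guess a position.
        raise KeyError("value(s) not among the categories")
    # The codes of a sorted Categorical are sorted, so search them directly.
    return np.searchsorted(cat.codes, codes, side=side)


cat = pd.Categorical(["apple", "bread", "bread", "cheese", "milk"], ordered=True)
left = searchsorted_by_codes(cat, ["bread", "cheese"])
right = searchsorted_by_codes(cat, "bread", side="right")
print(left, right)
```

Searching on the codes avoids comparing the labels themselves, which matters when the category order differs from the labels' lexical order.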

stevesimmons commented 9 years ago

I have a fix for this in pull request #8928.

    >>> x = pd.Categorical(['apple', 'bread', 'bread', 'cheese', 'milk', 'donuts'])
    >>> x.searchsorted(['bread', 'eggs'], side='right', sorter=[0, 1, 2, 3, 5, 4])
    # array([3, 5]) # eggs after donuts, after switching milk and donuts

    values_as_codes = self.categories.values.searchsorted(Series(v).values, side)
    indices = self.codes.searchsorted(values_as_codes, sorter=sorter)
    return indices
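
The two-step lookup above can be reproduced as a standalone sketch with public pandas and NumPy calls. The function name `two_step_searchsorted` is hypothetical, and the sketch only mirrors the snippet in this comment, not necessarily what was finally merged. Note that `side` is applied only in the first step: with `side='right'` an existing label maps to its code plus one, so the default left search on the codes already lands after the last matching element.

```python
import numpy as np
import pandas as pd


def two_step_searchsorted(cat, values, side="left", sorter=None):
    """Hypothetical helper mirroring the two-step lookup in the comment above."""
    categories = np.asarray(cat.categories, dtype=object)
    targets = np.atleast_1d(np.asarray(values, dtype=object))
    # Step 1: position of each label among the (sorted) categories.
    # Labels that are not categories still get an insertion position.
    positions = np.searchsorted(categories, targets, side=side)
    # Step 2: search those positions in the codes; `sorter` lists the indices
    # that would put the codes into ascending order.
    if sorter is not None:
        sorter = np.asarray(sorter, dtype=np.intp)
    return np.searchsorted(cat.codes, positions, sorter=sorter)


x = pd.Categorical(["apple", "bread", "bread", "cheese", "milk", "donuts"])
result = two_step_searchsorted(
    x, ["bread", "eggs"], side="right", sorter=[0, 1, 2, 3, 5, 4]
)
print(result)  # [3 5], matching the example above
```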

jreback commented 9 years ago

closed by #8972