has2k1 / plydata

A grammar for data manipulation in Python
https://plydata.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Missed dplyr::mapvalues() #32

Open stettberger opened 1 year ago

stettberger commented 1 year ago

I was fiddling around with plydata and wanted to remap the values of some non-categorical data columns. With plotnine, I often have to rename values to get proper labels. With dplyr, there is dplyr::mapvalues(), which is missing in plydata. My quick hack for that looks like this:

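def mapvalues(column, keys=None, values=None, na_action=None, **kwargs):
    """Return a callable that maps the values of `column` to new values."""
    if keys is not None:
        # Explicit keys and values must be given together and match in length.
        if values is None or len(keys) != len(values):
            raise ValueError("keys and values must have the same length")
        translate_dict = dict(zip(keys, values))
    else:
        # Otherwise the mapping comes from keyword arguments (old=new).
        if not kwargs:
            raise ValueError("pass keys and values, or keyword arguments")
        translate_dict = kwargs

    def mapper(df):
        # Values without an entry in translate_dict become NaN.
        return df[column].map(translate_dict, na_action=na_action)
    return mapper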

Usage:

>>> plydata.data.fishdata >> define(Station_X=mapvalues('Station', ['Release', 'MAW'], ['Foobar', 'Barfoo']))
    TagID  Station  value Station_X
0    4842  Release      1    Foobar
1    4843  Release      1    Foobar
2    4844  Release      1    Foobar
3    4845  Release      1    Foobar
4    4847  Release      1    Foobar
..    ...      ...    ...       ...
204  4861      MAW      1    Barfoo
205  4862      MAW      0    Barfoo
206  4863      MAW      0    Barfoo
207  4864      MAW      0    Barfoo
208  4865      MAW      0    Barfoo

[209 rows x 4 columns]
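
One caveat with the hack: `Series.map` with a dict turns every value that has no entry in the dict into NaN. The following is only a minimal sketch of a variant that leaves unmapped values as they are, using plain pandas and `Series.replace`. The name `mapvalues_keep` and the small sample frame are made up for illustration and are not part of plydata or of the real fishdata.

```python
import pandas as pd


def mapvalues_keep(column, mapping):
    """Hypothetical variant: remap values, keep unmapped ones unchanged."""
    def mapper(df):
        # Series.replace only touches values that appear as keys in mapping.
        return df[column].replace(mapping)
    return mapper


# Made-up sample data, not the real fishdata.
df = pd.DataFrame({"Station": ["Release", "MAW", "Other"]})
mapping = {"Release": "Foobar", "MAW": "Barfoo"}

# With Series.map, 'Other' has no entry and becomes NaN.
mapped = df["Station"].map(mapping)

# With Series.replace, 'Other' is passed through as-is.
kept = mapvalues_keep("Station", mapping)(df)
```

Since the returned `mapper` takes a data frame and returns a Series, it should be usable inside `define(...)` in the same way as `mapvalues` in the usage example, though I have only sketched it against plain pandas here.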

If there is a better way or a package that already provides this, just let me know. Otherwise, I think it would be a good addition to plydata.