hgrecco / pint-pandas

Pandas support for pint

Support for PintArray-preserving .map() function? #161

Closed by MichaelTiemannOSC 11 months ago

MichaelTiemannOSC commented 1 year ago

Knowledgeable sources on pandas ExtensionArrays mention that implementing .map() might be useful: https://github.com/pandas-dev/pandas/issues/23179

I suggest it might also be useful for PintArrays. When a dataframe column has dtype np.float64 and we map a function across it, we get back a series that is also np.float64. But when a column wraps a PintArray and we map a function across it that preserves the Quantity type, we get back a series with dtype='object'.

import numpy as np
import pandas as pd
import pint
import pint_pandas

from pint import Quantity as Q_
from pint_pandas import PintArray as PA_

data_nums = [1.0, 2.0, 3.0]
data_chars = ['a', 'b', 'c']

pd_df = pd.DataFrame({'nums': data_nums,
                      'chars': data_chars})

pa_df = pd.DataFrame(pd.concat([pd.Series(PA_(data_nums, 'm'), name='nums'),
                                pd.Series(data_chars, name='chars')], axis=1))

print(f"pd_df =\n{pd_df}")
print(f"pd_df.dtypes =\n{pd_df.dtypes}")
print(f"pa_df =\n{pa_df}")
print(f"pa_df.dtypes =\n{pa_df.dtypes}")

xx = pd_df.nums.map(lambda x: x+1)
print(f"xx = pd_df.nums.map(lambda x: x+1)\n{xx}")
print(f"xx.values = {xx.values}")

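# Magnitudes of the PintArray-backed column
print(pa_df.nums.pint.m)
# Map a function that returns a Quantity for every element
yy = pa_df.nums.map(lambda x: Q_(x.m+1, x.u))
print(f"yy = pa_df.nums.map(lambda x: x+1m) =\n{yy}")
# Casting the result back to pint[m] makes the .pint accessor usable again
print(f"yy.astype('pint[m]').pint.m =\n{yy.astype('pint[m]').pint.m}")
# yy itself comes back with dtype object, so the .pint accessor is expected to fail here
print(f"yy.pint.m =\n{yy.pint.m}")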

I tried making sense of how categories and sparse things use map, but I'm just not up to speed on those parts of pandas. But I do think that it would be SUPER if, when applying quantity-preserving transformations to Series that contain PintArrays, we could preserve the PintArray and its Quantity dtype.
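Until .map() preserves the dtype on its own, the astype('pint[m]') call in the script above suggests a workaround: map first, then cast the result back to the dtype of the input series. The following is only a minimal sketch of that idea. The name map_keep_dtype is hypothetical and is not part of pandas or pint-pandas, and the demo uses pandas' nullable Int64 extension dtype purely so that it runs without pint installed.

```python
import pandas as pd


def map_keep_dtype(series: pd.Series, func) -> pd.Series:
    """Map func over series, then cast the result back to the input dtype.

    Hypothetical helper for illustration; not part of pandas or pint-pandas.
    """
    mapped = series.map(func)
    # The cast raises if func returned values the original dtype cannot hold.
    return mapped.astype(series.dtype)


# Demo with a built-in extension dtype (an assumption made only to keep
# the sketch self-contained; the issue itself is about PintArray columns).
nums = pd.Series([1, 2, 3], dtype="Int64")
result = map_keep_dtype(nums, lambda x: x + 1)
print(result.dtype)
print(result.tolist())
```

Assuming pint-pandas accepts the same cast for a series of Quantities, as the astype('pint[m]') line above indicates, map_keep_dtype(pa_df.nums, lambda x: Q_(x.m+1, x.u)) should hand back a pint-typed series rather than an object one. The cast costs an extra pass over the data, so this is a stopgap rather than a substitute for a native PintArray.map().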

andrewgsavage commented 1 year ago

This is possible in pandas 2.1, which is not yet released: https://github.com/pandas-dev/pandas/pull/51809

topper-123 commented 12 months ago

I responded to the wrong issue, sorry. Just ignore this comment.