hgrecco / pint-pandas

Pandas support for pint

Multiplication by dimensionless PintArray gives surprising results #164

Closed MichaelTiemannOSC closed 11 months ago

MichaelTiemannOSC commented 1 year ago

I know that PintPandas really only works with PintArrays and not with DataFrames that contain different units on every row. But hear me out. The following shows first the "good" behavior of multiplying a DataFrame made from PintArrays by a dimensionless Series, then two workarounds that work on a DataFrame with mixed units, and finally the direct multiplication, which throws an error. It is surprising to me that I need to use .pint.m to strip the dimensionless unit from the Series in order to get Series.mul to work as expected.

import pandas as pd
import pint
from pint import Quantity as Q_
ureg = pint.UnitRegistry()
import pint_pandas
from pint_pandas import PintArray as PA_

s0 = pd.Series(PA_([1, 10, 100], 'dimensionless'))
df1 = pd.DataFrame({'a': PA_([1,2,3], 'm'), 'b': PA_([4,5,6], 'g')})
df2 = pd.DataFrame({'a': pd.Series([Q_(1, 'm'), Q_(2, 'g'), Q_(3, 'l')]), 'b': pd.Series([Q_(4, 'm'), Q_(5, 'g'), Q_(6, 'l')])})

print(df1.apply(lambda col: col.mul(s0)))
#        a      b
# 0    1.0    4.0
# 1   20.0   50.0
# 2  300.0  600.0

print(df2.apply(lambda col: col.mul(s0.pint.m)))
#              a            b
# 0    1.0 meter    4.0 meter
# 1    20.0 gram    50.0 gram
# 2  300.0 liter  600.0 liter

print(df2.apply(lambda col: col.combine(s0, lambda x, y: ureg(f"{x} * {y}"))))
#              a            b
# 0    1.0 meter    4.0 meter
# 1    20.0 gram    50.0 gram
# 2  300.0 liter  600.0 liter

print(df2.apply(lambda col: col.mul(s0)))
# pint.errors.DimensionalityError: Cannot convert from 'gram' ([mass]) to 'meter' ([length])

MichaelTiemannOSC commented 1 year ago

I've come across a similar error message in a different context and I suspect the problem is that the multiplication is prematurely inferring the dtype of the operation from the first multiplication (master scalar?) and wrongly broadcasting that to the rest of the multiplication. But I haven't been inside the code, so cannot say for sure.

MichaelTiemannOSC commented 1 year ago

I just discovered this workaround for multiplying a DataFrame (of 'pint[dimensionless]' numbers) by a Series (of heterogeneous units):

df.pint.dequantify().mul(ser, axis=0).droplevel('unit', axis=1)

andrewgsavage commented 11 months ago

> multiplication is prematurely inferring the dtype of the operation from the first multiplication (master scalar?) and wrongly broadcasting that to the rest of the multiplication.

That's the intention - a PintArray can only hold a single unit, so if it cannot convert to that unit it should error.
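
To illustrate the single-unit constraint described above, here is a toy model in plain Python. It is not pint-pandas code: `SingleUnitArray`, `convert`, the `UNITS` table, and the local `DimensionalityError` class are all hypothetical names made up for this sketch. Taking the unit from the first element is an assumption that mirrors the suspicion voiced earlier in the thread. The sketch only shows why a container that stores one unit has to fail on a meter/gram mix.

```python
# Toy model only: this is NOT pint-pandas code. It mimics a container that,
# like PintArray, stores magnitudes in exactly one unit.

# Hypothetical mini unit table: unit -> (dimension, factor to the base unit).
UNITS = {
    "meter": ("length", 1.0),
    "kilometer": ("length", 1000.0),
    "gram": ("mass", 1.0),
    "kilogram": ("mass", 1000.0),
}


class DimensionalityError(Exception):
    """Raised when two units measure different dimensions."""


def convert(magnitude, src, dst):
    """Convert magnitude from unit src to unit dst, or raise."""
    src_dim, src_factor = UNITS[src]
    dst_dim, dst_factor = UNITS[dst]
    if src_dim != dst_dim:
        raise DimensionalityError(
            f"Cannot convert from '{src}' ([{src_dim}]) to '{dst}' ([{dst_dim}])"
        )
    return magnitude * src_factor / dst_factor


class SingleUnitArray:
    """Holds magnitudes that all share one unit, taken from the first element."""

    def __init__(self, quantities):
        # quantities is a list of (magnitude, unit) pairs.
        # Assumption for this sketch: the first element decides the unit.
        self.unit = quantities[0][1]
        self.magnitudes = [convert(m, u, self.unit) for m, u in quantities]


# Same dimension throughout: every element converts to the first unit.
ok = SingleUnitArray([(1.0, "meter"), (2.0, "kilometer")])
print(ok.magnitudes, ok.unit)  # [1.0, 2000.0] meter

# Mixed dimensions: the second element cannot be expressed in meters.
try:
    SingleUnitArray([(1.0, "meter"), (20.0, "gram")])
except DimensionalityError as exc:
    print(exc)  # Cannot convert from 'gram' ([mass]) to 'meter' ([length])
```

In this model the error appears as soon as the results are packed into the single-unit container, which is consistent with the workarounds above: both keep the per-row quantities in a plain object column instead of coercing them into one unit.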