hgrecco / pint-pandas

Pandas support for pint

dequantify with dataframes of mixed types #46

Closed: RoyLarson closed this issue 2 years ago

RoyLarson commented 4 years ago

I have a DataFrame with a couple of PintArray columns and a couple of boolean columns.
I need to dequantify the columns so I can save everything to a CSV, but .pint.dequantify()
raises an error:

File "..\lib\site-packages\pint_pandas\pint_array.py", line 730, in dequantify
    df_columns["units"] = [
File "..\lib\site-packages\pint_pandas\pint_array.py", line 731, in <listcomp>
    formatter_func(df[col].values.units) for col in df.columns
AttributeError: 'numpy.ndarray' object has no attribute 'units'

Is there a way to get pint_pandas to skip the non-PintArray columns?

Thanks

andrewgsavage commented 4 years ago

Not at the moment.

quantify/dequantify were created as an easy way to convert to and from a standard DataFrame, because there was no support for reading/writing files at the time. This may have changed since.

Yes, we can add that. We'd need a placeholder such as "no_unit" or "N/A" to fill the unit level for non-PintArray columns: something that survives the round trip to and from CSV or other formats, and doesn't imply dimensionless.
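
You can check the round-trip requirement with plain pandas. Build a dequantified-style frame by hand, with column names on level 0 and units on level 1, then send it through CSV and read it back with a two-row header. This is a minimal sketch, assuming a hypothetical placeholder NO_UNIT = "no_unit". That name is not an existing pint-pandas constant, and only the CSV path is exercised here.

```python
import io

import pandas as pd

# Hypothetical placeholder for the unit level of columns that carry no units.
NO_UNIT = "no_unit"

# Dequantified-style layout: level 0 = column name, level 1 = unit.
columns = pd.MultiIndex.from_tuples(
    [("flag", NO_UNIT), ("torque", "foot * force_pound")]
)
df = pd.DataFrame([[True, 1.0], [False, 2.0]], columns=columns)

# Write both header rows to CSV, then read them back as a two-level header.
text = df.to_csv(index=False)
back = pd.read_csv(io.StringIO(text), header=[0, 1])

# The placeholder comes back as an ordinary string in the unit level.
print(back.columns.tolist())
```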

jarjuk commented 3 years ago

I made a small patch to pint_array.py:

diff --git a/pint_pandas/pint_array.py b/pint_pandas/pint_array.py
index 35f2fad..0b556be 100644
--- a/pint_pandas/pint_array.py
+++ b/pint_pandas/pint_array.py
@@ -747,12 +747,17 @@ PintArray._add_arithmetic_ops()
 PintArray._add_comparison_ops()
 register_extension_dtype(PintType)

+# Magic 'unit' flagging columns with no unit support, used in
+# quantify/dequantify
+NO_UNIT="N/U"
+

 @register_dataframe_accessor("pint")
 class PintDataFrameAccessor(object):
     def __init__(self, pandas_obj):
         self._obj = pandas_obj

+
     def quantify(self, level=-1):
         df = self._obj
         df_columns = df.columns.to_frame()
@@ -761,7 +766,7 @@ class PintDataFrameAccessor(object):
         df_columns = df_columns.drop(columns=unit_col_name)

         df_new = DataFrame(
-            {i: PintArray(df.values[:, i], unit) for i, unit in enumerate(units.values)}
+            {i: PintArray(df.values[:, i], unit) if unit != NO_UNIT else df.values[:,i] for i, unit in enumerate(units.values)}
         )

         df_new.columns = df_columns.index.droplevel(unit_col_name)
@@ -778,7 +783,7 @@ class PintDataFrameAccessor(object):

         df_columns = df.columns.to_frame()
         df_columns["units"] = [
-            formatter_func(df[col].values.units) for col in df.columns
+            formatter_func(df[col].values.units) if hasattr( df[col].values, "units") else NO_UNIT for col in df.columns
         ]
         from collections import OrderedDict

Testing (wrapped in Emacs Org-mode source blocks, unfortunately):

#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
import io
import pandas as pd
import pint
import pint_pandas
#+END_SRC

#+RESULTS:
: Python 3.9.1 | packaged by conda-forge | (default, Jan 10 2021, 02:55:42) 
: [GCC 9.3.0] on linux
: Type "help", "copyright", "credits" or "license" for more information.

#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
df = pd.DataFrame({
     "demo": [ "demo{}".format(i) for i in range(4)],
     "torque": pd.Series([1, 2, 2, 3], dtype="pint[lbf ft]"),
     "angular_velocity": pd.Series([1, 2, 2, 3], dtype="pint[rpm]"),
})
#+END_SRC

#+RESULTS:
: /home/jj/anaconda3/envs/pdata/lib/python3.9/site-packages/pint_pandas/pint_array.py:194: RuntimeWarning: pint-pandas does not support magnitudes of <class 'int'>. Converting magnitudes to float.
:   warnings.warn(

#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
df['power'] = df['torque'] * df['angular_velocity']
print( df.dtypes )
#+END_SRC

#+RESULTS:
: demo                                                           object
: torque                                       pint[foot * force_pound]
: angular_velocity                         pint[revolutions_per_minute]
: power               pint[foot * force_pound * revolutions_per_minute]
: dtype: object

#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
strBuffer = df.pint.dequantify().to_csv(None, index=False)
print(strBuffer)
#+END_SRC

#+RESULTS:
: demo,torque,angular_velocity,power
: N/U,foot * force_pound,revolutions_per_minute,foot * force_pound * revolutions_per_minute
: demo0,1.0,1.0,1.0
: demo1,2.0,2.0,4.0
: demo2,2.0,2.0,4.0
: demo3,3.0,3.0,9.0

#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
df2 = pd.read_csv(io.StringIO(strBuffer), header=[0,1])
print( df2 )
print( df2.dtypes )

#+END_SRC

#+RESULTS:
#+begin_example
demo             torque       angular_velocity                                       power
     N/U foot * force_pound revolutions_per_minute foot * force_pound * revolutions_per_minute
0  demo0                1.0                    1.0                                         1.0
1  demo1                2.0                    2.0                                         4.0
2  demo2                2.0                    2.0                                         4.0
3  demo3                3.0                    3.0                                         9.0
demo              N/U                                             object
torque            foot * force_pound                             float64
angular_velocity  revolutions_per_minute                         float64
power             foot * force_pound * revolutions_per_minute    float64
dtype: object
#+end_example

#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
df2_ = df2.pint.quantify(level=-1)
print( df2_)
#+END_SRC

#+RESULTS:
: /home/jj/anaconda3/envs/pdata/lib/python3.9/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
:   return np.array(qtys, dtype="object", copy=copy)
:     demo torque angular_velocity power
: 0  demo0    1.0              1.0   1.0
: 1  demo1    2.0              2.0   4.0
: 2  demo2    2.0              2.0   4.0
: 3  demo3    3.0              3.0   9.0
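
Reading the placeholder back is the mirror image of writing it. The patched quantify step looks at the unit level and only wraps columns whose unit is not NO_UNIT. Below is a plain-pandas sketch of that decoding, run on a shortened copy of the CSV output above. split_units is a hypothetical helper, and the final PintArray wrapping is only described in a comment, not performed.

```python
import io

import pandas as pd

NO_UNIT = "N/U"

# Two columns taken from the CSV output above (header rows: names, units).
csv_text = "demo,torque\nN/U,foot * force_pound\ndemo0,1.0\ndemo1,2.0\n"
df2 = pd.read_csv(io.StringIO(csv_text), header=[0, 1])


def split_units(frame, level=-1):
    """Hypothetical helper: drop the unit level and report each column's unit.

    Columns marked with NO_UNIT map to None, meaning "leave as a plain column".
    """
    names = frame.columns.droplevel(level)
    units = {
        name: (None if unit == NO_UNIT else unit)
        for name, unit in zip(names, frame.columns.get_level_values(level))
    }
    plain = frame.copy()
    plain.columns = names
    return plain, units


plain, units = split_units(df2)
# pint-pandas would now wrap plain[name] in a PintArray for every unit that is
# not None; selecting by column (not df.values) keeps the other dtypes intact.
print(units)
```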

dcoukos commented 3 years ago

I'd like to +1 this. I could also use this functionality, since I frequently work with pint DataFrames where only some columns have units.

MichaelTiemannOSC commented 2 years ago

I came here looking to make my work with pint_pandas friendlier to people using the pandas.testing harness. Pandas is quite self-centric: its test assertions only cover Series, DataFrame, and Index. I'd like to use the same basic structures and syntax for PintArray, and came here for guidance.

Is the better approach to dequantify and test the dequantified results, or is there a way to test equality (or near-equality) of PintArrays with a natural extension of the pandas testing framework? (I'm not asking to compare Sequences with PintArrays, but rather a PintArray returned by a program against a PintArray constructed within the test suite.)
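
Both options can be sketched with pandas alone:

- **Extension arrays directly.** pandas.testing does provide assert_extension_array_equal. PintArray is a pandas ExtensionArray subclass, which makes it a natural candidate. Whether that comparison respects pint units depends on pint-pandas' implementation and is not verified here. The sketch below uses pandas' built-in Float64 extension array as a stand-in.
- **Dequantified frames.** Comparing dequantified frames with assert_frame_equal checks the unit header level exactly, as part of the column index, and the magnitudes approximately.

```python
import pandas as pd
import pandas.testing as tm

# Built-in nullable Float64 arrays stand in for PintArray: both are ExtensionArrays.
expected = pd.array([0.3, 1.5], dtype="Float64")
result = pd.array([0.1 + 0.2, 1.5], dtype="Float64")

# Near-equality of two extension arrays (tolerances need pandas >= 1.1).
tm.assert_extension_array_equal(result, expected, check_exact=False, rtol=1e-9)

# Dequantify-then-compare: the (name, unit) column index must match exactly,
# while the magnitudes are compared within the given tolerance.
cols = pd.MultiIndex.from_tuples([("power", "watt")])
tm.assert_frame_equal(
    pd.DataFrame([[0.1 + 0.2]], columns=cols),
    pd.DataFrame([[0.3]], columns=cols),
    check_exact=False,
    rtol=1e-9,
)
```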

MichaelTiemannOSC commented 2 years ago

This solves one problem but creates another: if a column contains units of differing types, the heterogeneity causes the column to be marked N/U, which is not correct.
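
This follows from the patch's duck-typing check. A PintArray carries a single unit for the whole array, so a column whose quantities have different units typically ends up as an object column of individual quantities. That column's .values has no .units attribute, so the patch falls back to N/U.

One way to tell "no units" apart from "mixed units" is to also inspect the elements. The sketch below assumes two hypothetical names:

- MIXED_UNITS, a second placeholder;
- FakeQuantity, a stand-in for pint's Quantity.

Neither exists in pint-pandas. Note that a single header cell still cannot round-trip per-element units; this only labels the column correctly.

```python
NO_UNIT = "N/U"
MIXED_UNITS = "mixed"  # hypothetical second placeholder, not part of pint-pandas


def classify_column(values, formatter=str):
    """Label a column's values as one unit, NO_UNIT, or MIXED_UNITS."""
    # Homogeneous, PintArray-like values expose a single .units attribute.
    if hasattr(values, "units"):
        return formatter(values.units)
    # Otherwise look at the elements: an object column may hold quantities.
    items = list(values)
    found = {formatter(v.units) for v in items if hasattr(v, "units")}
    if not found:
        return NO_UNIT
    # Exactly one unit, carried by every element.
    if len(found) == 1 and all(hasattr(v, "units") for v in items):
        return found.pop()
    # Several units, or quantities mixed with unitless values.
    return MIXED_UNITS


class FakeQuantity:
    """Hypothetical stand-in for a pint Quantity: a magnitude plus .units."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units


print(classify_column([True, False]))
print(classify_column([FakeQuantity(1.0, "meter"), FakeQuantity(2.0, "second")]))
```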