Not at the moment.
quantify/dequantify were created as an easy way of getting to/from a standard DataFrame, as there wasn't support for reading/writing to file at the time. This may have changed since.
Yeah, we can add that. We'd need a flag like "no_unit" or "N/A" to fill in the unit level for the non-PintArray columns: something that survives the round trip in/out of CSV or other formats, and doesn't imply dimensionless.
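Rough illustration of the idea (plain pandas, not pint-pandas API; the "no_unit" string is just a placeholder I picked here): a sentinel in the unit level of the column MultiIndex survives a CSV round trip and stays distinguishable from dimensionless.

import io
import pandas as pd

# Build a frame whose header carries a (name, unit) pair per column,
# with "no_unit" marking the column that has no unit at all.
df = pd.DataFrame(
    [["demo0", 1.0], ["demo1", 2.0]],
    columns=pd.MultiIndex.from_tuples(
        [("demo", "no_unit"), ("torque", "foot * force_pound")]
    ),
)

# Write and re-read the two-row header; the placeholder comes back intact.
csv = df.to_csv(index=False)
roundtripped = pd.read_csv(io.StringIO(csv), header=[0, 1])
print(roundtripped.columns.get_level_values(1).tolist())
# ['no_unit', 'foot * force_pound']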
I made a small patch to pint_array.py:
diff --git a/pint_pandas/pint_array.py b/pint_pandas/pint_array.py
index 35f2fad..0b556be 100644
--- a/pint_pandas/pint_array.py
+++ b/pint_pandas/pint_array.py
@@ -747,12 +747,17 @@ PintArray._add_arithmetic_ops()
PintArray._add_comparison_ops()
register_extension_dtype(PintType)
+# Magic 'unit' flagging columns with no unit support, used in
+# quantify/dequantify
+NO_UNIT="N/U"
+
@register_dataframe_accessor("pint")
class PintDataFrameAccessor(object):
def __init__(self, pandas_obj):
self._obj = pandas_obj
+
def quantify(self, level=-1):
df = self._obj
df_columns = df.columns.to_frame()
@@ -761,7 +766,7 @@ class PintDataFrameAccessor(object):
df_columns = df_columns.drop(columns=unit_col_name)
df_new = DataFrame(
- {i: PintArray(df.values[:, i], unit) for i, unit in enumerate(units.values)}
+ {i: PintArray(df.values[:, i], unit) if unit != NO_UNIT else df.values[:,i] for i, unit in enumerate(units.values)}
)
df_new.columns = df_columns.index.droplevel(unit_col_name)
@@ -778,7 +783,7 @@ class PintDataFrameAccessor(object):
df_columns = df.columns.to_frame()
df_columns["units"] = [
- formatter_func(df[col].values.units) for col in df.columns
+ formatter_func(df[col].values.units) if hasattr( df[col].values, "units") else NO_UNIT for col in df.columns
]
from collections import OrderedDict
Testing (wrapped in Emacs org-mode markup, sorry):
#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
import io
import pandas as pd
import pint
import pint_pandas
#+END_SRC
#+RESULTS:
: Python 3.9.1 | packaged by conda-forge | (default, Jan 10 2021, 02:55:42)
: [GCC 9.3.0] on linux
: Type "help", "copyright", "credits" or "license" for more information.
#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
df = pd.DataFrame({
"demo": [ "demo{}".format(i) for i in range(4)],
"torque": pd.Series([1, 2, 2, 3], dtype="pint[lbf ft]"),
"angular_velocity": pd.Series([1, 2, 2, 3], dtype="pint[rpm]"),
})
#+END_SRC
#+RESULTS:
: /home/jj/anaconda3/envs/pdata/lib/python3.9/site-packages/pint_pandas/pint_array.py:194: RuntimeWarning: pint-pandas does not support magnitudes of <class 'int'>. Converting magnitudes to float.
: warnings.warn(
#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
df['power'] = df['torque'] * df['angular_velocity']
print( df.dtypes )
#+END_SRC
#+RESULTS:
: demo object
: torque pint[foot * force_pound]
: angular_velocity pint[revolutions_per_minute]
: power pint[foot * force_pound * revolutions_per_minute]
: dtype: object
#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
strBuffer = df.pint.dequantify().to_csv(None, index=None)
print(strBuffer)
#+END_SRC
#+RESULTS:
: demo,torque,angular_velocity,power
: N/U,foot * force_pound,revolutions_per_minute,foot * force_pound * revolutions_per_minute
: demo0,1.0,1.0,1.0
: demo1,2.0,2.0,4.0
: demo2,2.0,2.0,4.0
: demo3,3.0,3.0,9.0
#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
df2 = pd.read_csv(io.StringIO(strBuffer), header=[0,1])
print( df2 )
print( df2.dtypes )
#+END_SRC
#+RESULTS:
#+begin_example
demo torque angular_velocity power
N/U foot * force_pound revolutions_per_minute foot * force_pound * revolutions_per_minute
0 demo0 1.0 1.0 1.0
1 demo1 2.0 2.0 4.0
2 demo2 2.0 2.0 4.0
3 demo3 3.0 3.0 9.0
demo N/U object
torque foot * force_pound float64
angular_velocity revolutions_per_minute float64
power foot * force_pound * revolutions_per_minute float64
dtype: object
#+end_example
#+BEGIN_SRC python :eval no-export :results output :noweb no :session *Python*
df2_ = df2.pint.quantify(level=-1)
print( df2_)
#+END_SRC
#+RESULTS:
: /home/jj/anaconda3/envs/pdata/lib/python3.9/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
: return np.array(qtys, dtype="object", copy=copy)
: demo torque angular_velocity power
: 0 demo0 1.0 1.0 1.0
: 1 demo1 2.0 2.0 4.0
: 2 demo2 2.0 2.0 4.0
: 3 demo3 3.0 3.0 9.0
I'd like to +1 this. I could also use this functionality, as I frequently work with pint DataFrames where only some columns have units.
I came here looking to make my work with pint_pandas friendlier to people using the pandas.testing harness. pandas is very self-centric, consenting only to test assertions about Series, DataFrame, and Index. I'd like to use the same basic structures/syntax for PintArray and came here for guidance.
Is the better approach to dequantify and test the dequantified results, or is there a way to test equality (or near-equality) of PintArrays using the logical extension of the pandas test framework? (I'm not asking to compare sequences with PintArrays, but rather a PintArray coming back from a program against a PintArray constructed within the test suite.)
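For context, the dequantify-and-compare approach I'm describing looks something like this (just a sketch; the frames and column names are made up):

import pandas as pd
import pandas.testing as pdt
import pint_pandas  # registers the "pint[...]" extension dtype

# Compare two pint-backed frames by dequantifying both and handing the
# resulting plain-float frames to pandas' own assertion helper.
expected = pd.DataFrame({"torque": pd.Series([1.0, 2.0], dtype="pint[lbf ft]")})
result = pd.DataFrame({"torque": pd.Series([1.0, 2.0], dtype="pint[lbf ft]")})
pdt.assert_frame_equal(result.pint.dequantify(), expected.pint.dequantify())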
This solves one problem but creates another. If a column contains quantities with differing units, the heterogeneity results in the column being marked N/U, which is not correct.
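A small example of what I mean (assuming the patch above): a column holding quantities with different units falls back to a plain object column, so it has no single .units attribute and would get tagged N/U even though every value does carry a unit.

import pandas as pd
import pint

ureg = pint.UnitRegistry()

# Quantities with different units end up in a plain object column,
# so the column as a whole has no `.values.units` attribute.
mixed = pd.Series([1 * ureg.meter, 2 * ureg.second])
print(mixed.dtype)                     # object
print(hasattr(mixed.values, "units"))  # False -> tagged N/U by the patch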
I have a dataframe with a couple of columns of PintArray type and a couple of boolean columns. I need to be able to dequantify the columns so I can save everything to a CSV, but .pint.dequantify() causes an error.
Is there a way to get pint_pandas to skip the non-pintarray columns?
Thanks
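In case it helps while there's no built-in support, here's a rough workaround (dequantify_mixed and the "no_unit" placeholder are my own names, not pint-pandas API): dequantify only the pint columns and give every other column a placeholder unit level by hand, so the whole frame can still be written to a CSV with the two-row header.

import pandas as pd
import pint_pandas

def dequantify_mixed(df, placeholder="no_unit"):
    # Hypothetical helper: dequantify only the pint columns and tag the
    # remaining columns with a placeholder unit so every column ends up
    # with a (name, unit) header.
    is_pint = df.dtypes.apply(lambda d: isinstance(d, pint_pandas.PintType))
    with_units = df.loc[:, is_pint].pint.dequantify()
    plain = df.loc[:, ~is_pint].copy()
    plain.columns = pd.MultiIndex.from_tuples(
        [(name, placeholder) for name in plain.columns],
        names=with_units.columns.names,
    )
    # Concatenate and restore the original column order by name.
    return pd.concat([plain, with_units], axis=1)[list(df.columns)]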