MichaelTiemannOSC closed this issue 1 year ago.
I have created a sample notebook that demonstrates the creation of a dataframe with both quantified and non-quantified columns. Among the quantified columns, some are homogeneous in their units and others are heterogeneous.
Columns with heterogeneous units of different dimensions (which cannot be converted so that the column has a single unit) are not supported.
# As of 20220430, the following lines create the dataframe correctly, but throw UnitStrippedWarning
# '2016_production': [Q_(32.993292,'TWh'),Q_(10.2316757,'TWh'),Q_(42199000.0,'Fe_ton'),Q_(34.61322117,'TWh')],
This line does not create a PintArray, and neither do the lines after it. Use df.dtypes to confirm this.
You can also tell from the display, because the units are shown inside each cell.
If you can convert from Fe_ton to TWh, try making a dataframe with company and units as the columns and yearly production as the rows. Convert that dataframe to TWh, then transpose it and append it to the rest of the data.
And yes, it would be nice if the above properly showed the units stashed in the homogeneous columns.
That's a pandas issue. df.pint.dequantify() is a workaround.
I definitely understand that PintArrays need to be homogeneous in their dtype, and therefore in their units, which "units of production" are not (some are TWh, some are Fe_ton, and still others are units not contained in this example). I now see that df.pint.dequantify() can be useful for printing a dataframe together with its unit information. When my Jupyter notebook server comes back from the dead, I'll give that a try.
I think UnitStrippedWarning should be gone now; can you confirm whether this is still an issue?
I have created a sample notebook that demonstrates the creation of a dataframe with both quantified and non-quantified columns. Among the quantified columns, some are homogeneous in their units and others are heterogeneous. I want to write these dataframes to a Trino database and then read them back in, and I now have functions to do all that. What I don't have is a good understanding of whether or how to tame the warning messages that say:
Here is the notebook in question: https://github.com/os-climate/data-platform-demo/blob/master/notebooks/pint-demo.ipynb
Here's an annotated explanation of one of my frustrations:
And when I try to execute:
sample_df.sort_values(by='company_name')
I get the following output before I get the rendered dataframe:
And yes, it would be nice if the above properly showed the units stashed in the homogeneous columns.