noaa-ocs-modeling / EnsemblePerturbation

perturbation of coupled model input over a space of input variables
https://ensembleperturbation.readthedocs.io
Creative Commons Zero v1.0 Universal
7 stars 3 forks source link

Unit handling behavior change #105

Closed SorooshMani-NOAA closed 2 months ago

SorooshMani-NOAA commented 1 year ago

@WPringle it seems that the package handling the units in the perturber (pint-pandas) section of the code has slightly different behavior than before, resulting in some of the tests to fail. This lead me to delve into the code to find the source of the issue. I noticed that for historical errors we use us_statute_mile, but for the radius of maximum winds from the track we us nautical_miles.

First I wanted to make sure that this is the intended units for each of them, then I probably need to find why the unit conversion doesn't work like before!

WPringle commented 1 year ago

@SorooshMani-NOAA That statute versus nautical miles was a quirk from the way historical errors were defined by NHC. Nautical miles are always used in the NHC track data so that is the correct final units. However, the table of historical errors were in statute miles. We could of course just convert those by hand without doing it internally. But anyway, it is correct..

SorooshMani-NOAA commented 1 year ago

Thanks for the information @WPringle.

In any case the main issue is that the Series.pint.ito(...) method doesn't store the converted values back in the Series object. I stepped into the function and it does actually convert it, but for some reason doesn't store. I need to check and see if they added additional args or anything for the value to be propagated back.

SorooshMani-NOAA commented 1 year ago

When trying this code standalone in the same environment (pint==0.22 & pint-pandas==0.4) as tests, it works fine:

import pandas as pd
from pint_pandas import PintType
import pint

units = pint.UnitRegistry()
PintType.ureg = units

df = pd.DataFrame([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]], dtype=PintType('us_statute_mile'))
df[0].pint.ito(units.nautical_mile)
df
                    0    1    2
0  0.8689779798566076  2.0  3.0
1  3.4759119194264305  5.0  6.0
2  6.0828458589962535  8.0  9.0

but the test fails because the dataframe values are not modified in:

https://github.com/noaa-ocs-modeling/EnsemblePerturbation/blob/6ed2d4f4c70257ef95619dba56ac1bea98234a32/ensembleperturbation/perturbation/atcf.py#L178-L189

Interestingly if I revert to pint==0.19.2 and pint-pandas==0.2 it works fine!

SorooshMani-NOAA commented 2 months ago

Pinning packages doesn't really solve the issue. It's now causing other issues due to newer numpy version when pint is pinned to older versions. I'm reopening this to address properly.