xarray-contrib / cf-xarray

an accessor for xarray objects that interprets CF attributes
https://cf-xarray.readthedocs.io/
Apache License 2.0
152 stars 39 forks source link

CF unit formatter incompatible with latest pint #522

Closed aulemahal closed 2 days ago

aulemahal commented 1 week ago

With pint 0.24, the formatting machinery changed a lot. In 0.24.0 our formatter broke because we couldn't re-create Unit object, the registry was None (see hgrecco/pint#2009), this was fixed, but another bugfix broke the specific case of a dimensionless variable, see hgrecco/pint#2024.

MWE with cf-xarray

from cf_xarray import units

u = units.units.parse_units("")
f"{u:cf}"

raises the error shown in the pint issue.

aulemahal commented 1 week ago

I realized our method was a bit complicated and could be made simpler using pint formatting tools instead of regex. However, I haven't found a solution that was backwards-compatible. Here are 3 solutions I found:

Custom formatter class

Idea suggested by @hgrecco

from pint.delegates.formatter.plain import DefaultFormatter
from pint.delegates.formatter._compound_unit_helpers import prepare_compount_unit
from pint import formatter

class CFUnitFormatter(DefaultFormatter):
    def format_unit(self, unit, uspec, sort_func, **babel_kwds):
        numerator, denominator = prepare_compount_unit(
            unit,
            "~D",
            sort_func=sort_func,
            **babel_kwds,
            registry=self._registry,
        )

        out = formatter(
            numerator,
            denominator,
            as_ratio=False,
            product_fmt=" ",
            power_fmt="{}{}",
            parentheses_fmt=r"{}",
        )
        out = out.replace("Δ°", "delta_deg")
        return out.replace("percent", "%")`

# Add to the registry
units.formatter._formatters['cf'] = CFUnitFormatter

Simpler custom function

@pint.register_unit_format("cf")
def short_formatter(unit, registry, **options):
    num = [(u, e) for u, e in unit.items() if e >= 0]
    den = [(u, e) for u, e in unit.items() if e < 0]
    # in pint < 0.24, the first argument of `formatter` is num + den
    # a test with packaging.version could make this work for all pint versions
    out = formatter(
        num,
        den,
        as_ratio=False,
        product_fmt="{} {}",
        power_fmt="{}{}"
    )

    out = out.replace("Δ°", "delta_deg")
    return out.replace("percent", "%")

The trouble here is that we don't have control on the "shortening" of the units. What was previously f"{u:cf}" must become f"{u:~cf}". And (of course!) f"{u:~cf}" will not work with previous versions of cf_xarray.

Hack and forget

Simply fix the dimensionless issue and don't think about it anymore.

    @pint.register_unit_format("cf")
    def short_formatter(unit, registry, **options):
        import re

        # avoid issue in pint >= 0.24.1
        unit  = unit._units.__class__({k.replace('dimensionless', ''): v for k, v in unit._units.items()})
        # convert UnitContainer back to Unit
        unit = registry.Unit(unit)
        # Print units using abbreviations (millimeter -> mm)
        s = f"{unit:~D}"

        # Search and replace patterns
        pat = r"(?P<inverse>(?:1 )?/ )?(?P<unit>\w+)(?: \*\* (?P<pow>\d))?"

        def repl(m):
            i, u, p = m.groups()
            p = p or (1 if i else "")
            neg = "-" if i else ""

            return f"{u}{neg}{p}"

        out, n = re.subn(pat, repl, s)

        # Remove multiplications
        out = out.replace(" * ", " ")
        # Delta degrees:
        out = out.replace("Δ°", "delta_deg")
        return out.replace("percent", "%")
aulemahal commented 1 week ago

I found something that works in all cases!

@pint.register_unit_format("cf")
def short_formatter(unit, registry, **options):
    # pint 0.24.1 gives this for dimensionless units
    if unit == {'dimensionless': 1}:
        return ""

    # Shorten the names
    unit = pint.util.UnitsContainer({
        registry._get_symbol(u): exp
        for u, exp in unit.items()
    })

    if Version(pint.__version__) < Version('0.24'):
        args = (unit.items(),)
    else:
        args = (
            ((u, e) for u, e in unit.items() if e >= 0),
            ((u, e) for u, e in unit.items() if e < 0),
        )

    out = pint.formatter(
        *args,
        as_ratio=False,
        product_fmt=" ",
        power_fmt="{}{}"
    )
    out = out.replace("Δ°", "delta_deg")
    return out.replace("percent", "%")

Should gives the same result as before for all versions of pint (except 0.24.0, of course). In pint < 0.24, a dimensionless unit will yield "dimensionless" because in the older pints, the custom formatter isn't even called in that case. In pint 0.24.1, this was modified and the special case is managed at the beginning of this proposition.

I am using packaging.version.Version to ensure backward compatibility, which is not the elegantest but seems fine. Also, I assumed it was reasonable to use the "private" function registry._get_symbol.

dcherian commented 1 week ago

Nice work! Happy to merge this change. Let's also add a pint!=0.24.0 version pin.

Just curious, why out = out.replace("Δ°", "delta_deg")

aulemahal commented 1 week ago

The delta_deg has to do with https://pint.readthedocs.io/en/stable/user/nonmult.html, i.e. when two temperatures are subtracted, the result is not a temperature but rather a "delta". And the short symbol for that is the greek letter.

Both "Δ°C" and "delta_degC" are not recognized by udunits, so my guess is that this is to avoid unicode problems down the line in a netCDF ?

I recently stumbled upon these delta in xclim and I added an automatic translation into the related absolute unit in the specific function. (Δ°C -> K, Δ°F -> °R) But maybe that's a bit too strong for cf-xarray...

dcherian commented 1 week ago

Let's add that as a comment in the code when you send in your PR.