xarray-contrib / cf-xarray

an accessor for xarray objects that interprets CF attributes
https://cf-xarray.readthedocs.io/
Apache License 2.0
flag_masks formatting for larger bit depth #491

Closed mps01060 closed 5 months ago

mps01060 commented 5 months ago

For our project we have ~20 independent flag_masks that can be triggered simultaneously (QA/QC flags). The current implementation of "flag_masks" works well with operations such as "==" and "isin" for integers wider than 8 bits:

import xarray as xr
import numpy as np
import cf_xarray

# Create example 32-bit flag_masks
flag_indep_uint32 = xr.DataArray(
    2**np.arange(32, dtype=np.uint32),
    dims=("time",),
    attrs={
        "flag_masks": [2**i for i in range(32)],
        "flag_meanings": " ".join([f"flag_{i}" for i in range(32)]),
        "standard_name": "flag_independent",
    },
    name="flag_var",
)

Find where flag_0 is True

flag_indep_uint32.cf == "flag_0"
<xarray.DataArray 'flag_var' (time: 32)>
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False])
Dimensions without coordinates: time

Find where flag_0 or flag_1 or flag_31 are True

flag_indep_uint32.cf.isin(["flag_0", "flag_1", "flag_31"])
<xarray.DataArray 'flag_var' (time: 32)>
array([ True,  True, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True])
Dimensions without coordinates: time
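
For readers less familiar with flag_masks, the two results above can be reproduced with plain bitwise tests. This is only a sketch under the CF convention that a flag is set when the bitwise AND of the value and its mask is nonzero; it is not cf_xarray's implementation, and the helper names flag_is_set and any_flag_set are made up for illustration. Plain Python integers are used so the result does not depend on any dtype.

```python
# Illustrative only: a plain-Python version of the flag tests shown above.
# The helper names are hypothetical and not part of cf_xarray.

masks = [2**i for i in range(32)]
meanings = [f"flag_{i}" for i in range(32)]
mask_for = dict(zip(meanings, masks))


def flag_is_set(value, meaning):
    """Return True if the bit belonging to `meaning` is set in `value`."""
    return (value & mask_for[meaning]) != 0


def any_flag_set(value, names):
    """Return True if at least one of the named flags is set in `value`."""
    combined = 0
    for name in names:
        combined |= mask_for[name]
    return (value & combined) != 0


# Same data as the example: one value per bit, from 2**0 up to 2**31.
values = [2**i for i in range(32)]

eq_flag_0 = [flag_is_set(v, "flag_0") for v in values]
isin_result = [any_flag_set(v, ["flag_0", "flag_1", "flag_31"]) for v in values]

# Independent flags can be combined: 3 has both flag_0 and flag_1 set.
both_set = flag_is_set(3, "flag_0") and flag_is_set(3, "flag_1")
```

Because the masks are independent bits, a single stored value can carry any combination of the ~20 QA/QC flags, which is why the bit width matters.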

The one part that doesn't seem to be supported is the repr: evaluating ".cf" on its own to print the flags/bits raises an IndexError:

flag_indep_uint32.cf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../python3.10/site-packages/cf_xarray/accessor.py", line 1543, in __repr__
    return ("".join(self._generate_repr(rich=False))).rstrip()
  File ".../lib/python3.10/site-packages/cf_xarray/accessor.py", line 1561, in _generate_repr
    _format_flags(self, rich), title="Flag Variable", rich=rich
  File ".../lib/python3.10/site-packages/cf_xarray/formatting.py", line 200, in _format_flags
    bitstring[abs(b)] = _format_cf_name("1" if b >= 0 else "0", rich)
IndexError: list assignment index out of range
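
The IndexError is consistent with a fixed-length list being indexed by bit position. A minimal sketch of that failure mode, assuming an 8-slot list: the helper fill_bitstring and its width default are hypothetical stand-ins for the failing line in the traceback, not cf_xarray's actual code.

```python
# Hypothetical stand-in for the failing assignment in the traceback.
# Assumption: the list holding the bit characters has a fixed 8 entries.

def fill_bitstring(bit_positions, width=8):
    """Mark each bit position in a fixed-width list of characters."""
    bitstring = ["."] * width
    for b in bit_positions:
        # Same shape as the line in the traceback: index by abs(b).
        bitstring[abs(b)] = "1" if b >= 0 else "0"
    return "".join(bitstring)


# Bit positions 0..7 fit into the 8 slots.
eight_bit_ok = fill_bitstring([0, 7])

# Bit position 8 (mask value 256) does not fit and raises IndexError.
try:
    fill_bitstring([8])
    message = None
except IndexError as err:
    message = str(err)
```

Under this assumption, any mask of 256 or larger would hit the same error, which matches the 32-bit example failing while 8-bit flags work.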

The _format_flags and find_set_bits functions contain some uint8-dependent code. Internally, we made some small changes here:

https://github.com/mps01060/cf-xarray/tree/morebits

This hasn't been thoroughly tested, and there is probably a much more concise way to add this support than what I have. For the same uint32 example as above, the updated code displays:

flag_indep_uint32.cf
Flag Variable:
       Flag Meanings:    flag_0:               1  / Bit: ...............................1
                         flag_1:               2  / Bit: ..............................1.
                         flag_2:               4  / Bit: .............................1..
                         flag_3:               8  / Bit: ............................1...
                         flag_4:              16  / Bit: ...........................1....
                         flag_5:              32  / Bit: ..........................1.....
                         flag_6:              64  / Bit: .........................1......
                         flag_7:             128  / Bit: ........................1.......
                         flag_8:             256  / Bit: .......................1........
                         flag_9:             512  / Bit: ......................1.........
                        flag_10:            1024  / Bit: .....................1..........
                        flag_11:            2048  / Bit: ....................1...........
                        flag_12:            4096  / Bit: ...................1............
                        flag_13:            8192  / Bit: ..................1.............
                        flag_14:           16384  / Bit: .................1..............
                        flag_15:           32768  / Bit: ................1...............
                        flag_16:           65536  / Bit: ...............1................
                        flag_17:          131072  / Bit: ..............1.................
                        flag_18:          262144  / Bit: .............1..................
                        flag_19:          524288  / Bit: ............1...................
                        flag_20:         1048576  / Bit: ...........1....................
                        flag_21:         2097152  / Bit: ..........1.....................
                        flag_22:         4194304  / Bit: .........1......................
                        flag_23:         8388608  / Bit: ........1.......................
                        flag_24:        16777216  / Bit: .......1........................
                        flag_25:        33554432  / Bit: ......1.........................
                        flag_26:        67108864  / Bit: .....1..........................
                        flag_27:       134217728  / Bit: ....1...........................
                        flag_28:       268435456  / Bit: ...1............................
                        flag_29:       536870912  / Bit: ..1.............................
                        flag_30:      1073741824  / Bit: .1..............................
                        flag_31:      2147483648  / Bit: 1...............................

Coordinates:
             CF Axes:   X, Y, Z, T: n/a

      CF Coordinates:   longitude, latitude, vertical, time: n/a

       Cell Measures:   area, volume: n/a

      Standard Names:   n/a

              Bounds:   n/a

       Grid Mappings:   n/a
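
The idea behind a width-independent "Bit:" column like the one above can be sketched in a few lines. This is a rough illustration only, not the code in the linked branch: the names set_bits, bit_width and format_bits are hypothetical, and the choice to round the width up to 8, 16, 32 or 64 is an assumption.

```python
# Sketch of width-independent bit formatting; all names are hypothetical.

def set_bits(mask):
    """Positions (0 = least significant) of the bits set in a non-negative int."""
    positions = []
    pos = 0
    while mask:
        if mask & 1:
            positions.append(pos)
        mask >>= 1
        pos += 1
    return positions


def bit_width(masks):
    """Smallest of 8, 16, 32, 64 bits that can hold every mask (assumed rule)."""
    needed = max(m.bit_length() for m in masks)
    for width in (8, 16, 32, 64):
        if needed <= width:
            return width
    raise ValueError("mask needs more than 64 bits")


def format_bits(mask, width):
    """Render `mask` as a string of `width` characters, most significant bit first."""
    chars = ["."] * width
    for pos in set_bits(mask):
        chars[width - 1 - pos] = "1"
    return "".join(chars)


masks = [2**i for i in range(32)]
width = bit_width(masks)
lines = [format_bits(m, width) for m in masks]
```

Deriving the width from the masks (or from the variable's dtype) instead of hard-coding 8 is what lets the same formatting handle uint8 through uint64 flag variables.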

Thank you for any help / suggestions!

dcherian commented 5 months ago

Very nice. I will happily merge this improvement. Can you send in a PR with a test, please?