tobac-project / tobac

Tracking and object-based analysis of clouds
BSD 3-Clause "New" or "Revised" License

Cell mask axis order in TINT theme #53

Open deeplycloudy opened 3 years ago

deeplycloudy commented 3 years ago

When running the tracking example of the TINT theme, the axis order of the cell mask variable in the tracked dataset is reversed from what I expected.

The input gridded NetCDF data look like this, with the usual (time, z, y, x) ordering, shown here for nc_grid.reflectivity:

<xarray.DataArray 'reflectivity' (time: 143, z: 31, y: 501, x: 501)>
dask.array<concatenate, shape=(143, 31, 501, 501), dtype=float32, chunksize=(1, 31, 501, 501), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 -2.5e+05 -2.49e+05 -2.48e+05 ... 2.49e+05 2.5e+05
  * y        (y) float64 -2.5e+05 -2.49e+05 -2.48e+05 ... 2.49e+05 2.5e+05
  * z        (z) float64 0.0 500.0 1e+03 1.5e+03 ... 1.4e+04 1.45e+04 1.5e+04
  * time     (time) datetime64[ns] 2017-07-13T08:00:47.333999999 ... 2017-07-...
Attributes:
    long_name:      Reflectivity
    units:          dBZ
    standard_name:  equivalent_reflectivity_factor
    valid_max:      94.5
    valid_min:      -32.0
    coordinates:    elevation azimuth range

Here's the tracks dataset. Note that cell_mask has dimensions (time, x, y):

Dimensions:               (cell: 5045, storm: 2, time: 143, x: 501, y: 501)
Coordinates:
  * time                  (time) datetime64[ns] 2017-07-13T08:00:47 ... 2017-...
  * cell                  (cell) object '0' '1' '2' '3' ... '1291' '1211' '1100'
Dimensions without coordinates: storm, x, y
Data variables: (12/13)
    grid_x                (cell) float64 296.0 268.5 289.1 ... 54.33 382.7 374.5
    grid_y                (cell) float64 78.2 82.94 84.4 ... 473.2 474.2 498.2
    longitude             (cell) float64 -94.61 -94.9 -94.68 ... -93.73 -93.81
    latitude              (cell) float64 27.9 27.95 27.95 ... 31.45 31.46 31.68
    area                  (cell) float64 10.0 33.0 40.0 11.0 ... 21.0 45.0 33.0
    vol                   (cell) float64 19.0 128.0 143.0 ... 57.5 181.0 131.5
    ...                    ...
    max_alt               (cell) float64 4.5 7.5 7.5 4.5 ... 7.0 10.5 8.5 9.5
    isolated              (cell) bool False True False True ... True False False
    cell_time             (cell) datetime64[ns] 2017-07-13T08:00:47 ... 2017-...
    cell_id               (cell) object '0' '1' '2' '3' ... '1291' '1211' '1100'
    cell_parent_storm_id  (storm) int64 0 1
    cell_mask             (time, x, y) int64 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
Attributes:
    cf_tree_order:  storm_id cell_id
    tree_id:        28572

Note also that the coordinate data for x and y are not present.

I can achieve my preferred behavior with:

# Fix the swapped cell_mask dimension names, then copy over coordinate data.
# The temporary names x2/y2 are needed because rename_dims cannot swap x and y directly.
tracks2 = tracks.swap_dims({'x': 'y2', 'y': 'x2'}).rename_dims({'x2': 'x', 'y2': 'y'})
tracks2['x'] = nc_grid['x']
tracks2['y'] = nc_grid['y']
tracks2['z'] = nc_grid['z']
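
For anyone who wants to try the relabeling without the radar data, here is a minimal sketch on synthetic stand-ins. The grid sizes and the contents of `tracks` and `nc_grid` below are assumptions made up for illustration, not the real TINT output. The grid is deliberately non-square so that a wrong dimension label shows up as a size mismatch.

```python
import numpy as np
import xarray as xr

# Hypothetical sizes, chosen non-square so x and y cannot be confused.
nt, ny, nx = 2, 3, 4

# Stand-in for the input grid: only the x and y coordinate variables.
nc_grid = xr.Dataset(
    coords={
        "x": np.arange(nx) * 1000.0,
        "y": np.arange(ny) * 1000.0,
    }
)

# Stand-in for the tracks dataset: the array is laid out as (time, ny, nx),
# but the dimensions are labeled (time, x, y) and carry no coordinates.
tracks = xr.Dataset(
    {"cell_mask": (("time", "x", "y"), np.zeros((nt, ny, nx), dtype="int64"))}
)

# Relabel the dimensions without moving any data, via temporary names.
tracks2 = tracks.swap_dims({"x": "y2", "y": "x2"}).rename_dims(
    {"x2": "x", "y2": "y"}
)

# Now the sizes match the grid, so the coordinates can be attached.
tracks2["x"] = nc_grid["x"]
tracks2["y"] = nc_grid["y"]

print(tracks2["cell_mask"].dims)
```

Note that this only renames dimensions. If the underlying array were actually transposed in memory, a `transpose` call would be needed instead, which is why the comparison against the input grid still matters.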

I checked the above by naively plotting each dataset with xarray and confirming that zooming with sharex=sharey=True showed the same storm cells in both.
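
A non-graphical version of the same check is sketched below, again on made-up data rather than the real grids. It relies on xarray matching arrays by dimension name: on a square grid, a mask with swapped labels still has a compatible shape, but it selects the wrong pixels. The single synthetic "cell" and the helper `masked_max` are hypothetical names introduced only for this example.

```python
import numpy as np
import xarray as xr

n = 5  # square grid, like the 501 x 501 case, so shapes alone reveal nothing

# Synthetic 2D reflectivity with one "cell" at y index 1, x index 3.
refl = xr.DataArray(
    np.zeros((n, n)),
    dims=("y", "x"),
    coords={"y": np.arange(n) * 1000.0, "x": np.arange(n) * 1000.0},
)
refl[1, 3] = 50.0

# Mask with the same array layout, but dims labeled (x, y) as in the report.
tracks = xr.Dataset(
    {"cell_mask": (("x", "y"), (refl.values > 0).astype("int64"))}
)

# Same relabeling as in the workaround above.
tracks2 = tracks.swap_dims({"x": "y2", "y": "x2"}).rename_dims(
    {"x2": "x", "y2": "y"}
)


def masked_max(mask):
    """Maximum reflectivity under the mask, matched by dimension name."""
    return float(refl.where(mask > 0).max())


bad = masked_max(tracks["cell_mask"])     # mask lands on the transposed pixel
fixed = masked_max(tracks2["cell_mask"])  # mask lands on the cell itself
print(bad, fixed)
```

If the relabeled mask recovers the cell's reflectivity and the original mask does not, the dimension names were swapped rather than the data.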

@rcjackson @zssherman Would you be open to a PR with this change? If so, where in the tracking code would you suggest I make it?

zssherman commented 3 years ago

Hi @deeplycloudy, a PR would be great! I'm not sure where to add that yet; I'll see if I can find a place for it.

deeplycloudy commented 3 years ago

As a heads up, I think I'm going to tuck this in at the end of make_tracks in tracks.py. I discovered a few other coordinate and variable name things to clean up based on my understanding of the cf-tree idea. I'll add those to the same PR, since it's all metadata and coordinate cleanup.