roocs / clisops

Climate Simulation Operations
https://clisops.readthedocs.io/en/latest/
Other
21 stars 9 forks source link

bug with core.subset.subset_shape() when using a shape with non-contiguous polygons #137

Closed tlogan2000 closed 3 years ago

tlogan2000 commented 3 years ago

Description

subset.shape() doesn't work properly for polygons that are not contiguous. Example geojson used in test code available here :https://ouranos-my.sharepoint.com/:u:/g/personal/tralog1_ouranos_ca/EfIY4IARTthNkFkfEL5csu0BmEVLQjPxgP8WpHA0zVOBfg?e=fdBtjj

Notes: When polygon borders are not touching the .dropna() calls in subset_shape() results in chunks (horizontal or vertical) of the dataset being removed. When we examine output lon, lat coords they are not longer monotonously increasing or decreasing. Technically the data has been properly subsetted (from what I can tell at least) but the dataset coordinates become unintuitive as well as causing issues for mapping / plots (pixels stretching)

What I Did

import xarray as xr
from clisops.core import subset
import geopandas as gpd
from dask.diagnostics import ProgressBar
import os
import numpy as np
import matplotlib.pyplot as plt

poly = 'test_subsetshape/test_noncontiguous.geojson'
gdf = gpd.GeoDataFrame.from_file(poly)
url = 'http://pavics.ouranos.ca:8083/twitcher/ows/proxy/thredds/dodsC/datasets/simulations/bias_adjusted/cmip5/ouranos/cb-oura-1.0/day_NorESM1-M_historical+rcp85_r1i1p1_na10kgrid_qm-moving-50bins-detrend_1950-2100.ncml'
ds = xr.open_dataset(url, chunks='auto')
dsSub = subset.subset_shape(ds, shape=gdf)
dsSub.tasmax.isel(time=0).plot()
gdf.plot(figsize=(6,6))
print(np.unique(dsSub.lon.diff(dim='lon')))
print(np.unique(dsSub.lat.diff(dim='lat')))
tlogan2000 commented 3 years ago

This shouldn't be too tough I don't think... a small check and manipulation of output dataset coords should be simple to implement

aulemahal commented 3 years ago

I found this solution, but it's the only one I tried, so there might be a more elegant one:


Xb = mask.bfill('lon').sum('lat') 
Xf = mask.ffill('lon').sum('lat')
Xm = (Xb == Xb.max()) | (Xf == Xf.max())  # 1 in outer zone, 0 in the inner, only for lon

Yb = mask.bfill('lat').sum('lon')
Yf = mask.ffill('lat').sum('lon')
Ym = (Yb == Yb.max()) | (Yf == Yf.max())

final_mask = mask.notnull() | ( ( Ym == 0 ) & ( Xm == 0 ) )  # 1 in the strictly inner zone or on the shapes, 0 elsewhere.

sub_data = data.where(final_mask, drop=True)

Before: image After: image

(And there's some fancy dimension finding missing)