roocs / clisops

Climate Simulation Operations
https://clisops.readthedocs.io/en/latest/
Other
21 stars 9 forks source link

Fix the "0 degree meridian" issue in subsetting #35

Closed cehbrecht closed 3 years ago

cehbrecht commented 4 years ago

Description

The core.subset module is missing the implementation for "crossing 0 degree meridian". https://github.com/roocs/clisops/blob/c493bc57ac46591ff9fe0fae77e58eb48e8e31df/clisops/core/subset.py#L114

What I Did

cehbrecht commented 4 years ago

@Zeitsperre why do we need a special implementation for this? I would suppose it works on xarray ...

Zeitsperre commented 4 years ago

From our experience, it doesn't work right out of the box. You could try implementing a cleaner version without the handling but I think your results will be cut at the median line or the dateline, depending on the GCS of the netCDF data.

agstephens commented 4 years ago

We might be able to use the xarray method roll. This allows you view the coordinate space from a different frame of reference:

http://xarray.pydata.org/en/stable/generated/xarray.DataArray.roll.html

Hence you could:

  1. Detect it longitude was 0 to 359 in the input dataset
  2. Roll it to -180 to 180 if required
  3. Then run the subset
Zeitsperre commented 4 years ago

If I recall correctly, the reason why we didn't want to use the "roll" approach was because of the high processing load needed when processing large (>250GB) NetCDF files. It made much more sense to slice up the vector file to better suit the shape of the NetCDF than to change all the NetCDF data with roll.

ellesmith88 commented 3 years ago

@agstephens I've been implementing this but haven't opened a PR yet as I've spotted a few issues with what I've done and I'm looking into them. Here are the changes I've made https://github.com/roocs/clisops/compare/cross_prime_meridian_fix

The issues are:

  1. https://github.com/roocs/clisops/blob/3c810ba98dbfb87fc0a6869de7810ece068802dc/clisops/utils/dataset_utils.py#L6 I'm not sure how well this will work with the test data as they have only a few longitudes e.g. 0, 250

  2. https://github.com/roocs/clisops/blob/3c810ba98dbfb87fc0a6869de7810ece068802dc/clisops/utils/dataset_utils.py#L27-L29 Here I am checking whether the request is within the bounds of the dataset - however it is very strict and if I ask for a subset (0, 100) but the minimum longitude is 0.01 this tries to roll the dataset. The same issue might arise in the case of subsetting again after a first subset in a workflow.

  3. https://github.com/roocs/clisops/blob/3c810ba98dbfb87fc0a6869de7810ece068802dc/clisops/utils/dataset_utils.py#L10 I'm calculating how to roll to end up with longitude -180 to 180, so I think I need to change this to be more generic.

ellesmith88 commented 3 years ago

Here's what I've come up with:

  1. If there's only one longitude - return the dataset as it is and don't try to roll. Don't think this solves the test issue but does look at the possible case of one longitude.

  2. Change the check to something like if (lon_min <= low or np.isclose(low, lon_min, atol=0.5)) and (lon_max >= high or np.isclose(high, lon_max, atol=0.5)):

  3. Instead of rolling the first longitude value to -180, roll it to the lower bound of the requested subset longitude bounds e.g.

    
    low, high = lon_bnds

first_element_value = low diff, offset = calculate_offset(lon, first_element_value) ds_roll = ds.roll(shifts={f"{lon.name}": offset}, roll_coords=False) ds_roll.coords[lon.name] = ds_roll.coords[lon.name] + diff return ds_roll


with

def calculate_offset(lon, first_element_value):

# get resolution of data
res = lon.values[1] - lon.values[0]

# calculate how many degrees to move by to have lon[0] of rolled subset as lower bound of request
diff = first_element_value - lon.values[0]

# work out how many elements to roll by to roll data by 1 degree
index = 1 / res

# calculate the corresponding offset needed to change data by diff
offset = int(diff * index)

return diff, offset
agstephens commented 3 years ago

@ellesmith88: I've read through the code and it's looking great!

Here are my responses to your three points:

  1. If there is only one longitude: it's a special case, which could probably be solved by:

    • if actual_lon is not in lon_bnds:
      • if lon_bnds[0] > actual_lon: new_lon = actual_lon + 360
      • if lon_bnds[1] < actual_lon: new_lon = actual_lon - 360
      • then reassign the lon coord([new_lon])
      • NOTE: no need to roll here because data array remains the same
  2. Change the check to something like if (lon_min <= low or np.isclose(low, lon_min, atol=0.5)) and (lon_max >= high or np.isclose(high, lon_max, atol=0.5)):

    • looks like a great plan
  3. Instead of rolling the first longitude value to -180, roll it to the lower bound of the requested subset longitude bounds:

    • yes, I think that is the best approach, it should deal with all cases.

Great stuff.