xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0
113 stars 12 forks source link

[Feature]: greater flexibility in choosing longitude orientation + swap_lon_axis discontinuity #197

Open pochedls opened 2 years ago

pochedls commented 2 years ago

Is your feature request related to a problem?

This is part feature request, part FYI (I wanted to document this somewhere). We can move this to the discussion area if that is a better place.

Some models that have grid cell boundaries that are partially outside a strict (0 to 360) or (-180 to 180) domain. For example, a model with longitude values of 0, 1.875, 3.75, ..., 358.125 has a boundary that crossed the prime meridian. In this case the first longitude value has a boundary of -0.9375 to 0.9375 (or 359.0625 to 0.9375).

The spatial averaging (and CDAT) routine breaks this grid cell into two cells with the same value (but different boundaries): one that spans (0 to 0.9375) and another that spans (359.0625 to 360).

I am just noting here that swap_lon_axis does not do this (at least for cells spanning -180/180). Here I provide an example where the number of longitude grid cells is the same after swapping the orientation (and there is an example of a grid cell with a lower bound of 179.0625 and an upper boundar of -179.0625. See below:

# load dataset
dpath = '/p/css03/esgf_publish/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/'
ds = xcdat.open_mfdataset(d + '*.nc')
print('Original lon bound', ds.lon_bnds.values[0,:], len(ds.lon))
# swap to 180
ds2 = xcdat.swap_lon_axis(ds, to=(-180, 180))
print('Updated lon bound', ds2.lon_bnds.values[0,:], len(ds2.lon))

Original lon bound [-0.9375 0.9375] 192 Updated lon bound [ 179.0625 -179.0625] 192

Note that these grid cell boundaries are correct (but not exactly the same as what CDAT does).

Describe the solution you'd like

In the future, I think it would be useful to give the user more flexibility in specifying longitude domains, e.g., (-90 to 90) or (0, 120) or whatever, instead of only (-180 to 180) or (0 to 360). At that point, it may make sense to ensure the grid cell boundaries are aligned with the user specified domain boundaries (and to repeat grid cell values in the case that the grid cells cross -180/180 or 0/360).

For now, I think we can leave this as is until this functionality is needed since I don't think there is anything wrong with the current functionality.

Describe alternatives you've considered

No response

Additional context

This (grid cells crossing the prime meridian) was something that needed to be handled in spatial averaging because the area weights are calculated from the grid cell boundaries. I'm not sure this will cause problems in general, though.

durack1 commented 2 years ago

@pochedls just a comment, for some grids, duplicate values are a problem. I've dropped an example below - 80.22104 , 80. are duped (which equates to two lines of longitude that do not include identical values):

(cdms315) bash-4.2$ ipython
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.25.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import cdms2 as cdm

In [2]: f = '/p/user_pub/xclim/CMIP5/CMIP/historical/ocean/mon/thetao/CMIP5.CMIP.historical.IPSL.IPSL-CM5A-LR.r1i1p1.mon.thetao.ocean.glb-l-gu.1.0000000.0.xml'

In [3]: fH = cdm.open(f)

In [4]: var = fH['thetao']

In [5]: lon = var.getLongitude()

In [6]: lon.getValue().data[-1,:]
Out[6]: 
array([ 80.22104 ,  80.      ,  79.77896 ,  79.66438 ,  79.5721  ,
        79.49056 ,  79.41527 ,  79.34387 ,  79.27494 ,  79.20752 ,
        79.1409  ,  79.07452 ,  79.00793 ,  78.94071 ,  78.872505,
        78.80297 ,  78.73175 ,  78.6585  ,  78.58286 ,  78.50445 ,
        78.422844,  78.3376  ,  78.24821 ,  78.15411 ,  78.05466 ,
        77.949104,  77.83662 ,  77.716194,  77.58667 ,  77.44664 ,
        77.29445 ,  77.128075,  76.94504 ,  76.74226 ,  76.51592 ,
        76.26111 ,  75.97146 ,  75.63866 ,  75.25145 ,  74.79438 ,
        74.245544,  73.57288 ,  72.7275  ,  71.63116 ,  70.15037 ,
        68.03828 ,  64.785484,  59.165733,  47.511005,  16.66903 ,
       319.92725 , 290.81793 , 279.70285 , 274.27374 , 271.10553 ,
       269.0369  , 267.5803  , 266.49802 , 265.66074 , 264.99234 ,
       264.44507 , 263.9876  , 263.59845 , 263.26233 , 262.96823 ,
       262.7079  , 262.4749  , 262.2645  , 262.07272 , 261.89645 ,
       261.73312 , 261.5806  , 261.43704 , 261.30084 , 261.17056 ,
       261.04477 , 260.92212 , 260.8011  , 260.6798  , 260.55573 ,
       260.42447 , 260.27606 , 260.071   , 260.00015 , 260.00012 ,
       260.00012 , 260.0001  , 260.00006 , 260.00006 , 260.00003 ,
       260.      , 260.      , 260.      , 259.99997 , 259.99994 ,
       259.99994 , 259.9999  , 259.99988 , 259.99988 , 259.99985 ,
       259.929   , 259.72394 , 259.57553 , 259.44427 , 259.3202  ,
       259.1989  , 259.07788 , 258.95523 , 258.82944 , 258.69916 ,
       258.56296 , 258.4194  , 258.26688 , 258.10355 , 257.92728 ,
       257.7355  , 257.5251  , 257.2921  , 257.03177 , 256.73767 ,
       256.40155 , 256.0124  , 255.55493 , 255.00766 , 254.33926 ,
       253.50198 , 252.41971 , 250.9631  , 248.89447 , 245.72624 ,
       240.29715 , 229.18208 , 200.07274 , 143.33098 , 112.48899 ,
       100.83427 ,  95.214516,  91.96172 ,  89.84963 ,  88.36884 ,
        87.2725  ,  86.42712 ,  85.754456,  85.20562 ,  84.74855 ,
        84.36134 ,  84.02854 ,  83.73889 ,  83.48408 ,  83.25774 ,
        83.05496 ,  82.871925,  82.70555 ,  82.55336 ,  82.41333 ,
        82.283806,  82.16338 ,  82.050896,  81.94534 ,  81.84589 ,
        81.75179 ,  81.6624  ,  81.577156,  81.49555 ,  81.41714 ,
        81.3415  ,  81.26825 ,  81.19703 ,  81.127495,  81.05929 ,
        80.99207 ,  80.92548 ,  80.8591  ,  80.79248 ,  80.72506 ,
        80.65613 ,  80.58473 ,  80.50944 ,  80.4279  ,  80.33562 ,
        80.22104 ,  80.      ], dtype=float32)
pochedls commented 2 years ago

@pochedls just a comment, for some grids, duplicate values are a problem.

@durack1 - I want to make sure that I understand this. Are you noting that some models have data that may not necessarily make sense? Or that CDAT (not xcdat) has an issue somewhere? This may be different from this feature suggestion (how to represent model data that does make sense when the data spans across the prime meridian).

In your example, are the values you are looking at over land (when I printed the grid cell values for lon[-1, 0] and lon[-1, -2]) they were masked and seemingly over land)?

durack1 commented 2 years ago

@pochedls the values that I have encountered before were over the ocean, so "valid" (non-masked) data for the variable. But these provided values were not identical, so two grid cells, representing the same geographical location, but not the same value. This does not appear sensible to me, however, if you deal with this issue (pick a value, removing the duplicate), the model fields do make sense.

This isn't as much an issue with CDAT (or xCDAT), but with the xESMF regridding library that both packages are calling. I figured while you are catching edge cases, may as well raise another that I know about

pochedls commented 2 years ago

@durack1 - thanks for explaining – this is a different issue than this feature request. I think the best thing would be to demonstrate the issue above in xcdat in a feature request with a suggested solution. This seems like it could be a challenging and time consuming issue to tackle (automatically trying to detect problematic data and then fixing it).

durack1 commented 2 years ago

@pochedls my motivation (as I presume that in the case you have defined above https://github.com/XCDAT/xcdat/issues/197#issue-1144822931) was that some warning or similar would be thrown alerting a user that software is attempting to fix a well-defined case in your example. As you note, "fixing" the current behaviour with the IPSL-CM5A-LR grid will be user-dependent as it is somewhat arbitrary, but in the case that xESMF just deals with it, as a user I would like to know that an edge case has been identified, rather than the current (CDAT) behaviour which hides the magic from a user

durack1 commented 2 years ago

@pochedls agreed that my example is more appropriate as discussion, so happy for this (my comments) to be moved elsewhere as you noted originally above https://github.com/XCDAT/xcdat/issues/197#issue-1144822931 as it's just clutter unless a path forward/example is provided