jbusecke / xMIP

Analysis ready CMIP6 data in python the easy way with pangeo tools.
https://cmip6-preprocessing.readthedocs.io/en/latest/?badge=latest
Apache License 2.0
196 stars 44 forks source link

`replace_x_y_nominal_lat_lon` does not work for > 360 `lon` coordinates #295

Closed JoranAngevaare closed 2 months ago

JoranAngevaare commented 1 year ago

Thanks a lot for this great tool that comes in super handy for harmonizing CMIP data

The bug

When using data with more than 360 lon coordinates, the replace_x_y_nominal_lat_lon will not work as a 360 periodicity is assumed in _interp_nominal_lon. I noticed this behavior in EC-Earth-Consortium EC-Earth3 ssp585 r1i1p1f1 Amon tas gr v20200310, which has this subdegree gridding (lon being 512 datapoints long).

from xmip.preprocessing import promote_empty_dims, replace_x_y_nominal_lat_lon, rename_cmip6, correct_coordinates, broadcast_lonlat

ds = xr.open_mfdataset( ... )

def recast(data_set):
    ds = rename_cmip6(data_set)
    ds = promote_empty_dims(ds)
    ds = broadcast_lonlat(ds)
    ds = replace_x_y_nominal_lat_lon(ds)
    return ds

ds.isel(time=0)['tas'].plot()
plt.show()
recast(ds).isel(time=0)['tas'].plot()
plt.show()

image image

Expected behavior

Don't suffle the data when interpolating with sub-degree longitude gridding. This can be solved by not assuming a 360 periodicity.

Underlying cause:

numpy.interp takes the period argument, but this is in the x-coordinate, however, in _interp_nominal_lon, the period is set to 360, this causes funky behavior in this function when you feed it an array with >360 entries, as they are sorted by their arg index.

MWE:

Maybe a small example might be helpful here:

# Create some dummy data
import numpy as np
import xarray as xr
from xmip.preprocessing import promote_empty_dims, replace_x_y_nominal_lat_lon, rename_cmip6, correct_coordinates, broadcast_lonlat
import matplotlib.pyplot as plt

lon=np.linspace(0,360, 513)[:-1]
lat=np.linspace(-90,90, 181)[:-1]
time=np.arange(1)
# Totally arbitrary data
data=np.arange(len(lat)*len(lon)*len(time)).reshape(len(time), len(lat), len(lon))*lon

# Add some NaN values just as an example
data[:, :, len(lon)//2+30: len(lon)//2+50] = np.nan

ds_dummy = xr.Dataset(
    data_vars = dict(var=(('time','lat', 'lon'), data,)),
    coords = dict(time=time, lon=lon, lat=lat,),
    attrs=dict(source_id='bla'))
ds_dummy['var'].isel(time=0).plot(cbar_kwargs=dict(orientation='horizontal'))

image

#  Now apply the recast using replace_x_y_nominal_lat_lon
def recast(data_set):
    ds = rename_cmip6(data_set)
    ds = promote_empty_dims(ds)
    ds = broadcast_lonlat(ds)
    ds = replace_x_y_nominal_lat_lon(ds)
    return ds

ds_dum_recast = recast(ds_dummy)
replace_x_y_nominal_lat_lon(ds_dum_recast)['var'].isel(time=0).plot(cbar_kwargs=dict(orientation='horizontal'),)

image

This second imagine is clearly not what we'd hoped for.

Versions

import sys
import numpy, xarray, xmip
message = f'{sys.platform}\n{sys.version}'
for b in 'numpy xarray xmip'.split():
    message += f'\n\t{b}=={locals().get(b).__version__}'
print(message)

linux
3.8.16 (default, Mar  2 2023, 03:21:46) 
[GCC 11.2.0]
    numpy==1.24.3
    xarray==2023.1.0
    xmip==0.7.1

Related PRs/Issues

I believe https://github.com/jbusecke/xMIP/issues/93 is also related to this and https://github.com/jbusecke/xMIP/pull/79 might be where the bug was introduced.

One way this might be solved is by replacing the above mentioned line as is done in this PR: https://github.com/jbusecke/xMIP/pull/296

bguillod commented 8 months ago

I am also encountering some of the issues described here (and similar once related to coordinates). For some models the longitude fixing seems to screw the data, as also experienced by @Quentin-Bourgeois. Is a fix for this issue in the pipeline @jbusecke?

jbusecke commented 7 months ago

Hi @JoranAngevaare and @bguillod. Very sorry for the long radio silence here. Trying to catch up with a lot of maintenance after working on the new Pangeo/ESGF CMIP6 Zarr data ingestion.

First of all, thanks a lot for the carefully crafted report here @JoranAngevaare. And even more for putting in a fix for this. Ill reopen the PR and continue discussion over there?