pangeo-data / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
189 stars 34 forks source link

Include the original vertical coordinate as data variable in the transformed Dataset #302

Closed mgorfer closed 1 year ago

mgorfer commented 1 year ago

I have an input ds on a vertical pressure coordinate:

<xarray.Dataset>
Dimensions:            (time: 12, pressure: 37, latitude_bins: 36,
                        longitude_bins: 72)
Coordinates:
  * pressure           (pressure) int32 1 2 3 5 7 10 ... 900 925 950 975 1000
  * time               (time) datetime64[ns] 2000-01-01 ... 2000-12-01
  * longitude_bins     (longitude_bins) float64 -177.5 -172.5 ... 172.5 177.5
  * latitude_bins      (latitude_bins) float64 -87.5 -82.5 -77.5 ... 82.5 87.5
Data variables:
    specific_humidity  (time, pressure, latitude_bins, longitude_bins) float32 dask.array<chunksize=(12, 37, 36, 72), meta=np.ndarray>
    temperature        (time, pressure, latitude_bins, longitude_bins) float32 dask.array<chunksize=(12, 37, 36, 72), meta=np.ndarray>
    altitude           (time, pressure, latitude_bins, longitude_bins) float32 dask.array<chunksize=(12, 37, 36, 72), meta=np.ndarray>

I want to transform it to altitude, but also keep the pressure variable in my output. It should look like this then:

<xarray.Dataset>
Dimensions:            (time: 12, longitude_bins: 72, latitude_bins: 36,
                        altitude: 800)
Coordinates:
  * time               (time) datetime64[ns] 2000-01-01 ... 2000-12-01
  * longitude_bins     (longitude_bins) float64 -177.5 -172.5 ... 172.5 177.5
  * latitude_bins      (latitude_bins) float64 -87.5 -82.5 -77.5 ... 82.5 87.5
  * altitude           (altitude) int64 0 100 200 300 ... 79700 79800 79900
Data variables:
    specific_humidity  (time, latitude_bins, longitude_bins, altitude) float32 dask.array<chunksize=(12, 36, 72, 800), meta=np.ndarray>
    temperature        (time, latitude_bins, longitude_bins, altitude) float32 dask.array<chunksize=(12, 36, 72, 800), meta=np.ndarray>
    pressure           (time, latitude_bins, longitude_bins, altitude) float64 dask.array<chunksize=(12, 36, 72, 800), meta=np.ndarray>

Normally, xESMF does not keep the original vertical coordinate in the output. But I need all the data variables and pressure as data variable on a vertical altitude coordinate in my output.

I accomplished this by using this function:

def xgmc_transformation(
    ds: xr.Dataset,
    orig_coord: str = "pressure",
    target_coord: str = "altitude",
    target_resolution: np.ndarray = TARGET_RESOLUTION,
    manual_method: str = "auto",
) -> xr.Dataset:
    """
    Apply a coordinate transformation to the dataset.

    Parameters
    ----------
    ds : xr.Dataset
        The input dataset.
    orig_coord : str, optional
        The original coordinate to transform from. Default is "pressure".
    target_coord : str, optional
        The target coordinate to transform to. Default is "altitude".
    target_resolution :  np.ndarray, optional
        The target resolution to transform to. Default is np.arange(0, 80000, 100).
    manual_method : str, optional (WIP!)
        The chosen manual method for the transformation. Either lin or log.
        Default is auto. If the original vertical coordinate is pressure, auto chooses log.
        In this case, pressure wil be prepared for the log/lin interpolation by np.log(pressure).
        It will get antilogged after the transformation. 
        If the original coordinate is altitude, no log is applied.

    Returns
    -------
    xr.Dataset
        The transformed dataset.
    """
    if manual_method != "auto":
        method = manual_method
    elif orig_coord == "altitude":
        method = "linear"
    elif orig_coord == "pressure":
        method = "log"
    else:
        msg = "Method could not be determined"
        raise ValueError(msg)

    if method == "log":
        ds[orig_coord] = np.log(ds[orig_coord])

    logger.info(
        "Tranform Dataset from %s to %s using %s transform",
        orig_coord,
        target_coord,
        method,
    )

    # Add orig_coord along target_coord in the dataset, as it should be in the output
    ds[f"{orig_coord}_var"] = ds[orig_coord].broadcast_like(ds[target_coord])

    grid = Grid(ds, coords={"Z": {"center": orig_coord}}, periodic=False)

    transform_vars = [var_ for var_ in ds.data_vars if var_ != target_coord]

    transform_list = []
    for var_ in transform_vars:
        var_transformed = grid.transform(
            ds[var_],
            "Z",
            target_resolution,
            target_data=ds[target_coord],
        )
        transform_list.append(var_transformed)
    out = xr.merge(transform_list)

    # Rename orig_coord to the original name.
    out = out.rename({f"{orig_coord}_var": orig_coord})

    if method == "log":
        out[orig_coord] = np.exp(out[orig_coord])

    return out

I am not sure if this is an especially elegant approach, especially the part where I do ds[f"{orig_coord}_var"] = ds[orig_coord].broadcast_like(ds[target_coord]), to insert pressure again into the ds as data variable along altitude.

Is what I want to accomplish maybe already implemented in some way in xESMF?

mgorfer commented 1 year ago

I am sorry for my wrong post.

This should have been in the xgcm repository: https://github.com/xgcm/xgcm/issues/617

Please delete my issue, if possible.