xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0
118 stars 12 forks source link

[Feature]: Explore land sea mask generation #576

Open lee1043 opened 11 months ago

lee1043 commented 11 months ago

Is your feature request related to a problem?

I know that we had discussed this before and concluded to take an independent path to host land sea mask generation capability in a separate package (i.e., pcmdi_utils), I recently came across much simpler and more reliable method that possibly be worth considering to implement to xcdat.

The new approach uses regionmask, a package that is available via conda and pip.

This method has following advantages: (1) It overcomes the complexity of the regrid2-based method that is originated from cdutil. Although the more precised land sea fraction conservation is considered in the regrid2-based method, in common practical case of global uses its influence is not very critical. (2) It is not using global-land-mask, which is only available via pip, thus does not complicate xcdat installation.

Describe the solution you'd like

Function

[Proposed function updated -- boolean option added]

import regionmask
import xarray as xr
import xcdat as xc

def create_land_sea_mask(ds: xr.Dataset, boolean: bool=False) -> xr.DataArray:
    """
    A function generates land sea mask (1: land, 0: sea) for given xarray Dataset,
    assuming the given xarray dataset and has latitude and longitude coordinates. 

    Parameters
    ----------
    ds : xr.Dataset
        A Dataset object.
    boolen : bool, optional
        Set mask value to True (land) or False (sea), by default False

    Returns
    -------
    xr.DataArray
        A DataArray of land sea mask (1: land, 0: sea)
    """
    # Create a land-sea mask using regionmask
    land_mask = regionmask.defined_regions.natural_earth_v5_0_0.land_110

    # Get the longitude and latitude from the xarray dataset
    key_lon = xc.axis.get_dim_keys(ds, axis="X")
    key_lat = xc.axis.get_dim_keys(ds, axis="Y")

    lon = ds[key_lon]
    lat = ds[key_lat]

    # Mask the land-sea mask to match the dataset's coordinates
    land_sea_mask = land_mask.mask(lon, lat)

    if not boolean:
        # Convert the land-sea mask to a boolean mask
        land_sea_mask = xr.where(land_sea_mask, 0, 1)  

    return land_sea_mask

Examples

[Examples updated -- boolean option and regional use case added]

target_grid = xc.create_uniform_grid(-90, 90, 1, 0, 359, 1)
mask = create_land_sea_mask(target_grid)
mask.plot()

output1

target_grid = xc.create_uniform_grid(-90, 90, 0.5, -180, 179, 0.5)
mask = create_land_sea_mask(target_grid)
mask.plot()

output 2png

mask = create_land_sea_mask(target_grid, boolean=True)
mask.plot()

output4

target_grid = xc.create_uniform_grid(20, 45, 0.1, 110, 135, 0.1)
mask = create_land_sea_mask(target_grid, boolean=True)
mask.plot()

output5

Describe alternatives you've considered

No response

Additional context

No response

lee1043 commented 11 months ago

@mzelinka @pochedls @bosup would you be interested in trying this function that I drafted and share your feedback?

pochedls commented 11 months ago

@lee1043 – I like regionmask. I ran into an issue with it that was recently closed. Have you hit any issues with NaNs in experimenting with it?

I'm supportive of including support for land/sea masking in xcdat, though it might need some discussion (e.g., why not just use regionmask or other packages directly without wrapping them).

lee1043 commented 11 months ago

@pochedls glad to learn you already have explored this path! Thank you for sharing the edge case. I haven't hit such issue, maybe the issue you posted is specifically coming with ocean_basins_50. Is there a specific region you are using the ocean_basins_50? Otherwise, I tried a workaround of your edge case using land_110, not sure if that is something you were looking for.

import numpy as np

# create mask
#ocean = regionmask.defined_regions.natural_earth_v5_0_0.ocean_basins_50
ocean = regionmask.defined_regions.natural_earth_v5_0_0.land_110
mask = ocean.mask(ds.lon, ds.lat)
mask = xr.DataArray(np.nan_to_num(mask, nan=True), coords={"lat": ds.lat, "lon": ds.lon}, dims=["lat", "lon"])
mask.plot()

output3

pochedls commented 11 months ago

This issue is likely fixed with ocean_basins_50 (the issue I opened was closed). I was just wondering if you had hit anything similar. Mark also found a library to do land/sea masking, but I don't think the library was available via anaconda.

lee1043 commented 11 months ago

@pochedls I was able to reproduce the edge case issue you had, so I presume the issue has not been fully resolved. And you are right, the library Mark found, global-land-mask, is only available via pip install.

lee1043 commented 5 months ago

Just a note, this capability is implemented to PMP (here), which I'd be happy to copy it to xCDAT if we decide to implement.

tomvothecoder commented 5 months ago

As discussed in our meeting today (06/05/24), we decided that @lee1043 will open a PR with his function from the PMP commented above since this feature seems to be useful for the community and brought up frequently. We will perform additional validation in this new PR, add unit tests, and update documentation.

@xCDAT/core-developers

pochedls commented 5 months ago

@lee1043 - thanks for working on this. One feature that would be nice to include: An (optional?) flag to mask land or ocean (so either land/ocean grid cells are 1 and the other is np.nan). This would be very helpful to include in this function (rather than having the user do .where operations later to do the masking).