hainegroup / oceanspy

A Python package to facilitate ocean model data analysis and visualization.
https://oceanspy.readthedocs.io
MIT License
101 stars 32 forks source link

Opening local CMEMS data #372

Closed ammedd closed 1 year ago

ammedd commented 1 year ago

Hi there! New to Oceanspy but I'd like to start using it, especially for its mooring_array

I thought I'd start off simple, nothing fancy with dask or SciServer, so I downloaded some sample data from CMEMS. The dataset looks fine, but I have trouble opening it as an OceanDataset.

I don't understand the hints in the tutorial i.e. use Set Methods. Is there a working example somewhere? https://oceanspy.readthedocs.io/en/latest/Tutorial.html#Import-datasets

#open as OceanDataset print(ds) od = ospy.OceanDataset(ds) od

Dimensions: (depth: 35, latitude: 174, time: 31, longitude: 175) Coordinates: * depth (depth) float32 0.494 1.541 2.646 3.819 ... 643.6 763.3 902.3 * latitude (latitude) float32 -67.33 -67.25 -67.17 ... -53.08 -53.0 -52.92 * time (time) datetime64[ns] 2020-11-16 ... 2023-05-16T12:00:00 * longitude (longitude) float32 -72.0 -71.92 -71.83 ... -57.67 -57.58 -57.5 Data variables: vo (time, depth, latitude, longitude) float32 dask.array uo (time, depth, latitude, longitude) float32 dask.array Attributes: (12[/16](https://file+.vscode-resource.vscode-cdn.net/16)) title: Monthly mean fields for product GLOBAL_ANA... references: http://marine.copernicus.eu/ credit: E.U. Copernicus Marine Service Information... licence: http://marine.copernicus.eu/services-portf... contact: servicedesk.cmems@mercator-ocean.eu producer: CMEMS - Global Monitoring and Forecasting ... ... ... source: MERCATOR GLO12 product_user_manual: http://marine.copernicus.eu/documents/PUM/... quality_information_document: http://marine.copernicus.eu/documents/QUID... _CoordSysBuilder: ucar.nc2.dataset.conv.CF1Convention comment: history: Data extracted from dataset http://localho/... Main attributes: .dataset: More attributes: .parameters: Thanks in advance! Emma
ThomasHaine commented 1 year ago

Hi @ammedd. Welcome to OceanSpy! I think we need to explore the CMEMS data a bit to give advice. How exactly did you get the data?

Mikejmnez commented 1 year ago

Hello Emma! Thanks for reaching out and opening an issue. This is a bit of a long response. I apologize hehe. But it is very complete if you have the patience.

I am not very familiar with data from CMEMS but for now, let me provide with some clarifications/example that creates a mooring_array making use of oceanspy.subsample method with a synthetic dataset that looks like yours.

First, I must mention that:

# velocity data
u (time, Z, Y, Xp1 )
v (time, Z, Yp1, X)

# Scalar properties "Temperature"
theta (time, Z, Y, X)

You can see the layout in the diagram for "mooring_array" here: https://oceanspy.readthedocs.io/en/latest/Kogur.html#Mooring-array

Oceanspy uses the convention X, Y for logical (index) variables at center points, and Xp1, Yp1 refer to corner points of a scalar grid cell. For more on grid, see https://xgcm.readthedocs.io/en/latest/grids.html.

The above features are important and exploited when calculating for example, volume transports across a section in the model (which allows budget closures).

Example

So, to create a minimal example, I first create a synthetic dataset with similar variables as the one from your dataset (`depth, 'vo', 'uo').

# import some packages
import xarray as xr
import matplotlib.pyplot as plt
import oceanspy as ospy
import numpy as np
import pandas as pd
import cartopy.crs as ccrs
from xgcm import Grid
from xgcm.autogenerate import generate_grid_ds

# define grid
lon = np.arange(-72, -57.5, 0.08)
lat = np.arange(-67.33,-52.92, 0.08)
z = -np.hstack([np.arange(1,10, 1), np.arange(10,200, 10), np.arange(200,1020, 20)])

Lon, Lat = np.meshgrid(lon, lat)
time = pd.date_range("2020-11-16", periods=1)

coords = {'time': time, 'Z': z, 'lat':lat, 'lon': lon}
ds = xr.Dataset(coords=coords)

# Now I define some variables similar to those in your CMEMS dataset:
# Note: I am making Depth 2D - spatially varying.

Depth = xr.DataArray(np.sin(Lat)* np.cos(Lon), dims=['lat', 'lon'])
u0 = xr.DataArray(np.random.randn(len(time), len(z), len(lat), len(lon)), dims=('time', "Z", "lat", "lon"), coords={"time": time, "Z": z, "lat": lat, "lon": lon})
v0 = xr.DataArray(np.random.randn(len(time),len(z), len(lat), len(lon)), dims=('time', "Z", "lat", "lon"), coords={"time": time, "Z": z, "lat": lat, "lon": lon})

# include vars in dataset
ds['Depth']=Depth
ds['vo'] = v0
ds['uo'] = u0

This is the starting point.

The above dataset behaves similarly to your CMEMS dataset. But the dataset is missing some key elements in order to represent data on a staggered grid. You nee these features in order to exploit some of oceanspy functionality (mooring_array).

The dataset needs to mimic the C-staggered grid with the nomenclature (X, Xp1, Y, Yp1) for axes. It also needs the coordinate points (lat, lon) to be defined as 2D arrays: XC, YC for lon, lat values at center points, and XG, YG for lon, lat values at corner points. This is common with mitgcm output.

# coordinate must be 2d 
XC = xr.DataArray(Lon, dims=['lat', 'lon'])
YC = xr.DataArray(Lat, dims=['lat', 'lon'])

ds['XC'] = XC
ds['YC'] = YC
ds = ds.set_coords(['XC', 'YC'])

Now in this step I generate a grid that mimics the C-stagggered grid from model output. For that, we need to drop the end points of center variables and rename according to oceanspy's convections.

ds_full = generate_grid_ds(ds, {'X':'lon', 'Y': 'lat', 'Z': 'Z'}).isel(lon=slice(0, -1), lat=slice(0, -1), Z=slice(0, -1))
ds_full = ds_full.rename({'lon': 'X', 'lat': 'Y', 'lon_left': 'Xp1', 'lat_left': 'Yp1', 'Z_left': 'Zp1'})

I also create 2D coordinate arrays XG, YG at corner points.

lon_g, lat_g = np.meshgrid(ds_full.Xp1.data, ds_full.Yp1.data)

XG = xr.DataArray(lon_g, dims=['Yp1', 'Xp1'])
YG = xr.DataArray(lat_g, dims=['Yp1', 'Xp1'])

ds_full['YG'] = YG
ds_full['XG'] = XG

ds_full = ds_full.set_coords(['XG', 'YG'])

Now: set oceanspy parameters.

The later two are required.

grid_coords = {
    "Y": {"Y": None, "Yp1": 0.5},
    "X": {"X": None, "Xp1": 0.5},
    "Z": {"Z": None, "Zp1": 0.5},
}
grid_coords = {"add_midp": False, "overwrite": True, "grid_coords": grid_coords}

od = ospy.OceanDataset(ds_full)
od = od.set_grid_coords(**grid_coords)

That should be it when it comes to preparing the dataset to exploit oceanspy functionality.

You can access the grid object, and the dataset as follows:

od._dataset # this allows you to view the dataset variables

od.grid # prints the (xgcm) object, so that you can take derivatives /  interpolate / etc

Computing mooring_array

Below are two simple examples of the different behaviors you can get when computing mooring arrays:

# Along great circle path
Xmoor1= [np.min(od._ds.X.values), np.max(od._ds.X.values)]
Ymoor1 = [np.min(od._ds.Y.values), np.max(od._ds.Y.values)]
# more complex array
Xmoor2 = np.hstack([np.linspace(-70, -58, 5), -58*np.ones(np.shape(5)), np.linspace(-70, -58, 5)[::-1] , -70*np.ones(np.shape(5))])
Ymoor2 = np.hstack([-66*np.ones(np.shape(5)), np.linspace(-66, -61, 5), -61*np.ones(np.shape(5)), np.linspace(-66, -61, 5)[::-1]])

Compute the two mooring arrays via oceanspy.subsample method

od_moor1 = od.subsample.mooring_array(Xmoor=Xmoor1, Ymoor=Ymoor1)
od_moor2 = od.subsample.mooring_array(Xmoor=Xmoor2, Ymoor=Ymoor2)

## visualize a horizontal map

fig = plt.figure(figsize=(6, 4))
ax = od.plot.horizontal_section(varName="Depth")

XC1 = od_moor1.dataset["XC"].squeeze()
YC1 = od_moor1.dataset["YC"].squeeze()

XC2 = od_moor2.dataset["XC"].squeeze()
YC2 = od_moor2.dataset["YC"].squeeze()
ax.plot(XC1, YC1, "m.")
ax.plot(XC2, YC2, "k.");

At this point, the horizontal map (with mooring arrays superimposed) should look like:

horizontal_map

Plot variables along the section.

So for example, Depth along the two distinct trajectories:

od_moor1.dataset['Depth'].plot(lw=3)
od_moor2.dataset['Depth'].plot(lw=3)

The depths along the mooring arrays should look something like:

Depths

For this example, I used the newest pip install of oceanspy, and the following versions of the packages:

xr.__version__
'2023.5.0'

np.__version__
'1.24.3'

pd.__version__
'2.0.2'

cartopy.__version__
'0.21.1'

xgcm.__version__
'0.8.1'

matplotlib.__version__
'3.7.1'
ammedd commented 1 year ago

Wow. Thanks a lot for the timely and extensive explanation! Unfortunately I don't have time to look at it in depth till next week. I'll let you know how I manage :-)

ammedd commented 1 year ago

First off. I'm using exactly the same packages you stated and ocean spy 0.3.5.dev17+gcea1845 Very easily created from your Oceanography environment ;-)

Also, I like the map with the mooring arrays superimposed and only now realise it is done in the same way as in the Kogur example. Though there I interpreted they were straight lines between the 3 lat-lon pairs that are given.

I haven't managed the mooring array on my data yet, because I realised I didn't download the 2D depth field that gives the bathymetry and today the servers for downloading new data are slow/not working. To be continued!

ammedd commented 1 year ago

I think there is one final issue before being able to exploit the full functionality of OceanSpy... In the tutorial it is mentioned that time also has a mid and centre point, but this is not the case with the CMEMS data yet. Is that as easy to fix as the other axes and could you give an example @Mikejmnez?

ps. in your answer above I think the underscore in od._dataset # this allows you to view the dataset variables is redundant.

Mikejmnez commented 1 year ago

I think there is one final issue before being able to exploit the full functionality of OceanSpy...

At the very least get some mooring arrays, and perhaps some other compute functions :) . You may need the full grid metrics and other variables that are usually available from mitgcm model output, to fully exploit oceanspy ;).

In the tutorial it is mentioned that time also has a mid and centre point, but this is not the case with the CMEMS data yet. Is that as easy to fix as the other axes?

Yes. To define a time mid and outer points with the grid object following the example above, you add to the dictionary "grid_coords" a time element, and set add_midp=True :

grid_coords = {
    "Y": {"Y": None, "Yp1": 0.5},
    "X": {"X": None, "Xp1": 0.5},
    "Z": {"Z": None, "Zp1": 0.5},
    "time": {'time': -0.5}
}
grid_coords = {"add_midp": True, "overwrite": True, "grid_coords": grid_coords}

od = od.set_grid_coords(**grid_coords)

ps. in your answer above I think the underscore in "od._dataset # this allows you to view the dataset variables" is redundant.

Yes! sorry about that. I was inspired to write a separate notebook similar to the example I wrote for you, but with some sample HyCOM data, and some of those little details made it to my response here. A little bit out of context, but we do have two ways to inspect a dataset (via od.dataset and via od._ds), with some slight differences between the two approaches. That's where the underscore came to be he.

First off. I'm using exactly the same packages you stated and ocean spy 0.3.5.dev17+gcea1845 Very easily created from your Oceanography environment ;-)

Great! I try to always state the version of packages whenever coming up with a bug/error/example. We have seen in the past some changes to some packages as they increase versions (e.g. deprecation or change in functionality), and since this issue will remain accessible even once closed, I think it will be useful for future reference to show the version of the packages at the time of coding.

Best, Miguel

ammedd commented 1 year ago

Thanks again for your extensive answers. I think stating the (versions of) packages used is a very good idea! That's why I added the OceanSpy version as well.

I've been doing some more attempts on my little dataset, but keep running into some errors and am getting a bit frustrated. I got my hands on some MOi/NEMO data on its original grid so I'll start playing with that one.

I do realise I've seen the difference between od.dataset and od._ds mentioned in the tutorial as a way to check if aliases have been set.

And, for future reference and my (and maybe other peoples) understanding, perhaps you could enlighten us on why these error messages appear. I've added U and V to the dataset to be more consistent with OceanSpy and that seems to help a bit.

ds_full["U"] = ds_full["uo"]
ds_full["V"] = ds_full["vo"]

Following the Kogur use case for the mooring array: ds_Vflux = ospy.compute.mooring_volume_transport(od_moor) ValueError: These variables are not available and can not be computed: ['XU', 'YU', 'XV', 'YV'].

and for the ship survey: od_surv = od_surv.compute.integral(varNameList="ort_Vel", axesList=["Z"]) ValueError: These variables are not available and can not be computed: ['HFacC', 'HFacW', 'HFacS'].

So perhaps some advice on what minimal variables are for example recommended to download before attempting to follow the use cases. I realise for example that adding Temperature and Salinity would have given me more options to play with/visualise the data.

ThomasHaine commented 1 year ago

I suggest we move this issue to a discussion item. And then we review/revise the documentation to clarify this point for new users. I.e. as part of addressing #241. @Mikejmnez has an example notebook that illustrates what's involved.