Closed ammedd closed 1 year ago
Hi @ammedd. Welcome to OceanSpy! I think we need to explore the CMEMS data a bit to give advice. How exactly did you get the data?
Hello Emma! Thanks for reaching out and opening an issue. This is a bit of a long response. I apologize hehe. But it is very complete if you have the patience.
I am not very familiar with data from CMEMS but for now, let me provide with some clarifications/example that creates a mooring_array
making use of oceanspy.subsample
method with a synthetic dataset that looks like yours.
First, I must mention that:
oceanspy
was developed to work with and therefore assumes the data is defined on a C-staggered grid. There is plan for generalizing to other grids, but that remains at an early stage of development. But this implies that oceanspy expects the gridded data to be defined on such a grid. That is, it needs "center" (X, Y) and "corner" (Xp1, Yp1) points. In particular:# velocity data
u (time, Z, Y, Xp1 )
v (time, Z, Yp1, X)
# Scalar properties "Temperature"
theta (time, Z, Y, X)
You can see the layout in the diagram for "mooring_array" here: https://oceanspy.readthedocs.io/en/latest/Kogur.html#Mooring-array
Oceanspy uses the convention X
, Y
for logical (index) variables at center points, and Xp1
, Yp1
refer to corner points of a scalar grid cell. For more on grid, see https://xgcm.readthedocs.io/en/latest/grids.html.
oceanspy
defines a mooring_array
, at its most basic, as a collection of points in the domain such that these follow a continuous path along the (lon, lat) plane. Meaning, except at the end points, all points in the resulting array are neighbors (in the model) . oceanspy
computes the array trajectory automatically from a "desired path", consisting of the nearest neighbors to the values provided (therefore retaining the original grid). If only the end points of the desired "array" pathway are provided, the continuous trajectory follows a great circle path.The above features are important and exploited when calculating for example, volume transports across a section in the model (which allows budget closures).
So, to create a minimal example, I first create a synthetic dataset with similar variables as the one from your dataset (`depth, 'vo', 'uo').
# import some packages
import xarray as xr
import matplotlib.pyplot as plt
import oceanspy as ospy
import numpy as np
import pandas as pd
import cartopy.crs as ccrs
from xgcm import Grid
from xgcm.autogenerate import generate_grid_ds
# define grid
lon = np.arange(-72, -57.5, 0.08)
lat = np.arange(-67.33,-52.92, 0.08)
z = -np.hstack([np.arange(1,10, 1), np.arange(10,200, 10), np.arange(200,1020, 20)])
Lon, Lat = np.meshgrid(lon, lat)
time = pd.date_range("2020-11-16", periods=1)
coords = {'time': time, 'Z': z, 'lat':lat, 'lon': lon}
ds = xr.Dataset(coords=coords)
# Now I define some variables similar to those in your CMEMS dataset:
# Note: I am making Depth 2D - spatially varying.
Depth = xr.DataArray(np.sin(Lat)* np.cos(Lon), dims=['lat', 'lon'])
u0 = xr.DataArray(np.random.randn(len(time), len(z), len(lat), len(lon)), dims=('time', "Z", "lat", "lon"), coords={"time": time, "Z": z, "lat": lat, "lon": lon})
v0 = xr.DataArray(np.random.randn(len(time),len(z), len(lat), len(lon)), dims=('time', "Z", "lat", "lon"), coords={"time": time, "Z": z, "lat": lat, "lon": lon})
# include vars in dataset
ds['Depth']=Depth
ds['vo'] = v0
ds['uo'] = u0
The above dataset behaves similarly to your CMEMS dataset. But the dataset is missing some key elements in order to represent data on a staggered grid. You nee these features in order to exploit some of oceanspy functionality (mooring_array
).
The dataset needs to mimic the C-staggered grid with the nomenclature (X
, Xp1
, Y
, Yp1
) for axes. It also needs the coordinate points (lat, lon) to be defined as 2D arrays: XC, YC
for lon, lat
values at center points, and XG, YG
for lon, lat
values at corner points. This is common with mitgcm output.
# coordinate must be 2d
XC = xr.DataArray(Lon, dims=['lat', 'lon'])
YC = xr.DataArray(Lat, dims=['lat', 'lon'])
ds['XC'] = XC
ds['YC'] = YC
ds = ds.set_coords(['XC', 'YC'])
Now in this step I generate a grid that mimics the C-stagggered grid from model output. For that, we need to drop the end points of center variables and rename according to oceanspy's convections.
ds_full = generate_grid_ds(ds, {'X':'lon', 'Y': 'lat', 'Z': 'Z'}).isel(lon=slice(0, -1), lat=slice(0, -1), Z=slice(0, -1))
ds_full = ds_full.rename({'lon': 'X', 'lat': 'Y', 'lon_left': 'Xp1', 'lat_left': 'Yp1', 'Z_left': 'Zp1'})
I also create 2D coordinate arrays XG, YG
at corner points.
lon_g, lat_g = np.meshgrid(ds_full.Xp1.data, ds_full.Yp1.data)
XG = xr.DataArray(lon_g, dims=['Yp1', 'Xp1'])
YG = xr.DataArray(lat_g, dims=['Yp1', 'Xp1'])
ds_full['YG'] = YG
ds_full['XG'] = XG
ds_full = ds_full.set_coords(['XG', 'YG'])
The later two are required.
grid_coords = {
"Y": {"Y": None, "Yp1": 0.5},
"X": {"X": None, "Xp1": 0.5},
"Z": {"Z": None, "Zp1": 0.5},
}
grid_coords = {"add_midp": False, "overwrite": True, "grid_coords": grid_coords}
od = ospy.OceanDataset(ds_full)
od = od.set_grid_coords(**grid_coords)
You can access the grid object, and the dataset as follows:
od._dataset # this allows you to view the dataset variables
od.grid # prints the (xgcm) object, so that you can take derivatives / interpolate / etc
Below are two simple examples of the different behaviors you can get when computing mooring arrays:
# Along great circle path
Xmoor1= [np.min(od._ds.X.values), np.max(od._ds.X.values)]
Ymoor1 = [np.min(od._ds.Y.values), np.max(od._ds.Y.values)]
# more complex array
Xmoor2 = np.hstack([np.linspace(-70, -58, 5), -58*np.ones(np.shape(5)), np.linspace(-70, -58, 5)[::-1] , -70*np.ones(np.shape(5))])
Ymoor2 = np.hstack([-66*np.ones(np.shape(5)), np.linspace(-66, -61, 5), -61*np.ones(np.shape(5)), np.linspace(-66, -61, 5)[::-1]])
od_moor1 = od.subsample.mooring_array(Xmoor=Xmoor1, Ymoor=Ymoor1)
od_moor2 = od.subsample.mooring_array(Xmoor=Xmoor2, Ymoor=Ymoor2)
## visualize a horizontal map
fig = plt.figure(figsize=(6, 4))
ax = od.plot.horizontal_section(varName="Depth")
XC1 = od_moor1.dataset["XC"].squeeze()
YC1 = od_moor1.dataset["YC"].squeeze()
XC2 = od_moor2.dataset["XC"].squeeze()
YC2 = od_moor2.dataset["YC"].squeeze()
ax.plot(XC1, YC1, "m.")
ax.plot(XC2, YC2, "k.");
At this point, the horizontal map (with mooring arrays superimposed) should look like:
So for example, Depth
along the two distinct trajectories:
od_moor1.dataset['Depth'].plot(lw=3)
od_moor2.dataset['Depth'].plot(lw=3)
The depths along the mooring arrays should look something like:
For this example, I used the newest pip install of oceanspy, and the following versions of the packages:
xr.__version__
'2023.5.0'
np.__version__
'1.24.3'
pd.__version__
'2.0.2'
cartopy.__version__
'0.21.1'
xgcm.__version__
'0.8.1'
matplotlib.__version__
'3.7.1'
Wow. Thanks a lot for the timely and extensive explanation! Unfortunately I don't have time to look at it in depth till next week. I'll let you know how I manage :-)
First off. I'm using exactly the same packages you stated and ocean spy 0.3.5.dev17+gcea1845 Very easily created from your Oceanography environment ;-)
Also, I like the map with the mooring arrays superimposed and only now realise it is done in the same way as in the Kogur example. Though there I interpreted they were straight lines between the 3 lat-lon pairs that are given.
I haven't managed the mooring array on my data yet, because I realised I didn't download the 2D depth field that gives the bathymetry and today the servers for downloading new data are slow/not working. To be continued!
I think there is one final issue before being able to exploit the full functionality of OceanSpy... In the tutorial it is mentioned that time also has a mid and centre point, but this is not the case with the CMEMS data yet. Is that as easy to fix as the other axes and could you give an example @Mikejmnez?
ps. in your answer above I think the underscore in od._dataset # this allows you to view the dataset variables
is redundant.
I think there is one final issue before being able to exploit the full functionality of OceanSpy...
At the very least get some mooring arrays, and perhaps some other compute functions :) . You may need the full grid metrics and other variables that are usually available from mitgcm model output, to fully exploit oceanspy ;).
In the tutorial it is mentioned that time also has a mid and centre point, but this is not the case with the CMEMS data yet. Is that as easy to fix as the other axes?
Yes. To define a time mid and outer points with the grid object following the example above, you add to the dictionary "grid_coords" a time
element, and set add_midp=True
:
grid_coords = {
"Y": {"Y": None, "Yp1": 0.5},
"X": {"X": None, "Xp1": 0.5},
"Z": {"Z": None, "Zp1": 0.5},
"time": {'time': -0.5}
}
grid_coords = {"add_midp": True, "overwrite": True, "grid_coords": grid_coords}
od = od.set_grid_coords(**grid_coords)
ps. in your answer above I think the underscore in "od._dataset # this allows you to view the dataset variables" is redundant.
Yes! sorry about that. I was inspired to write a separate notebook similar to the example I wrote for you, but with some sample HyCOM data, and some of those little details made it to my response here. A little bit out of context, but we do have two ways to inspect a dataset (via od.dataset
and via od._ds
), with some slight differences between the two approaches. That's where the underscore came to be he.
First off. I'm using exactly the same packages you stated and ocean spy 0.3.5.dev17+gcea1845 Very easily created from your Oceanography environment ;-)
Great! I try to always state the version of packages whenever coming up with a bug/error/example. We have seen in the past some changes to some packages as they increase versions (e.g. deprecation or change in functionality), and since this issue will remain accessible even once closed, I think it will be useful for future reference to show the version of the packages at the time of coding.
Best, Miguel
Thanks again for your extensive answers. I think stating the (versions of) packages used is a very good idea! That's why I added the OceanSpy version as well.
I've been doing some more attempts on my little dataset, but keep running into some errors and am getting a bit frustrated. I got my hands on some MOi/NEMO data on its original grid so I'll start playing with that one.
I do realise I've seen the difference between od.dataset
and od._ds
mentioned in the tutorial as a way to check if aliases have been set.
And, for future reference and my (and maybe other peoples) understanding, perhaps you could enlighten us on why these error messages appear. I've added U and V to the dataset to be more consistent with OceanSpy and that seems to help a bit.
ds_full["U"] = ds_full["uo"]
ds_full["V"] = ds_full["vo"]
Following the Kogur use case for the mooring array:
ds_Vflux = ospy.compute.mooring_volume_transport(od_moor)
ValueError: These variables are not available and can not be computed: ['XU', 'YU', 'XV', 'YV'].
and for the ship survey:
od_surv = od_surv.compute.integral(varNameList="ort_Vel", axesList=["Z"])
ValueError: These variables are not available and can not be computed: ['HFacC', 'HFacW', 'HFacS'].
So perhaps some advice on what minimal variables are for example recommended to download before attempting to follow the use cases. I realise for example that adding Temperature and Salinity would have given me more options to play with/visualise the data.
I suggest we move this issue to a discussion item. And then we review/revise the documentation to clarify this point for new users. I.e. as part of addressing #241. @Mikejmnez has an example notebook that illustrates what's involved.
Hi there! New to Oceanspy but I'd like to start using it, especially for its mooring_array
I thought I'd start off simple, nothing fancy with dask or SciServer, so I downloaded some sample data from CMEMS. The dataset looks fine, but I have trouble opening it as an OceanDataset.
I don't understand the hints in the tutorial i.e. use Set Methods. Is there a working example somewhere? https://oceanspy.readthedocs.io/en/latest/Tutorial.html#Import-datasets
#open as OceanDataset
print(ds)
od = ospy.OceanDataset(ds)
od