hainegroup / oceanspy

A Python package to facilitate ocean model data analysis and visualization.
https://oceanspy.readthedocs.io
MIT License
96 stars 32 forks source link

plot arrays across faces without cutout #427

Closed Mikejmnez closed 3 weeks ago

Mikejmnez commented 3 months ago

This issue closes #410 and #409

The following are tasks that need to be completed before merging this PR. Until then, consider this as a work in progress

I will work on completing the rest of the tasks sometime this week, or the weekend at the latest.

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 68.36158% with 56 lines in your changes are missing coverage. Please review.

Project coverage is 95.62%. Comparing base (9e566fc) to head (ed9f249). Report is 5 commits behind head on main.

Files Patch % Lines
oceanspy/plot.py 68.36% 36 Missing and 20 partials :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #427 +/- ## ========================================== - Coverage 96.59% 95.62% -0.97% ========================================== Files 9 9 Lines 4987 5163 +176 Branches 1190 1246 +56 ========================================== + Hits 4817 4937 +120 - Misses 97 133 +36 - Partials 73 93 +20 ``` | [Flag](https://app.codecov.io/gh/hainegroup/oceanspy/pull/427/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=hainegroup) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/hainegroup/oceanspy/pull/427/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=hainegroup) | `95.62% <68.36%> (-0.97%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=hainegroup#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Mikejmnez commented 3 months ago

Hey Tom. catalog_ECCO.yml it is just a catalog on my local computer that I use for development since it points towards the test data from oceanspy. I think it might be better to run the notebook on sciserver. I will make the proper modifications to do that (3 cells)

Mikejmnez commented 3 months ago

@ThomasHaine I changed a bit the notebook in the hyper link above to run in Sciserver, it also has description on how to install the correct oceanspy version (this branch) in a sciserver environment, and also added improved description of the notebook. There are about 20 examples.

The notebook successfully ran in Sciserver with ECCO. I'd expect the same with LLC4320.

ThomasHaine commented 3 months ago

@Mikejmnez thanks. I'll check...

ThomasHaine commented 3 months ago

With LLC4320 it errors at the first plot with:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File <timed exec>:8

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1474, in _plotMethods.faces_array(self, **kwargs)
   1472 @_functools.wraps(faces_array)
   1473 def faces_array(self, **kwargs):
-> 1474     return faces_array(self._od, **kwargs)

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1127, in faces_array(od, Ymoor, Xmoor, varName, xoak_index, face2axis, **kwargs)
   1121 if xoak_index not in _xoak.IndexRegistry():
   1122     raise ValueError(
   1123         "`xoak_index` [{}] is not supported."
   1124         "\nAvailable options: {}"
   1125         "".format(xoak_index, _xoak.IndexRegistry())
   1126     )
-> 1127 ds_grid.xoak.set_index(["XC", "YC"], xoak_index)
   1128 cdata = {"XC": ("mooring", Xmoor), "YC": ("mooring", Ymoor)}
   1129 ds_data = _xr.Dataset(cdata)  # mooring data

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in XoakAccessor.set_index(self, coords, index_type, persist, **kwargs)
     98 self._index_type = index_type
     99 self._index_coords = tuple(coords)
--> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords]
    103 if len(set([c.dims for c in coord_objs])) > 1:
    104     raise ValueError(
    105         'Coordinates {coords} must all have the same dimensions in the same order'
    106     )

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in <listcomp>(.0)
     98 self._index_type = index_type
     99 self._index_coords = tuple(coords)
--> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords]
    103 if len(set([c.dims for c in coord_objs])) > 1:
    104     raise ValueError(
    105         'Coordinates {coords} must all have the same dimensions in the same order'
    106     )

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/coordinates.py:289, in DatasetCoordinates.__getitem__(self, key)
    287 def __getitem__(self, key: Hashable) -> DataArray:
    288     if key in self._data.data_vars:
--> 289         raise KeyError(key)
    290     return self._data[key]

KeyError: 'XC'
ThomasHaine commented 2 months ago

@Mikejmnez any ideas about how to fix?

With LLC4320 it errors at the first plot with:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File <timed exec>:8

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1474, in _plotMethods.faces_array(self, **kwargs)
   1472 @_functools.wraps(faces_array)
   1473 def faces_array(self, **kwargs):
-> 1474     return faces_array(self._od, **kwargs)

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1127, in faces_array(od, Ymoor, Xmoor, varName, xoak_index, face2axis, **kwargs)
   1121 if xoak_index not in _xoak.IndexRegistry():
   1122     raise ValueError(
   1123         "`xoak_index` [{}] is not supported."
   1124         "\nAvailable options: {}"
   1125         "".format(xoak_index, _xoak.IndexRegistry())
   1126     )
-> 1127 ds_grid.xoak.set_index(["XC", "YC"], xoak_index)
   1128 cdata = {"XC": ("mooring", Xmoor), "YC": ("mooring", Ymoor)}
   1129 ds_data = _xr.Dataset(cdata)  # mooring data

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in XoakAccessor.set_index(self, coords, index_type, persist, **kwargs)
     98 self._index_type = index_type
     99 self._index_coords = tuple(coords)
--> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords]
    103 if len(set([c.dims for c in coord_objs])) > 1:
    104     raise ValueError(
    105         'Coordinates {coords} must all have the same dimensions in the same order'
    106     )

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in <listcomp>(.0)
     98 self._index_type = index_type
     99 self._index_coords = tuple(coords)
--> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords]
    103 if len(set([c.dims for c in coord_objs])) > 1:
    104     raise ValueError(
    105         'Coordinates {coords} must all have the same dimensions in the same order'
    106     )

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/coordinates.py:289, in DatasetCoordinates.__getitem__(self, key)
    287 def __getitem__(self, key: Hashable) -> DataArray:
    288     if key in self._data.data_vars:
--> 289         raise KeyError(key)
    290     return self._data[key]

KeyError: 'XC'
Mikejmnez commented 2 months ago

Hey @ThomasHaine! Apologies, I thought I wrote but my message stayed as draft. The issue with LLC4320 is a non-issue really. The Keyerror means that the variable XC (and YC) are not defined as coordinates for the xarray.dataset. This is fixed by setting them as coordinates, for example:

ds = ds.set_coords(['XC', 'YC'])

that does it. Otherwise xoak can't find them.

However, last week as I ran this code with LLC4320 on SciServer (Oceanography (Integrated Viewer)) and found that:

1) it remains pretty slow. Most of the time is on building the tree with xoak. I tried loading the variables XC, YC into memory (since these are 2D) but it didn't work. It has to do with building the tree everytime there is a query for coordinates. I need to work on perhaps reusing the build of the tree (via xoak) so that we don't continuously re-build it, which seems unnecessary. Perhaps once Grendel is ready things will be faster and it will be a non-issue.

2) The plotting function has some bug or something. When I use the same coordinates from the ECCO examples but now on LLC4320, the plots are off, which is weird since both datasets have the same topology.

I need to spend some time on it. I am not too concern about 2), but 1) is a bummer (that is slow with LLC4320). Again, perhaps with Grendel things will run much much faster. Right now I am using the basic Oceanography (Integrated Viewer) volume on Sciserver (so pretty basic).

ThomasHaine commented 2 months ago

Thanks @Mikejmnez. I find the same issues (1) and (2). Grendel is still available, so it may be possible to test now. Otherwise, the data transfer to the new ceph cluster will happen soon (I hope).

Is it possible to build the xoak tree once and store it?

Mikejmnez commented 2 months ago

Is it possible to build the xoak tree once and store it?

Not sure, but if it is that would significantly speed up the plotting in this PR, subsample.mooring_array and subsample.stations!! I need to spend some time exploring these options, and finding the bug on the plotting func in this PR.

Mikejmnez commented 2 months ago

2. The plotting function has some bug or something. When I use the same coordinates from the ECCO examples but now on LLC4320, the plots are off, which is weird since both datasets have the same topology.

Fixed this problem with 2) above. It was minor issue and now plotting works as expected with LLC4320, DYAMOND and all other LLC-geometries. You can find some examples with LLC4320 (and how long it took to run) on this notebook.

note

TODO: Still need to write tests and documentation (docstring).

Mikejmnez commented 2 months ago

This PR is ready for merger!

ThomasHaine commented 2 months ago

Thanks @Mikejmnez . I'll take a look and review, then you can merge.

Mikejmnez commented 2 months ago

Thanks @Mikejmnez . I'll take a look and review, then you can merge.

Great. I updated the example notebook at the top of this PR conversation so you can see examples with LLC4320. You can see the timings there when the grid dataset is stored in zarr in my machine, next to the notebook.

The ds_grid + xoak combination is being build everytime there is a plot, and so those timings can probably be improved a bit. The notebook can be considered the baseline when data is stored locally next to compute environment.

Mikejmnez commented 2 months ago

@ThomasHaine BTW you should re-install oceanspy from the branch, since the code has changed since last time you installed it. The instructions to do that are in the example notebook that is referenced at the beginning of the PR conversation.

NOTE: The notebook uses a yaml file that is also available from the GH repository, but that yaml file defines the dataset locally to the notebook. To run that notebook from within Sciserver, you should instead use oceanspy's default catalog:

cell [4]:  ECCOod = ospy.open_oceandataset.from_catalog("ECCO")

and

cell[6]: LLC4320od = ospy.open_oceandataset.from_catalog("LLC4320")
ThomasHaine commented 1 month ago

@Mikejmnez I checked the notebook with ECCO and LLC4320 on SciServer, and all works fine. Please proceed with the merge!