speeding up llc_rearrange + LLC4320

Mikejmnez commented 1 year ago

OceanSpy version: newest
Description

After #315, I realized that there are additional ways to further speed up oceanspy, for example when running the LLC4320 tutorial notebook.

After PR #300, the U and V velocities are now geographically correct, meaning they have been rotated using the grid angles CS and SN. This is necessary when making vertical sections in regions with strong grid deformation (high latitudes) and when calculating transport across sections. However, if one is only interested in making a (global) surface plot of velocity, it is perhaps not ideal to rotate the velocities, since every grid point then gets multiplied by values that are mostly ones and zeros (and CS and SN also need to be interpolated onto the U and V points). Thus, this geographic rotation of U and V should be an option that is False by default, but always True when using ospy.subsample.mooring or ospy.subsample.survey.
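A minimal sketch of the proposed opt-in rotation, assuming u, v, CS and SN have already been brought to a common location. The function name `rotate_to_geographic` and the `rotate` flag are hypothetical (not oceanspy's API); the formulas are the standard MITgcm rotation from grid-relative to eastward/northward components. With `rotate=False` no per-point arithmetic is done at all, which is the cheap path for surface plots; mooring and survey extraction would pass `rotate=True`.

```python
def rotate_to_geographic(u, v, cs, sn, rotate=False):
    """Return (u, v), optionally rotated to eastward/northward components.

    u, v   : velocity components along the model grid's x and y directions,
             assumed to already sit at a common location.
    cs, sn : cosine and sine of the grid angle (MITgcm's CS/SN) at that location.
    rotate : hypothetical opt-in flag; False skips the multiplication entirely.
    """
    if not rotate:
        # Cheap path (e.g. global surface plots): return the inputs untouched.
        return u, v
    # Standard rotation from grid-relative to geographic components.
    u_east = u * cs - v * sn
    v_north = u * sn + v * cs
    return u_east, v_north
```

Only arithmetic operators are used, so the same function works on floats, NumPy arrays, or xarray DataArrays.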

Mikejmnez commented 1 year ago

Just to add to this thread: I transformed the entire surface fields of LLC4320 in two different ways, using llc_rearrange.py directly and using ospy.subsample.cutout, both with persist=False (the default).

Using llc_rearrange.py directly, the transformation took only 4-6 minutes (6 when transforming the entire 100 TB), and it is quite straightforward to make surface plots from there.

Using ospy.subsample.cutout took over an hour!

Something in subsample.cutout greatly slows down the transformation after llc_rearrange.py is called. I will spend some time running tests, but I have some idea of what is going on:

slowdown:

The slowdown on the transformation is likely due to two characteristics of LLC4320:

1) The sheer size of the dataset (TBs); I didn't see any significant slowdown with ECCO, which is much smaller.

2) Another likely cause of the slowdown is that LLC4320 provides only [XC, XG] and [YC, YG], so oceanspy needs to calculate the coordinates at the velocity points [XU, YU] and [XV, YV]. This requires interpolating the transformed grid variables. On a dataset as massive as LLC4320, and when transforming over a wide range of lats and lons that crosses several facets with different topologies, this is probably the culprit.
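The interpolation in item 2) boils down to averaging neighboring corner values. A minimal NumPy sketch on a single face, assuming the corner array has shape (Yp1, Xp1) = (Y + 1, X + 1); it ignores face connections, which oceanspy's grid object handles across facets. The helper names `midpoint` and `corner_to_velocity_points` are hypothetical. Longitudes need care: a plain average of 179 and -179 gives 0 instead of ±180, so `periodic=True` averages along the shortest arc.

```python
import numpy as np


def midpoint(a, b, periodic=False):
    """Midpoint of a and b; if periodic, treat them as longitudes in degrees."""
    if not periodic:
        return 0.5 * (a + b)
    # Shortest signed angular difference from a to b, in [-180, 180).
    diff = (b - a + 180.0) % 360.0 - 180.0
    mid = a + 0.5 * diff
    # Wrap the result back into [-180, 180).
    return (mid + 180.0) % 360.0 - 180.0


def corner_to_velocity_points(g, periodic=False):
    """Interpolate a corner field g of shape (Yp1, Xp1) on ONE face.

    Returns (g_u, g_v):
      g_u: average along Y -> U points, shape (Y, Xp1)
      g_v: average along X -> V points, shape (Yp1, X)
    """
    g = np.asarray(g, dtype=float)
    g_u = midpoint(g[:-1, :], g[1:, :], periodic)
    g_v = midpoint(g[:, :-1], g[:, 1:], periodic)
    return g_u, g_v
```

Latitudes (YG) do not wrap, so they would use the default `periodic=False`.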

A workaround for this:

A good candidate for speeding up ospy.subsample.cutout is to persist the (horizontal) grid coordinates during the transformation. These are only 2D, and while persisting them may slow down llc_rearrange.py (only by a little!), it will speed up any subsequent operation involving these grid variables.
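Persisting means computing a lazy result once and keeping it in memory, so later operations reuse it instead of recomputing it (with xarray and dask this is what `.persist()` does). The toy below illustrates only that trade-off in plain Python, using `functools.cached_property` in place of dask; the class `LazyGrid` and its members are hypothetical and nothing here is oceanspy code.

```python
from functools import cached_property


class LazyGrid:
    """Toy stand-in for a lazily transformed grid (hypothetical, not oceanspy).

    transform_coords() represents the expensive rearrangement of the 2D
    horizontal coordinates; `calls` counts how often it actually runs.
    """

    def __init__(self, nx, ny):
        self.nx, self.ny = nx, ny
        self.calls = 0

    def transform_coords(self):
        self.calls += 1
        # Placeholder for the real (expensive) coordinate transformation.
        return [[(i, j) for i in range(self.nx)] for j in range(self.ny)]

    @property
    def coords_lazy(self):
        # Not persisted: every access recomputes the transformation.
        return self.transform_coords()

    @cached_property
    def coords_persisted(self):
        # "Persisted": computed on first access, then kept in memory.
        return self.transform_coords()
```

The persisted version pays the transformation cost once up front, which matches the small expected slowdown of llc_rearrange.py, while every later access is free.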

ThomasHaine commented 1 year ago

Or pre-compute the [XU, YU] and [XV, YV] arrays once and for all?

Mikejmnez commented 1 year ago

Yeah, that might also work. I will explore these options today and/or tomorrow; I have been having some issues with SciServer today...

Mikejmnez commented 1 year ago

I ran the test: persisting the transformation of ["XG", "YG"] within llc_rearrange did speed up the overall transformation (i.e., cut_od = ospy.subsample.cutout(**args)), which now takes much less than 1.5 hrs. However, it significantly slowed down transforming the dataset directly with llc_rearrange (DS = llc_rearrange.transformation.arctic_crown(**args)), from about 4 minutes to about 16 minutes.

Hence, I will follow the suggestion of calculating the grid variables [XU, YU] and [XV, YV] once and storing them, so that they don't need to be recalculated each time we use the cutout function.
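A minimal sketch of "pre-compute once and store", assuming the arrays fit in memory and using a local NumPy `.npz` file as a hypothetical stand-in for a grid file. `load_or_compute` and the `compute` callable are illustrative names; for xarray objects, `Dataset.to_zarr` or `Dataset.to_netcdf` would play the role of `np.savez`.

```python
import os

import numpy as np


def load_or_compute(path, compute):
    """Return a dict of arrays cached in the .npz file at `path`.

    On the first call, compute() builds the arrays (e.g. XU, YU, XV, YV) and
    they are saved; later calls only load them from disk. `path` should end
    in ".npz", because np.savez appends that suffix otherwise and the
    existence check would then never find the file.
    """
    if os.path.exists(path):
        with np.load(path) as data:
            return {name: data[name] for name in data.files}
    arrays = compute()
    np.savez(path, **arrays)
    return arrays
```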

Mikejmnez commented 1 year ago

PR #325 partly addresses this issue. However, I will leave it open until I create (and store) the grid coordinates [XU, YU] and [XV, YV].

Mikejmnez commented 1 year ago

I used the following code to calculate the grid coordinates at U and V points:


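import copy

import oceanspy as ospy

# url: location of the intake catalog (not given here).
od = ospy.open_oceandataset.from_catalog('LLC4320', url)
# Drop the vertical index variables.
od._ds = od._ds.drop_vars({'k', 'k_u', 'k_p1', 'k_l'})
# Treat every time-independent variable as a coordinate.
co_list = [var for var in od._ds.variables if "time" not in od._ds[var].dims]
od._ds = od._ds.set_coords(co_list)

ds = copy.deepcopy(od._ds)

# xgcm grid carrying the LLC face-connection (topology) information.
grid = od.grid
# Interpolate the corner coordinates along Y onto U points (Y, Xp1).
XU = grid.interp(ds['XG'], axis='Y', boundary='fill')
YU = grid.interp(ds['YG'], axis='Y', boundary='fill')

XU_attrs = dict(standard_name="longitude_at_u_location",
                long_name="longitude",
                units="degrees_east",
                coordinate="YU XU")

YU_attrs = dict(standard_name="latitude_at_u_location",
                long_name="latitude",
                units="degrees_north",
                coordinate="YU XU")

# Attach the metadata to the interpolated coordinates.
XU = XU.assign_attrs(XU_attrs)
YU = YU.assign_attrs(YU_attrs)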

Here, the grid object carries all the face topology information, along with the grid_coords from the intake catalog. The resulting variables have dimensions (Y, Xp1); XV and YV are computed similarly. These grid variables are now stored in the grid file and no longer need to be calculated every time a cutout is performed (this was the cause of the significant slowdown of ospy.subsample.cutout relative to llc_rearrange).

I will continue testing this (oceanspy calculates the grid coordinates at U and V points very differently), which may take a couple of days.

hainegroup / oceanspy

speeding up llc_rearrange + LLC4320 #323
