Closed Mikejmnez closed 1 year ago
Just to add to this thread: I transformed the whole surface fields of LLC4320 in two different ways, using only llc_rearrange.py directly and using ospy.subsample.cutout, both with persist=False (the default).
With llc_rearrange.py directly, the transformation took only 4-6 minutes (6 when transforming the entire 100 TB), and it is quite straightforward to make surface plots from there.
With ospy.subsample.cutout, it took over an hour!
Something is greatly slowing down the transformation in subsample.cutout after it calls llc_rearrange.py. I will spend some time running tests, but I have some idea of what is going on:
The slowdown on the transformation is likely due to two characteristics of LLC4320:
1) the massiveness of the dataset (TBs), since I didn't see any significant slowdown with ECCO.
2) the fact that in LLC4320 only [XC, XG] and [YC, YG] are given, so oceanspy needs to calculate the coordinates at velocity points, [XU, YU] and [XV, YV]. This requires interpolating the transformed grid variables. On a massive dataset like LLC4320, and when transforming over a wide range of latitudes and longitudes that crosses several facets with different topologies, this is probably the culprit for the slowdown.
A good candidate for speeding up ospy.subsample.cutout is to persist the (horizontal) grid coordinates during the transformation. These are only 2D, and while persisting them may slow down llc_rearrange.py (only by a little!), it will speed up any operation after the transformation that involves these grid variables.
Or pre-compute the [XU, YU] and [XV, YV] arrays once and for all?
Yeah, that might also work. I will explore these options today and/or tomorrow. I have been having some issues with SciServer today...
I ran the test: persisting the transformation of ["XG", "YG"] within llc_rearrange did speed up the overall cutout (i.e. cut_od = ospy.subsample.cutout(**args)), which now takes much less than 1.5 hrs. However, it significantly slowed down transforming the dataset directly with llc_rearrange (DS = llc_rearrange.transformation.arctic_crown(**args)), from 4 minutes to about 16 minutes.
Hence, I will follow the suggestion of calculating the grid variables [XU, XV] and [YU, YV] once and storing them, so that they don't need to be calculated each time we use the cutout function.
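The "calculate once, store, and reuse" idea can be sketched in a few lines. This is only an illustration with plain NumPy and a temporary file: the helper load_or_compute, the toy grid, and the file name velocity_coords.npz are all hypothetical, and this is not how oceanspy or the LLC4320 grid file stores these variables.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_or_compute(cache_file, compute):
    """Return the cached arrays if cache_file exists; otherwise compute and store them."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        with np.load(cache_file) as stored:
            return {name: stored[name] for name in stored.files}
    arrays = compute()
    np.savez(cache_file, **arrays)
    return arrays


calls = {"n": 0}  # counts how often the (stand-in) expensive step runs


def velocity_point_coords():
    """Stand-in for the expensive interpolation step (toy regular grid)."""
    calls["n"] += 1
    XG, YG = np.meshgrid(np.linspace(0.0, 4.0, 5), np.linspace(10.0, 13.0, 4))
    return {
        "XU": 0.5 * (XG[:-1, :] + XG[1:, :]),
        "YU": 0.5 * (YG[:-1, :] + YG[1:, :]),
    }


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "velocity_coords.npz"
    first = load_or_compute(cache, velocity_point_coords)   # computes and stores
    second = load_or_compute(cache, velocity_point_coords)  # reads the stored file
```

The second call never touches the compute function, which is the behaviour wanted for the cutout: the coordinates are read rather than re-derived.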
PR #325 sort of closes this issue. However, I will leave it open until I create (and store) the grid coordinates [XU, YU] and [XV, YV].
I used the following code to calculate the grid coordinates at U and V points:
import copy

import oceanspy as ospy

# url: location of the intake catalog (defined elsewhere)
od = ospy.open_oceandataset.from_catalog('LLC4320', url)
# Drop the k* index variables
od._ds = od._ds.drop_vars(['k', 'k_u', 'k_p1', 'k_l'])
# Promote every time-independent variable to a coordinate
co_list = [var for var in od._ds.variables if "time" not in od._ds[var].dims]
od._ds = od._ds.set_coords(co_list)
ds = copy.deepcopy(od._ds)
grid = od.grid

# Interpolate the corner coordinates along Y to get the U-point coordinates
XU = grid.interp(ds['XG'], axis='Y', boundary='fill')
YU = grid.interp(ds['YG'], axis='Y', boundary='fill')

# Metadata for the new coordinates
XU_attrs = dict(standard_name="longitude_at_u_location",
                long_name="longitude",
                units="degrees_east",
                coordinate="YU XU")
YU_attrs = dict(standard_name="latitude_at_u_location",
                long_name="latitude",
                units="degrees_north",
                coordinate="YU XU")
where the grid object has all the face topology information, along with the grid_coords from the intake catalog. The resulting variables have dimensions Y, Xp1. XV and YV are computed similarly. These grid variables are now stored in the grid file and no longer need to be calculated every time a cutout is performed (this was the reason for the significant slowdown between llc_rearrange and ospy.subsample.cutout).
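To make the dimensions concrete, here is a toy NumPy sketch of the same idea on a small regular grid. It assumes the interpolation amounts to a midpoint average of adjacent corner values, and it ignores face topology entirely; the array values are made up, and plain averaging of longitudes like this would not be safe across the antimeridian.

```python
import numpy as np

# Toy corner (G-point) coordinates, shape (Yp1, Xp1) = (3, 4)
xg = np.array([0.0, 1.0, 2.0, 3.0])
yg = np.array([10.0, 12.0, 14.0])
XG, YG = np.meshgrid(xg, yg)

# U points: average adjacent corners along Y, giving shape (Y, Xp1)
XU = 0.5 * (XG[:-1, :] + XG[1:, :])
YU = 0.5 * (YG[:-1, :] + YG[1:, :])

# V points: average adjacent corners along X, giving shape (Yp1, X)
XV = 0.5 * (XG[:, :-1] + XG[:, 1:])
YV = 0.5 * (YG[:, :-1] + YG[:, 1:])

print(XU.shape, XV.shape)
```

The U points keep the corner longitudes and take the cell-centre latitudes, while the V points do the opposite.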
I will continue to do some more testing on this (oceanspy calculates the U and V grid coordinates very differently from this), which may take a couple of days.
Description
After #315, I realized that there are additional ways to further speed up oceanspy, for example when running the LLC4320 tutorial notebook.
After PR #300, the U and V velocities are now geographically correct, meaning they have been corrected by the angles CS and SN. This is necessary when making vertical sections in regions with strong grid deformation (high latitudes) and when calculating transport across sections. However, if one is only interested in making a (global) surface plot of velocity, correcting the velocity is perhaps not ideal, since it triggers every grid point to be multiplied by mostly ones and zeros (also, CS and SN need to be interpolated onto the U and V points). Thus, this (geographic) rotation to correct the U and V velocities should be an option that is False by default, but always True when using ospy.subsample.mooring or ospy.subsample.survey
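
A minimal sketch of what such an option could look like, using plain NumPy. The function name rotate_velocity and the flag geographic are hypothetical and are not part of oceanspy. The sketch assumes U, V, CS and SN are already on the same grid points, and that CS and SN are the cosine and sine of the angle between the model grid and geographic east/north, as in the usual MITgcm convention.

```python
import numpy as np


def rotate_velocity(U, V, CS, SN, geographic=False):
    """Optionally rotate grid-aligned velocities to geographic east/north.

    Hypothetical helper: with geographic=False the inputs are returned
    untouched, so no extra multiplications are triggered.
    """
    if not geographic:
        return U, V
    U_geo = U * CS - V * SN  # eastward component
    V_geo = U * SN + V * CS  # northward component
    return U_geo, V_geo


U = np.array([1.0, 2.0])
V = np.array([3.0, 4.0])
CS = np.array([1.0, 0.0])  # first point: grid aligned with east/north
SN = np.array([0.0, 1.0])  # second point: grid rotated by 90 degrees

u_raw, v_raw = rotate_velocity(U, V, CS, SN)                   # default: no rotation
u_geo, v_geo = rotate_velocity(U, V, CS, SN, geographic=True)  # forced rotation
```

Functions that need geographically correct velocities (sections, transports) would call the helper with geographic=True regardless of the user's setting, while a global surface plot could keep the cheap default.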