hainegroup / oceanspy

A Python package to facilitate ocean model data analysis and visualization.
https://oceanspy.readthedocs.io
MIT License

allow .persist() to be an option in `llc_rearrange` #315

Closed Mikejmnez closed 1 year ago

Mikejmnez commented 1 year ago

0.3.2

Description

From version 0.2 to 0.3+, I rewrote how llc_rearrange transforms data within faces. The new approach is much more efficient than before and makes it possible to take arbitrary vertical sections within the limited compute resources of SciServer (interactive compute).

In the process of making llc_rearrange more efficient, I also .persist() some array operations, for example here:

if YRange is not None and XRange is not None:
    DSFacet12[_var] = DSFacet12[_var].transpose(*dtr).persist()
else:
    DSFacet12[_var] = DSFacet12[_var].transpose(*dtr)

where _var is a 2D, 3D or 4D variable that survives the cutout (either a grid variable or a data field), and dtr is a dictionary with its horizontal dimensions, for example dtr={'x': Xp1, 'y': Y}. DSFacet12 is a dataset that contains the data within faces 0-5, if any data there survives the cutout.

The block above (applied to each variable _var) essentially always persists the transformation, unless the entire domain is being transformed (in which case XRange and YRange are both None). Persisting slows down the transformation because it triggers part of the computation up front. There are pros and cons to this. The pro is that once the transformation is done, most of the calculations have already taken place, so any subsequent operation starts from that point.

The con, again, is that it slows down the transformation.
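The trade-off can be illustrated with a small sketch. It uses dask.array directly as a stand-in for the xarray variables in llc_rearrange (an assumption: the real variables are xarray objects backed by dask), and the array shape and chunking are made up for illustration. A lazy transpose only extends the task graph, while .persist() runs the graph immediately and returns an array backed by already-computed chunks.

```python
import dask.array as da

# Toy stand-in for one variable (shape and chunks are illustrative only).
x = da.ones((4, 6), chunks=(2, 3))

# Lazy: building the expression is cheap, nothing is computed yet.
lazy = (x + 1).transpose(1, 0)

# Persisted: the work is done now, and the results are held in memory.
# The returned object is still a dask array.
persisted = lazy.persist()

# The lazy graph still holds the creation, addition and transpose tasks.
# The persisted graph only references the computed chunks.
n_lazy = len(lazy.__dask_graph__())
n_persisted = len(persisted.__dask_graph__())
print(n_lazy, n_persisted)
```

Any later operation on persisted starts from the computed chunks, whereas the same operation on lazy has to replay the whole graph each time it is computed.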

Proposal

Make .persist() an option in llc_rearrange (a True/False argument that determines whether or not to persist the operations). In addition, I think there should be a well-documented safeguard: when the transformation is over a large enough domain (bigger than some threshold), we should not persist. For example, we should not persist in llc_rearrange if XRange and YRange are both None (meaning the whole dataset is transformed). What that threshold should be (chunk size?) is something we need to figure out at some point.
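A minimal sketch of what such a safeguard could look like. The helper name should_persist, the nbytes argument and the max_bytes default are all hypothetical (none of them exist in OceanSpy), and the threshold value is a placeholder since the right value is still an open question.

```python
def should_persist(persist, XRange=None, YRange=None, nbytes=0,
                   max_bytes=2 * 1024**3):
    """Decide whether the transformation should be persisted.

    Hypothetical helper, not part of OceanSpy. `nbytes` is the estimated
    size of the transformed data; `max_bytes` is a placeholder threshold.
    """
    # The user did not ask for persistence.
    if not persist:
        return False
    # No horizontal cutout: the whole dataset is transformed, never persist.
    if XRange is None and YRange is None:
        return False
    # Size safeguard: only persist data that fits under the threshold.
    return nbytes <= max_bytes


# Small cutout with persist requested -> persist.
print(should_persist(True, XRange=[-30, -20], YRange=[60, 70], nbytes=10**6))
# Full domain -> do not persist, even if requested.
print(should_persist(True, nbytes=10**6))
```

Keeping the decision in one place would make the safeguard easy to document and to adjust once a sensible threshold is known.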

Mikejmnez commented 1 year ago

Copying from PR #319:

Whether or not the transformation is persisted is controlled by a True/False argument (False by default) that is passed to llc_rearrange.py via oceanspy.subsample.cutout. That is,

cut_od = od.subsample.cutout(XRange, YRange)

performs a cutout defined by XRange and YRange, where the transformation is NOT persisted.

To persist the transformation before the cutout:

cut_od = od.subsample.cutout(XRange, YRange, persist=True)

The performance of the cutout and of any following operations (such as plotting, interpolation or even integration) will likely be affected by whether we persist the transformation. But the impact will likely depend on the details of the transformation (i.e., is the cutout data within a single face? is it across many faces with different topology?) or even on the total size of the transformed data.

Having persist as an option (along with chunk) will allow us to explore its effect on different transformed data.

Mikejmnez commented 1 year ago

closed by PR #319