Multi-argument custom reductions

spencerkclark commented 6 years ago

This issue is meant to track the discussion that started in #208, since the implementation discussed there will only address reductions that take a single argument (the raw time series of data loaded/computed for a single aospy.Var object).

Example use cases for multi-argument reductions are:

- Vertical integrals (these require the pressure thicknesses to be computed and used as weights when taking a sum over the vertical dimension)
- Vertical averages (which require both the pressure thicknesses and the surface pressure)
- Regression patterns (as discussed in #208)

Exposing arguments to reduction functions could also be useful as a way to make one's object library more compact. For instance in @chuaxr's percentile example, the numeric percentile could be an argument and would prevent the need for defining separate functions/reduction objects to compute the 90th, 95th, and 99th percentiles.
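A minimal sketch of the percentile case, assuming plain NumPy arrays with time along the first axis (aospy itself would pass xarray DataArrays). The names `percentile` and `percentile_label` are hypothetical. One function plus a label function that folds the argument into the filename label covers all three percentiles:

```python
import numpy as np


def percentile(data, q):
    """Return the q-th percentile of data along its first (time) axis."""
    return np.percentile(data, q, axis=0)


def percentile_label(data, q):
    """Label suffix recording which percentile was computed."""
    return 'percentile-{}'.format(q)


# One function handles the 90th, 95th, and 99th percentiles.
series = np.arange(1, 101, dtype=float)  # toy time series: 1.0, ..., 100.0
results = {percentile_label(series, q): percentile(series, q)
           for q in (90, 95, 99)}
```

Because the label includes `q`, the three outputs would get distinct filenames rather than overwriting one another.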

spencerkclark commented 6 years ago

Quoting from https://github.com/spencerahill/aospy/issues/208#issuecomment-396986973:

Maybe the reduction class could look like this:

class Reduction(object):
    instances = {}  # registry of all reductions, keyed by base label

    def __init__(self, func, label, label_append=None):
        """
        Parameters
        ----------
        func : function
            Reduction operation
        label : str
            Label for reduction used in filenames
        label_append : function, optional
            Function to extend label based on input arguments
        """
        self.instances.update({label: self})
        self.func = func
        # Store the base label under a private name; assigning to self.label
        # would shadow the label() method below on every instance.
        self._label = label
        self.label_append = label_append

    def label(self, *args, **kwargs):
        """Return the label used in filenames, extended by label_append if given."""
        if self.label_append is None:
            return self._label
        else:
            return '{}-{}'.format(self._label, self.label_append(*args, **kwargs))

We would call label within Calc when constructing the file names. Sticking with the above example, a Regression reduction might look something like this:

def regress(da, index):
    """
    Parameters
    ------------
    da : DataArray
        Variable to regress onto index
    index : DataArray
        1D index in time
    """
    # Compute the regression pattern ...
    return result

def regress_label(da, index):
    return 'onto-{}'.format(index.name)

Regression = Reduction(regress, 'regression', label_append=regress_label)
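To see how the pieces could fit together, here is a self-contained sketch that re-declares a minimal Reduction class (storing the base label as `_label` so it does not shadow the `label()` method) and fills in `regress` with an ordinary least-squares slope. That slope is one possible choice, not necessarily what aospy would use. It assumes NumPy arrays with time along the first axis; `NamedIndex` is a hypothetical stand-in for a named 1D DataArray.

```python
from dataclasses import dataclass

import numpy as np


class Reduction(object):
    """Minimal version of the proposed Reduction class."""
    instances = {}  # registry of reductions, keyed by base label

    def __init__(self, func, label, label_append=None):
        self.instances[label] = self
        self.func = func
        self._label = label  # private name, so it doesn't shadow label()
        self.label_append = label_append

    def label(self, *args, **kwargs):
        """Base label, extended by label_append(*args, **kwargs) if given."""
        if self.label_append is None:
            return self._label
        return '{}-{}'.format(self._label, self.label_append(*args, **kwargs))


@dataclass
class NamedIndex:
    """Hypothetical stand-in for a named 1D index DataArray."""
    name: str
    values: np.ndarray


def regress(da, index):
    """Least-squares slope of da onto index at each point (time on axis 0)."""
    x = index.values - index.values.mean()
    y = da - da.mean(axis=0)
    # Sum over time of x_t * y_t, divided by the sum over time of x_t**2
    return np.tensordot(x, y, axes=(0, 0)) / np.dot(x, x)


def regress_label(da, index):
    return 'onto-{}'.format(index.name)


Regression = Reduction(regress, 'regression', label_append=regress_label)

# Usage: a toy field that responds to the index with slopes 2 and -1.
precip = NamedIndex('precip', np.array([0.0, 1.0, 2.0, 3.0]))
field = np.outer(precip.values, [2.0, -1.0]) + 5.0  # shape (time=4, points=2)
pattern = Regression.func(field, precip)
label = Regression.label(field, precip)
```

Here `pattern` recovers the slopes `[2.0, -1.0]`, and `label` evaluates to `'regression-onto-precip'`, so the index name ends up in the output filename.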

And maybe in the main script we could allow reductions to be specified either as strings (the current standard) for reductions that don't require input arguments, or as (label, args, kwargs) tuples, e.g. ('regression', [precipitation_index], {}), for ones that do:

from custom_reductions import Regression  # importing registers the reduction
from variables import precipitation_index, ucomp, vcomp

calc_suite_specs = dict(
    # ... other specifications ...
    output_reductions=['av', ('regression', [precipitation_index], {})],
    variables=[ucomp, vcomp],
    # ... other specifications ...
)
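A sketch of how the main script might normalize each `output_reductions` entry into a `(label, args, kwargs)` triple before looking up the reduction. `parse_reduction_spec` is a hypothetical helper, and plain strings stand in for the `aospy.Var` arguments:

```python
def parse_reduction_spec(spec):
    """Normalize one output_reductions entry to a (label, args, kwargs) triple.

    A bare string names a reduction that takes no extra arguments; a
    3-tuple carries positional and keyword arguments for the reduction.
    """
    if isinstance(spec, str):
        return spec, [], {}
    label, args, kwargs = spec  # raises ValueError if not exactly 3 items
    return label, list(args), dict(kwargs)


# Strings stand in for aospy.Var objects here (an assumption for the demo).
output_reductions = [
    'av',
    ('regression', ['precipitation_index'], {}),
    ('percentile', [], {'q': 95}),
]
parsed = [parse_reduction_spec(spec) for spec in output_reductions]
```

Normalizing both forms to the same triple means the rest of the pipeline only has to handle one shape.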

The implicit assumption here is that any aospy.Var object passed as an argument to an output reduction will be loaded/computed within the pipeline using the current data loader and load_variable parameters.
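One way the pipeline could honor this assumption is sketched below. The `Var` class is a hypothetical placeholder for aospy.Var, and the toy `load` function stands in for the data loader and load_variable call. Arguments that are Var objects get loaded, while everything else (such as a numeric percentile) passes through unchanged:

```python
class Var(object):
    """Hypothetical stand-in for aospy.Var: only carries a name."""

    def __init__(self, name):
        self.name = name


def resolve_args(args, kwargs, load):
    """Load any Var arguments; pass all other arguments through unchanged."""
    def resolve(value):
        return load(value) if isinstance(value, Var) else value
    return [resolve(a) for a in args], {k: resolve(v) for k, v in kwargs.items()}


# Toy store and loader standing in for the current data loader / load_variable.
store = {'precip': [0.0, 1.0, 2.0]}


def load(var):
    return store[var.name]


args, kwargs = resolve_args([Var('precip')], {'q': 95}, load)
```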

spencerahill commented 6 years ago

the numeric percentile could be an argument and would prevent the need for defining separate functions/reduction objects to compute the 90th, 95th, and 99th percentiles.

I see two separate use cases. In one like this, the arguments are simply passed in in the main script and then ultimately passed unchanged to the function.

But in others like the regression example, the arguments are essentially stand-ins for data that then need to be loaded at the appropriate time.

The former is obviously simpler, but it's still more involved than the simplest case being handled in #208, since the string label then becomes dependent on the arguments.

spencerkclark commented 6 years ago

I see two separate use cases. In one like this, the arguments are simply passed in in the main script and then ultimately passed unchanged to the function. But in others like the regression example, the arguments are essentially stand-ins for data that then need to be loaded at the appropriate time.

I think that's splitting hairs to some degree. Once machinery is in place to accept arguments, accepting aospy.Var objects as arguments is not significantly more complicated (see my comment in https://github.com/spencerahill/aospy/issues/208#issuecomment-397132657).

spencerahill/aospy, issue #286: Multi-argument custom reductions