inducer / pytato

Lazily evaluated arrays in Python
Other
11 stars 16 forks source link

CSR-based sparse arrays? #211

Open xywei opened 3 years ago

xywei commented 3 years ago

@inducer @kaushikcfd As discussed earlier today, the motivation is to extend the syntax to support batching across a sparse array.

For example, given an expression for per-box map (I am using python syntax to represent expression IOs):

def per_box_expr(box_data: Array[dtype, (ndim_in, _n_box_points)]
    ) -> Array[dtype, (ndim_out, _n_box_points)]:
    ...
    return per_box_output

A "CSR batching" here means applying the per-box map to a list of boxes with inhomogeneous point counts, which does the equivalent of the following:

def csr_apply(
    expr: Array[dtype, (ndim_out, _n_box_points)],
    data_lists: Array[dtype, (ndim_in, n_total_points)],
    data_starts: Array[int, (n_boxes + 1, )],
    axis: int=0) -> Array[dtype, (ndim_out, n_total_points)]:

    # this specifies the computation to be carried out, not a valid implementation
    output = Array(dtype, (ndim_out, n_total_points))
    for ib in range(n_boxes):
        start, stop = (data_starts[ib], data_starts[ib+1])
        box_data = data_lists[:, start:stop]
        output[:, start:stop] = per_box_expr(box_data)

   return output

Note that start, stop, _n_box_points needs to be special placeholders in order to handle data-dependency.

@kaushikcfd also suggested a more general interface by supplying CSR arrays similar interface to normal arrays, and add the equivalence of numpy.apply_along_axis support for arrays.

inducer commented 3 years ago

Ultimately, if we were to treat sparse arrays as normal ones, we likely wouldn't need an analog of apply_along_axis, as the expressions could just be evaluated on the sparse entries of the array. At first, only "sparsity-preserving" operations would/should be allowed, i.e. anything that changes the sparsity pattern would have to be disallowed, implicitly or in a verified manner. Then we could look into reductions over sparse axes, which would at least give us SPMV.