nv-legate / cunumeric

An Aspiring Drop-In Replacement for NumPy at Scale
https://docs.nvidia.com/cunumeric/24.06/
Apache License 2.0

Missing NumPy functions needed for SLAC applications #1116

Open syamajala opened 7 months ago

syamajala commented 7 months ago

I'm opening this issue so @manopapad and I can keep track of what needs to be implemented for the different cunumeric SLAC applications.

For psana we need:

@manopapad has a patch that tries to improve single-index accesses to arrays, although that code will be removed once np.unique(return_index=True) is implemented. All the kernels for psana are single-GPU and do not need to be distributed.
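
For reference, here is a minimal sketch of what np.unique(return_index=True) returns. It uses plain NumPy, and the `keys` array is made up for illustration rather than taken from psana. Swapping the import for cunumeric is only the intended end state and would depend on the function being implemented there.

```python
import numpy as np

# Hypothetical keys for illustration; not real psana data.
keys = np.array([30, 10, 30, 20, 10, 20, 20])

# values: the sorted distinct keys.
# first_index: for each distinct key, the position of its first occurrence in `keys`.
values, first_index = np.unique(keys, return_index=True)

# Indexing the original array with first_index recovers the distinct keys.
recovered = keys[first_index]
```

Here `values` is `[10, 20, 30]` and `first_index` is `[1, 3, 0]`, so one vectorized call can replace a loop of single-index lookups for the first occurrence of each key.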

For HDF5 analysis we need GPU and distributed versions of:

For HDF5 analysis we need the following extensions:

For a custom curve_fit implementation we need:

rohany commented 7 months ago

I have a pending PR for np.diff against cuNumeric that can be dusted off and merged.

syamajala commented 7 months ago

We need scipy.optimize.curve_fit. Under the hood this seems to use MINPACK. Depending on what options you pass to curve_fit, I think it might also need a Cholesky factorization.
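
To make the dependency concrete, here is a rough sketch of a curve_fit-style least-squares loop written with NumPy only. It is not SciPy's MINPACK routine and not the SLAC implementation. It is a textbook Levenberg-Marquardt iteration, and using Cholesky to solve the damped normal equations is a choice made for this example. The model, `jacobian`, `fit`, and all data are hypothetical.

```python
import numpy as np

def model(x, a, b):
    # Hypothetical model for illustration: exponential decay.
    return a * np.exp(-b * x)

def jacobian(x, a, b):
    # Analytic partial derivatives of the model with respect to (a, b).
    e = np.exp(-b * x)
    return np.stack([e, -a * x * e], axis=1)

def fit(x, y, p0, n_iter=50, lam=1e-3):
    # Levenberg-Marquardt sketch: damped Gauss-Newton steps, accepted only
    # when they reduce the sum of squared residuals.
    p = np.asarray(p0, dtype=float)
    cost = np.sum((y - model(x, *p)) ** 2)
    for _ in range(n_iter):
        r = y - model(x, *p)
        J = jacobian(x, *p)
        A = J.T @ J
        g = J.T @ r
        # Damped normal equations: (A + lam * diag(A)) step = g
        A_damped = A + lam * np.diag(np.diag(A))
        # Solve via Cholesky: A_damped = L @ L.T
        L = np.linalg.cholesky(A_damped)
        step = np.linalg.solve(L.T, np.linalg.solve(L, g))
        p_new = p + step
        new_cost = np.sum((y - model(x, *p_new)) ** 2)
        if new_cost < cost:
            p, cost, lam = p_new, new_cost, lam * 0.5
        else:
            lam *= 10.0
    return p

# Synthetic, noise-free data generated from known parameters.
x = np.linspace(0.0, 4.0, 50)
y = model(x, 2.5, 1.3)
p_fit = fit(x, y, p0=(2.0, 1.0))
```

Written this way, the array operations involved are elementwise math, matrix products, np.diag, np.linalg.cholesky, and np.linalg.solve, which gives a rough idea of what a custom implementation might need from cuNumeric.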

JosephGuman commented 7 months ago

At present it seems like DeferredArray's unary_reduction() implementation doesn't allow reducing over multiple dimensions, which would probably be needed to average over an arbitrary subset of axes. Is this important to address in this issue, or is it not required for SLAC's application?

syamajala commented 7 months ago

We do not need to average over an arbitrary subset of axes. Just the 0th axis is enough.

syamajala commented 6 months ago

We need scipy.optimize.curve_fit.

manopapad commented 6 months ago

@syamajala all the functions required for the base HDF5 processing script have been merged.

syamajala commented 6 months ago

Ok. Will give them a try early next week.

JosephGuman commented 5 months ago

I might take a look at np.unique(return_index=True) if nobody else is working on it right now.

syamajala commented 4 months ago

We still need to investigate performance issues related to the functions that were implemented in this ticket.

Here is a profile from before the missing functions were implemented: https://legion.stanford.edu/prof-viewer/?url=https://sapling.stanford.edu/~seshu/xpp/legion_prof/

And a profile from after: https://legion.stanford.edu/prof-viewer/?url=https://sapling.stanford.edu/~seshu/xpp/legion_prof.1/

rohany commented 4 months ago

There might still be some missing functions that correspond to the pieces of high Python utilization, but I don't really see a performance issue in this profile other than that the problem size is too small (especially for the public Python core).

syamajala commented 2 months ago

The following functions are missing:

used by custom curve_fit implementation:

used by SLAC code directly:

qldnfox commented 2 months ago

Also missing the following:

syamajala commented 1 month ago

nansum does not support reducing over multiple dimensions:

  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_module/math_sum_prod_diff.py", line 951, in nansum
    return a._nansum(
           ^^^^^^^^^^
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_array/array.py", line 3580, in _nansum
    return perform_unary_reduction(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_array/thunk.py", line 233, in perform_unary_reduction
    result._thunk.unary_reduction(
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_thunk/deferred.py", line 148, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_thunk/deferred.py", line 3192, in unary_reduction
    raise NotImplementedError(
NotImplementedError: Need support for reducing multiple dimensions
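
Until multi-axis reductions are supported, one possible workaround is to chain single-axis nansum calls. The sketch below uses plain NumPy so the result can be checked against the direct multi-axis call. The helper name `nansum_axes` is hypothetical, and it assumes non-negative axis numbers. That each single-axis call is accepted by cuNumeric is only inferred from the error message above and has not been verified.

```python
import numpy as np

def nansum_axes(a, axes):
    # Hypothetical helper: reduce one axis at a time.
    # Reducing the highest axis first keeps the remaining axis numbers valid.
    out = a
    for ax in sorted(axes, reverse=True):
        out = np.nansum(out, axis=ax)
    return out

# Small made-up array with a couple of NaNs.
data = np.arange(24, dtype=float).reshape(2, 3, 4)
data[0, 1, 2] = np.nan
data[1, 0, 0] = np.nan

chained = nansum_axes(data, (0, 2))     # one axis per call
direct = np.nansum(data, axis=(0, 2))   # multi-axis call, for comparison in NumPy
```

Because nansum treats NaN as zero, summing axis by axis gives the same result as the single multi-axis call, at the cost of one extra intermediate array per reduced axis.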

Also, for some reason nanpercentile is still falling back to NumPy in cunumeric 24.06.00. It looks like it was merged above, though?