syamajala opened this issue 7 months ago
I have a pending PR for np.diff against cuNumeric that can be dusted off and merged.
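For context, `np.diff` computes adjacent differences along an axis; a minimal NumPy illustration of the behavior the PR would need to match (cuNumeric aims to mirror the NumPy API):

```python
import numpy as np

a = np.array([1, 2, 4, 7])
d = np.diff(a)        # first-order adjacent differences
d2 = np.diff(a, n=2)  # second-order: diff applied twice
```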
We need scipy.optimize.curve_fit. Under the hood this seems to use minpack. Depending on what options you pass to curve_fit I think it might also need cholesky.
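For reference, the call pattern in question is roughly the following sketch; the exponential model, noise level, and initial guess here are placeholders, not SLAC's actual fit:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Placeholder model; the real application fits its own function.
    return a * np.exp(-b * x)

x = np.linspace(0, 4, 50)
rng = np.random.default_rng(0)
y = model(x, 2.5, 1.3) + 0.05 * rng.standard_normal(x.size)

# Default method="lm" dispatches to MINPACK's Levenberg-Marquardt;
# pcov is the estimated covariance of the fitted parameters.
popt, pcov = curve_fit(model, x, y, p0=(1.0, 1.0))
```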
At present it seems like DeferredArray's unary_reduction() implementation doesn't allow reducing over multiple dimensions, which would probably be needed to average over an arbitrary subset of axes. Is this important to address in this issue, or is it not required for SLAC's application?
We do not need to do average over arbitrary subset of axes. Just the 0th axis is enough.
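Averaging over only the 0th axis is a single-axis reduction, which the existing `unary_reduction()` path already handles, e.g.:

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)
avg = a.mean(axis=0)  # shape (4,): average across rows, one value per column
```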
We have a need for scipy.curve_fit.
@syamajala all the functions required for the base HDF5 processing script have been merged
Ok. Will give them a try early next week.
I might take a look at np.unique(return_index=True) if nobody else is working on it right now.
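For anyone picking this up, the expected semantics (from NumPy) are: return the sorted unique values plus the index of each value's first occurrence in the input, so that indexing the input with those indices reproduces the unique values:

```python
import numpy as np

a = np.array([3, 1, 3, 2, 1])
vals, idx = np.unique(a, return_index=True)
# vals is sorted; idx[i] is the first position of vals[i] in a,
# so a[idx] reconstructs vals exactly.
```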
We still need to investigate performance issues related to the functions that were implemented in this ticket.
Here is a profile from before the missing functions were implemented: https://legion.stanford.edu/prof-viewer/?url=https://sapling.stanford.edu/~seshu/xpp/legion_prof/
And a profile from after: https://legion.stanford.edu/prof-viewer/?url=https://sapling.stanford.edu/~seshu/xpp/legion_prof.1/
There might still be some missing functions corresponding to the stretches of high Python utilization, but I don't really see a performance issue in this profile other than the problem size being too small (especially for the public Python core).
The following functions are missing:
used by custom curve_fit implementation:
used by SLAC code directly:
Also missing the following:
np.fft.ifft
np.rot90
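For reference, the expected NumPy behavior of these two: `np.rot90` rotates a plane 90° counter-clockwise, and `np.fft.ifft` inverts `np.fft.fft` up to floating-point error:

```python
import numpy as np

m = np.arange(4).reshape(2, 2)
r = np.rot90(m)           # counter-clockwise quarter turn

x = np.array([1.0, 0.0, 0.0, 0.0])
spec = np.fft.fft(x)
back = np.fft.ifft(spec)  # round-trips to x (result is complex)
```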
nansum does not support reducing over multiple dimensions:
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_module/math_sum_prod_diff.py", line 951, in nansum
    return a._nansum(
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_array/array.py", line 3580, in _nansum
    return perform_unary_reduction(
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_array/thunk.py", line 233, in perform_unary_reduction
    result._thunk.unary_reduction(
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_thunk/deferred.py", line 148, in wrapper
    return func(*args, **kwargs)
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_thunk/deferred.py", line 3192, in unary_reduction
    raise NotImplementedError(
NotImplementedError: Need support for reducing multiple dimensions
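Until multi-axis reductions land, one possible workaround is to apply `nansum` one axis at a time; `nansum_multi_axis` below is a hypothetical helper, not cuNumeric API, and NaNs are zeroed out by the first pass so the result matches a single multi-axis `nansum`:

```python
import numpy as np

def nansum_multi_axis(a, axes):
    # Hypothetical workaround: reduce one axis at a time, highest axis
    # first so earlier reductions don't renumber the remaining axes.
    out = a
    for ax in sorted(axes, reverse=True):
        out = np.nansum(out, axis=ax)
    return out

a = np.array([[1.0, np.nan], [3.0, 4.0]])
total = nansum_multi_axis(a, (0, 1))  # NaN treated as 0
```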
Also for some reason nanpercentile is still falling back to numpy in cunumeric 24.06.00. It looks like it was merged above though?
I'm opening this issue so @manopapad and I can keep track of what needs to be implemented for the different cunumeric SLAC applications.
For psana we need:
@manopapad has a patch that tries to improve single-index accesses to arrays, although that code will be removed once np.unique(return_index=True) is implemented. All the kernels for psana run on a single GPU and do not need to be distributed.
For HDF5 analysis we need GPU and distributed versions of:
For HDF5 analysis we need the following extensions:
For a custom curve_fit implementation we need: