hyperspy / hyperspy

Multidimensional data analysis
https://hyperspy.org
GNU General Public License v3.0

Add support for Awkward Arrays #3373

Open CSSFrancis opened 4 months ago

CSSFrancis commented 4 months ago

Describe the functionality you would like to see.

The ragged implementation in hyperspy currently acts as a bit of a bottleneck, and this is largely a result of the object-dtype implementation in numpy. For one, it is slow, which slows down vector operations that should be relatively quick! It also doesn't have a good definition for the different flavors of ragged arrays. For example:

  1. Arrays with the same number of dimensions but different shapes (e.g. images with 2 dimensions but different shapes)
  2. Vectors with the same number of columns but different numbers of rows
  3. Truly object-based arrays which hold non-array-like objects.

This makes visualization and axes definition difficult. In case 1 we might want to pass axes with no "size" parameter. In case 2 we might want a size along 1 dimension and not along the row dimension. In case 3 we just want a "ragged" definition.
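To illustrate the ambiguity, here is a minimal sketch using plain numpy object arrays (not hyperspy's actual ragged code path; the variable names are only for illustration). All three flavors end up with the same dtype and shape, so the array itself carries nothing that tells them apart:

```python
import numpy as np

# Case 1: same number of dimensions (2D images), different shapes.
images = np.empty(2, dtype=object)
images[0] = np.zeros((3, 4))
images[1] = np.zeros((5, 2))

# Case 2: vectors with a fixed number of columns, varying number of rows.
vectors = np.empty(2, dtype=object)
vectors[0] = np.zeros((3, 2))
vectors[1] = np.zeros((7, 2))

# Case 3: arbitrary, non-array-like Python objects.
objects = np.empty(2, dtype=object)
objects[0] = {"label": "a"}
objects[1] = None

# numpy reports identical metadata for all three flavors.
summary = [(a.dtype, a.shape) for a in (images, vectors, objects)]

# The fixed column count of case 2 is only recoverable by looping
# over the elements in Python, which is also why operations are slow.
column_counts = {v.shape[1] for v in vectors}
```

Any axes definition therefore has to come from outside the array, or from inspecting every element.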

I think the solution would be to adopt awkward-array:

Awkward arrays are fast: https://awkward-array.org/doc/main/getting-started/what-is-an-awkward-array.html#high-performance

Awkward arrays have more defined shapes/structures: https://awkward-array.org/doc/main/getting-started/what-is-an-awkward-array.html#versatile-arrays

Awkward arrays have (some) dask implementations: This is the part I don't know if I love. The dask implementation might need some additional work.

ericpre commented 4 months ago

I remember coming across it when you did the work on map and ragged (possibly mentioned in our discussion or from a review of the state of the art), and I was unsure at the time whether it was worth using awkward-array instead of sticking with numpy arrays. Now it seems that it is used more widely in the scientific community (not only developed for the high energy physics community) and integrates with the other usual suspects (dask, cupy, numba, etc.).

It may be worth considering this thin wrapper around awkward-array: the ragged library. See https://github.com/scikit-hep/ragged/discussions/6 for a discussion of the differences between the two.

It would need a champion to push for this! 😃