Collapsing and expanding dimensions in sidpy.dataset

pycroscopy / sidpy

Python utilities for storing, processing, and visualizing spectroscopic and imaging data

https://pycroscopy.github.io/sidpy/

MIT License


Collapsing and expanding dimensions in sidpy.dataset #128

Open ramav87 opened 3 years ago

ramav87 commented 3 years ago

We need a function that collapses and expands the dimensions of sidpy.dataset objects. For instance, spatial and/or spectral dimensions have to be collapsed before matrix or tensor factorization, deep learning, etc. The user should be able to specify which dimensions to collapse by name, index, or dimension type. The details of the collapse should be stored in the dataset as a __ attribute (so it is hidden from the user) so that the collapse can be undone later.
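
As a rough illustration of the collapse/expand round trip described above, here is a minimal sketch on a plain NumPy array. It assumes the dimensions are selected by index only; `collapse_dims`, `restore_dims`, and the `record` dictionary are hypothetical names for illustration, not part of sidpy, and a real implementation would also need to carry the sidpy dimension metadata along.

```python
import numpy as np


def collapse_dims(array, axes):
    """Merge the selected axes into one leading axis.

    Returns the collapsed array plus a record of what was done,
    which is what would be stored on the dataset as a hidden attribute.
    """
    axes = [ax % array.ndim for ax in axes]           # allow negative indices
    kept = [ax for ax in range(array.ndim) if ax not in axes]
    order = axes + kept                               # collapsed axes go first
    moved = np.transpose(array, order)
    merged_len = int(np.prod([array.shape[ax] for ax in axes]))
    new_shape = (merged_len,) + tuple(array.shape[ax] for ax in kept)
    record = {"shape": array.shape, "order": order}
    return moved.reshape(new_shape), record


def restore_dims(collapsed, record):
    """Undo collapse_dims using the stored record."""
    shape, order = record["shape"], record["order"]
    moved = collapsed.reshape([shape[ax] for ax in order])
    # argsort(order) is the inverse permutation of the transpose above
    return np.transpose(moved, np.argsort(order))


# Hypothetical example: two spatial axes (2 x 3) and one spectral axis (4)
data = np.arange(2 * 3 * 4).reshape(2, 3, 4)
flat, record = collapse_dims(data, axes=(0, 1))      # positions x spectrum
restored = restore_dims(flat, record)
```

Selecting by name or dimension type could be layered on top by first translating names or types into axis indices.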

ramav87 commented 1 year ago

A method for slicing sidpy datasets that returns sidpy datasets is probably the first order of business @saimani5

ramav87 commented 1 year ago

I have added the slicing ability (it's just __getitem__()), but this breaks many tests, presumably because the original assumption was that slicing returns a dask or numpy array, and now it returns a sidpy dataset instead. A workaround is to define our own index() function to enable indexing. I will explore it.

ramav87 commented 1 year ago

I have a working branch (rama_dev) that enables indexing of sidpy datasets. For example:

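import numpy as np
import sidpy as sid
input_spectrum = np.ones([3, 1, 3])
dataset = sid.Dataset.from_array(input_spectrum)
my_dset = dataset[0, :]  # slicing the sidpy dataset defined above
isinstance(my_dset, sid.Dataset)  # the slice should itself be a sidpy Dataset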

I had to change some of the tests for this to work. Some of our code will need to change if we want this: previously, typing [i, j] returned the value of the array by default, but now indexing goes through dask, so you must call .compute() to get the values. I think this is worth the tradeoff but am open to suggestions, @gduscher
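
The lazy behavior mentioned above can be seen with a plain dask array. This is only a sketch of the general dask behavior, not of the rama_dev branch itself, and the variable names are made up for illustration.

```python
import dask.array as da

# A dask array with the same shape as the example above
lazy = da.ones((3, 1, 3), chunks=(3, 1, 3))

element = lazy[0, 0, 0]    # indexing yields another lazy dask array, not a number
value = element.compute()  # .compute() evaluates it to a concrete NumPy value

row = lazy[0, :]           # slices stay lazy as well
```

Code that previously relied on `[i, j]` returning a plain value would need the extra `.compute()` call at the point where the number is actually used.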

On that branch, all the tests are passing, for what it's worth.