hyperspy / rosettasciio

Python library for reading and writing scientific data formats
https://hyperspy.org/rosettasciio
GNU General Public License v3.0

Add support for saving multiple lazy signals #269

Open ericpre opened 1 month ago

ericpre commented 1 month ago

An example of where it would be useful to be able to save several dask arrays together:

You can still return the header by just setting `key="header"` for a second `memmap_distributed` call. This adds some time to saving the dataset, because the entire dataset might get loaded into RAM with most of it thrown away.

Really, what we should do is add things to a `to_store` context manager and then call https://github.com/hyperspy/rosettasciio/blob/31bd677cc4c02c5787ae9b61250bc1431f352cba/rsciio/hspy/_api.py#L111 only once. That will merge task graphs as necessary and might reduce the time for saving certain signals. I've thought about it for things like saving lazy markers, or possibly creating an `hs.save()` function for handling multiple signals if you wanted to save multiple parts of some analysis efficiently. This is a fairly abstract, higher-level concept, so maybe it would be seldom used.

_Originally posted by @CSSFrancis in https://github.com/hyperspy/rosettasciio/pull/267#discussion_r1624876062_
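
A rough sketch of the idea quoted above, not an existing rosettasciio or hyperspy API: the `StoreQueue` class and its `add` method are hypothetical names used only for illustration, and plain NumPy arrays stand in for the HDF5 or memmap targets. The only real library call assumed is `dask.array.store`, which accepts lists of sources and targets, so both arrays are written in a single call and their shared upstream tasks sit in one merged graph.

```python
import dask.array as da
import numpy as np


class StoreQueue:
    """Hypothetical helper: collect (source, target) pairs and write
    them all with a single dask.array.store call on exit."""

    def __init__(self):
        self.sources = []
        self.targets = []

    def add(self, source, target):
        # Nothing is computed here; the pair is only queued.
        self.sources.append(source)
        self.targets.append(target)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Write everything in one call, but only if the block succeeded.
        if exc_type is None and self.sources:
            da.store(self.sources, self.targets)
        return False


# Two lazy arrays derived from the same upstream array.
base = da.from_array(np.arange(64.0).reshape(8, 8), chunks=(4, 4))
signal = base * 2
header = base.sum(axis=1)

# NumPy arrays stand in for on-disk targets (assumption for the sketch).
signal_out = np.empty(signal.shape, dtype=signal.dtype)
header_out = np.empty(header.shape, dtype=header.dtype)

with StoreQueue() as queue:
    queue.add(signal, signal_out)
    queue.add(header, header_out)
```

Queuing the pairs and deferring the write to `__exit__` is what would let a header and a dataset be saved together without a second pass over the data; whether that actually saves time depends on how much of the task graph the arrays share.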