hyperspy / rosettasciio

Python library for reading and writing scientific data formats
https://hyperspy.org/rosettasciio
GNU General Public License v3.0

Adding Documentation About Dask-Distributed Support for file types #61

Open CSSFrancis opened 1 year ago

CSSFrancis commented 1 year ago

Describe the functionality you would like to see.

I would like the documentation to state which file loaders support the dask-distributed backend. Mostly this just means adding an extra column here.

Currently, I believe only zspy and the new file loader in #11 support it, but we could also consider adding support for the hspy file format and for any of the other binary file formats.

Describe the context

I have defined a function in #11 that works as a drop-in replacement for np.memmap and allows some data to be loaded in a distributed fashion. This is particularly useful for large datasets, and it also does a much better job of handling the available resources.
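
To make the idea concrete, here is a minimal sketch of what a distributed-friendly alternative to np.memmap could look like. It is not the function from #11: the names `lazy_memmap` and `load_chunk` are hypothetical, and the sketch assumes a flat, C-ordered binary file that is chunked along its first axis only. The key point is that each task opens the file itself, so only the filename and offsets need to be sent to a worker rather than an open memory map.

```python
import math

import dask
import dask.array as da
import numpy as np


def load_chunk(filename, dtype, offset, shape):
    """Read one block from disk. Runs inside a dask task (hypothetical helper)."""
    block = np.memmap(filename, dtype=dtype, mode="r", offset=offset, shape=shape)
    # Copy into a regular array so the result does not depend on the open file.
    return np.array(block)


def lazy_memmap(filename, dtype, shape, chunk_rows):
    """Build a lazy dask array over a raw binary file (hypothetical helper).

    Assumes C-ordered data and splits it along the first axis only.
    """
    dtype = np.dtype(dtype)
    row_bytes = dtype.itemsize * math.prod(shape[1:])
    blocks = []
    for start in range(0, shape[0], chunk_rows):
        rows = min(chunk_rows, shape[0] - start)
        block_shape = (rows,) + tuple(shape[1:])
        task = dask.delayed(load_chunk)(
            filename, dtype, start * row_bytes, block_shape
        )
        blocks.append(da.from_delayed(task, shape=block_shape, dtype=dtype))
    return da.concatenate(blocks, axis=0)
```

Calling `lazy_memmap("data.bin", "float32", (6, 4), chunk_rows=4)` returns a dask array and reads nothing until `.compute()` is called. If a `dask.distributed.Client` has been created beforehand, it becomes the default scheduler, so the same call would run the reads on the cluster's workers, provided they can see the file at the same path.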

Additional information

Using the dask-distributed scheduler is the preferred way to interact with dask in most cases. Supporting distributed schedulers at the loading level is important for larger datasets and allows performance to scale much better.

ericpre commented 1 year ago

Yes, sounds good. This table is created manually; it would be better to have this information available in the specification and generate the table automatically, as we do with hs.print_known_signal_types(), for example.
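
As a rough illustration of generating the table from specifications, the sketch below builds a plain-text table from a list of per-format dictionaries. Everything here is an assumption made for the example: the `build_support_table` function, the `distributed` key, and the two example entries are hypothetical and do not reflect the actual specification layout.

```python
def build_support_table(specs):
    """Return a plain-text table with one row per format specification."""
    headers = ("Format", "Distributed")
    rows = [
        # A missing "distributed" key (hypothetical field) is treated as "No".
        (spec["name"], "Yes" if spec.get("distributed", False) else "No")
        for spec in sorted(specs, key=lambda s: s["name"].lower())
    ]
    # Each column is as wide as its longest cell, header included.
    widths = [
        max(len(row[i]) for row in [headers, *rows])
        for i in range(len(headers))
    ]

    def fmt(row):
        return "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    rule = "  ".join("=" * w for w in widths)
    return "\n".join([rule, fmt(headers), rule, *map(fmt, rows), rule])


# Illustrative entries only; real values would come from each format's spec.
example_specs = [
    {"name": "ZSPY", "distributed": True},
    {"name": "HSPY"},
]
print(build_support_table(example_specs))
```

With this approach, adding the new column would only require each format to declare one extra field, and the documentation table would stay in sync with the code.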