scverse / spatialdata-io

BSD 3-Clause "New" or "Revised" License

reader vs converter and performance #12

Closed LucaMarconato closed 1 year ago

LucaMarconato commented 1 year ago

@giovp we need to test whether we can use readers (like the one in your new PR) or whether, for some large datasets, we need to use converters. The reason is that Dask can represent the data lazily in both cases, but operations on data loaded through a reader may not be performant.

For example, say that you want to read a .ome.tiff file, or rotate an image. With dask-image you can have a Dask array that lazily represents either of them in memory, construct a SpatialData object and, say, view it in napari. But the visualization will perform poorly. What helps here is to first save the object to .zarr and then reinitialize the array so that it reads from disk. The visualization will then be performant.

LucaMarconato commented 1 year ago

Now, with https://github.com/scverse/spatialdata/issues/117, the SpatialData object is re-read when we save it to disk, which addresses the performance problem. We just have to make users aware that the data should be saved to get better performance.