I noticed that when configuring dask to use 'processes' instead of 'threads', spatialdata.to_polygons fails due to this line: https://github.com/scverse/spatialdata/blob/27bb4a7579d8ff7cc8f6dd9b782226cb984ceb20/src/spatialdata/_core/operations/vectorize.py#L212

The result of dask.compute() is lost when using processes, which is expected behaviour. Note that using 'processes' instead of 'threads' considerably speeds up vectorizing labels for large masks, because the function we try to parallelize does not release the GIL.
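For context, here is a minimal sketch of the general pitfall (an assumption about what the linked line runs into, not a claim about the exact code): mutations of shared state performed inside tasks are visible with the threads scheduler but lost with the processes scheduler, because they happen in child processes; only the values returned through dask.compute() make it back to the caller.

import dask

results = []

@dask.delayed
def work(i):
    # Side effect on state shared with the caller (hypothetical stand-in
    # for whatever gets collected during compute).
    results.append(i * i)
    return i * i

if __name__ == "__main__":  # guard needed because 'processes' spawns workers
    tasks = [work(i) for i in range(4)]

    dask.compute(*tasks, scheduler="threads")
    print(sorted(results))  # [0, 1, 4, 9] - threads share memory with the caller

    results.clear()
    dask.compute(*tasks, scheduler="processes")
    print(results)  # [] - the appends happened in child processes and are lost

    # Values returned from the tasks survive with either scheduler:
    print(dask.compute(*tasks, scheduler="processes"))  # (0, 1, 4, 9)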
Example:
import os
from pathlib import Path

import dask
import pooch
from pooch import Pooch
from spatialdata import read_zarr, to_polygons

BASE_URL = "https://objectstor.vib.be/spatial-hackathon-public/sparrow/public_datasets"

def _get_registry(path: str | Path | None = None) -> Pooch:
    return pooch.create(
        path=pooch.os_cache("sparrow") if path is None else path,
        base_url=BASE_URL,
        version="0.0.1",
        registry={
            "transcriptomics/vizgen/mouse/_sdata_2D.zarr.zip": "e1f36061e97e74ad131eb709ca678658829dc4385a444923ef74835e783d63bc",
        },
    )

registry = _get_registry(path=None)  # set path if you want to download the data somewhere else
unzip_path = registry.fetch("transcriptomics/vizgen/mouse/_sdata_2D.zarr.zip", processor=pooch.Unzip())
sdata = read_zarr(os.path.commonpath(unzip_path))
sdata.path = None

dask.config.set(scheduler="processes")
gdf = to_polygons(sdata["segmentation_mask_full"])
# finishes in around 3m locally on a Mac M2

dask.config.set(scheduler="threads")
gdf = to_polygons(sdata["segmentation_mask_full"])
# finishes in around 8m locally on a Mac M2
"segmentation_mask_full' contains the masks from a merscope experiment, around 300k labels.