saeyslab / napari-sparrow

https://sparrow-pipeline.readthedocs.io/en/latest/

Bug in allocate_intensity: re-runs segmentation #185

Open berombau opened 7 months ago

berombau commented 7 months ago

@ArneDefauw I have a weird bug where allocate_intensity is very slow. Looking at the Dask dashboard, it does not seem to use the segmentation_mask that is already in the SpatialData object from the preceding segment step, but instead still holds a reference to the segmentation step that created it. So every chunk holds up the line in the Dask graph waiting for cellpose to run on it again, instead of using the mask.
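A quick way to check this (just a sketch; it assumes the default labels layer name segmentation_mask) is to look at the layer names in the Dask graph attached to the labels array: if cellpose/segmentation layers are still in there, allocate_intensity will recompute them chunk by chunk.

# Inspect the task graph behind the labels layer.
labels_arr = sdata.labels["segmentation_mask"].data  # dask array under the DataArray
print(list(labels_arr.dask.layers))  # layer names of the HighLevelGraph

# Optionally render the graph for closer inspection (needs graphviz).
labels_arr.visualize(filename="segmentation_mask_graph.svg")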

import sparrow as sp
import spatialdata as sd
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(host="localhost")
client = Client(cluster)

This is my image, but maybe sd.datasets.blobs will give the same error (see the sketch after the object summary below):

SpatialData object with:
└── Images
      └── 'raw_image': SpatialImage[cyx] (43, 830, 830)
with coordinate systems:
▸ 'global', with elements:
        raw_image (Images)
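If it helps reproduction, the bundled example data could be used instead of my image. A sketch, assuming the blobs image layer keeps its default name blobs_image and that the object is written to disk first so it is zarr-backed like mine:

import spatialdata as sd
from spatialdata.datasets import blobs

# Build the example dataset and back it with an on-disk zarr store,
# so it matches the backed setup used above.
sdata = blobs()
sdata.write("blobs.zarr")
sdata = sd.read_zarr("blobs.zarr")  # reload so all layers are lazily backed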
# runs in about 17 seconds
sp.im.segment(
    sdata,
    img_layer="raw_image",
    diameter=30,
    cellprob_threshold=-4,
    flow_threshold=0.9,
    model_type="nuclei",
    # output_labels_layer="segmentation_mask",
    do_3D=False,
    channels=[1, 0],
    # overwrite=True,
)

# runs for about 10 min on a 1,000 x 1,000 image
sdata = sp.tb.allocate_intensity(sdata, img_layer="raw_image", labels_layer="segmentation_mask", chunks=100)

It's hard to debug; I can try to create a better issue next week. It could also be that the backed zarr store or some rechunking messes up the graph: sdata.images[el_name].data = sdata.images[el_name].data.rechunk()
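To rule the rechunking hypothesis out (again just a sketch with the layer names used above), the chunk structures of the image and the labels can be compared before calling allocate_intensity; a mismatch would push extra rechunk tasks into the graph when the two arrays are aligned.

# Compare chunking of the raw image and the segmentation mask.
print(sdata.images["raw_image"].data.chunks)
print(sdata.labels["segmentation_mask"].data.chunks)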

ArneDefauw commented 7 months ago

Is your SpatialData object backed by a zarr store? I will try to reproduce it.

berombau commented 7 months ago

Yes, it was backed by an on-disk .zarr as far as I can tell.

ArneDefauw commented 7 months ago

I could only reproduce this issue when using spatialdata from the main branch together with the harpy fork of sparrow (https://github.com/saeyslab/harpy). On the main branch of napari-sparrow, everything works as expected.

So spatialdata removed the methods sdata.add_image, sdata.add_labels, ..., which took care of writing to the zarr storage and then lazily loading the element back once computation of the graph had finished.

Because these utility methods were removed, in the harpy fork we do the following to add labels to the sdata object:

sdata.labels[output_layer] = spatial_element
if sdata.is_backed():
    # write the new labels layer to the existing zarr store ...
    elem_group = sdata._init_add_element(name=output_layer, element_type="labels", overwrite=overwrite)
    write_labels(
        labels=sdata.labels[output_layer],
        group=elem_group,
        name=output_layer,
    )
    # ... and re-read the store so the returned object holds lazily backed labels
    sdata = read_zarr(sdata.path)

Because of this, sdata is no longer updated in place: although segmentation is performed and the on-disk zarr store is updated, the graph of your in-memory sdata.labels[ ... ] is not yet evaluated if you do this:

sp.im.segment(
    sdata,
    img_layer="raw_image",
    diameter=30,
    cellprob_threshold=-4,
    flow_threshold=0.9,
    model_type="nuclei",
    # output_labels_layer="segmentation_mask",
    do_3D=False,
    channels=[1, 0],
    # overwrite=True,
)

but it will be if you do this:

sdata=sp.im.segment(
    sdata,
    img_layer="raw_image",
    diameter=30,
    cellprob_threshold=-4,
    flow_threshold=0.9,
    model_type="nuclei",
    # output_labels_layer="segmentation_mask",
    do_3D=False,
    channels=[1, 0],
    # overwrite=True,
)

I am unsure how to fix this; it is clearly unwanted behaviour, but for a stable fix we should know what the plans are in spatialdata for backed SpatialData objects. Currently, sdata.add_image raises the following error message:

def _error_message_add_element() -> None:
    raise RuntimeError(
        "The functions add_image(), add_labels(), add_points() and add_shapes() have been removed in favor of "
        "dict-like access to the elements. Please use the following syntax to add an element:\n"
        "\n"
        '\tsdata.images["image_name"] = image\n'
        '\tsdata.labels["labels_name"] = labels\n'
        "\t...\n"
        "\n"
        "The new syntax does not automatically updates the disk storage, so you need to call sdata.write() when "
        "the in-memory object is ready to be saved.\n"
        "To save only a new specific element to an existing Zarr storage please use the functions write_image(), "
        "write_labels(), write_points(), write_shapes() and write_table(). We are going to make these calls more "
        "ergonomic in a follow up PR."
    )
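For reference, the flow this error message describes would look roughly like the sketch below. It is only a sketch: spatial_element stands in for the labels DataArray produced by segmentation (as in the harpy snippet above), the store name is made up, and it writes the whole object to a fresh store rather than updating a single element in place.

import spatialdata as sd

# New dict-like syntax: this only updates the in-memory SpatialData object.
sdata.labels["segmentation_mask"] = spatial_element

# Persist explicitly; writing evaluates the dask graph of the new element.
sdata.write("sample_with_labels.zarr")

# Re-read so the labels layer is lazily backed by the store instead of
# still carrying the original segmentation task graph.
sdata = sd.read_zarr("sample_with_labels.zarr")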