Open berombau opened 7 months ago
Is your spatialdata object backed by a zarr store? I will try to reproduce it
Yes, it was backed by an on-disk .zarr as far as I can tell.
I could only reproduce this issue when using spatialdata from the main branch together with the harpy fork of sparrow (https://github.com/saeyslab/harpy). On the main branch of napari-sparrow, everything works as expected.
So spatialdata removed the methods sdata.add_image, sdata.add_labels, ..., which took care of writing to the zarr store and then lazily loading the element back once computation of the graph was finished.
Because these utility methods were removed, in the harpy fork we do the following to add labels to the sdata object:
```python
sdata.labels[output_layer] = spatial_element
if sdata.is_backed():
    elem_group = sdata._init_add_element(name=output_layer, element_type="labels", overwrite=overwrite)
    write_labels(
        labels=sdata.labels[output_layer],
        group=elem_group,
        name=output_layer,
    )
    sdata = read_zarr(sdata.path)
```
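The intent of this write-then-reload pattern can be sketched with plain dask (all names here are illustrative, and an in-memory numpy array stands in for the zarr store): once the graph has been evaluated and the result stored, reloading gives a cheap lazy view whose graph no longer contains the expensive tasks.

```python
import dask.array as da
import numpy as np

calls = {"n": 0}  # counts how often the "segmentation" actually runs

def expensive(block):
    # Stand-in for a costly per-chunk step such as cellpose.
    calls["n"] += 1
    return block * 2

x = da.ones((4, 4), chunks=(2, 2))
# Explicit meta avoids an extra inference call on an empty block.
lazy = x.map_blocks(expensive, meta=np.empty((0, 0), dtype=x.dtype))

# "Write": evaluate the graph once and materialize the result
# (stand-in for writing the element to the zarr store).
materialized = lazy.compute(scheduler="synchronous")
n_after_write = calls["n"]

# "Reload": wrap the stored result lazily (stand-in for read_zarr).
reloaded = da.from_array(materialized, chunks=(2, 2))
result = reloaded.compute()  # plain read: expensive() is not re-run

print(n_after_write, calls["n"])  # same number: no recomputation
```

If the reload step is skipped, every later `.compute()` on the element re-executes the expensive tasks, which is exactly the slowdown described below.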
Because of this, sdata will no longer be updated in place. Although segmentation will be performed and the on-disk zarr store will be updated, the graph of your in-memory sdata.labels[ ... ] is not yet evaluated if you do this:
```python
sp.im.segment(
    sdata,
    img_layer="raw_image",
    diameter=30,
    cellprob_threshold=-4,
    flow_threshold=0.9,
    model_type="nuclei",
    # output_labels_layer="segmentation_mask",
    do_3D=False,
    channels=[1, 0],
    # overwrite=True,
)
```
but it will be if you do this:
```python
sdata = sp.im.segment(
    sdata,
    img_layer="raw_image",
    diameter=30,
    cellprob_threshold=-4,
    flow_threshold=0.9,
    model_type="nuclei",
    # output_labels_layer="segmentation_mask",
    do_3D=False,
    channels=[1, 0],
    # overwrite=True,
)
```
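The difference between the two calls is plain Python name binding, nothing harpy-specific: because the function re-reads the zarr store and rebinds its local variable to a fresh object, the caller only sees the update through the return value. A minimal sketch (`Store` and `process` are hypothetical stand-ins, not harpy APIs):

```python
class Store:
    """Hypothetical stand-in for a SpatialData object (illustration only)."""
    def __init__(self, labels):
        self.labels = labels

def process(store):
    # Simulates re-reading the zarr store: the local name `store` is
    # rebound to a brand-new object, so the caller's variable still
    # points at the old one unless the return value is captured.
    store = Store(labels={"segmentation_mask": "evaluated"})
    return store

s = Store(labels={})
process(s)       # return value discarded: caller sees no change
print(s.labels)  # {}
s = process(s)   # return value captured: caller gets the new object
print(s.labels)  # {'segmentation_mask': 'evaluated'}
```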
I am unsure how to fix this. It is clearly unwanted behaviour, but for a stable fix we should know what the plans are in spatialdata for backed SpatialData objects. Currently, sdata.add_image raises the following error message:
```python
def _error_message_add_element() -> None:
    raise RuntimeError(
        "The functions add_image(), add_labels(), add_points() and add_shapes() have been removed in favor of "
        "dict-like access to the elements. Please use the following syntax to add an element:\n"
        "\n"
        '\tsdata.images["image_name"] = image\n'
        '\tsdata.labels["labels_name"] = labels\n'
        "\t...\n"
        "\n"
        "The new syntax does not automatically updates the disk storage, so you need to call sdata.write() when "
        "the in-memory object is ready to be saved.\n"
        "To save only a new specific element to an existing Zarr storage please use the functions write_image(), "
        "write_labels(), write_points(), write_shapes() and write_table(). We are going to make these calls more "
        "ergonomic in a follow up PR."
    )
```
@ArneDefauw I have a weird bug where allocate_intensity is very slow. When looking at the Dask dashboard, it seems to not use the segmentation_mask already in the SpatialData object from the segment step before, but instead some reference to the segmentation step that created it. So every chunk holds up the line in the Dask graph waiting for cellpose to execute on it again, instead of using the mask. This happens with my image, but maybe sd.datasets.blobs will give the same error.
It's hard to debug; I can try to create a better issue next week. It could also be that the backed zarr or some rechunking messes up the graph:
```python
sdata.images[el_name].data = sdata.images[el_name].data.rechunk()
```
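Note that `.rechunk()` alone cannot discard the upstream work: it is lazy and simply layers new tasks on top of the existing graph, so whatever produced the array (e.g. the segmentation tasks) stays in the graph and still runs at compute time. A small sketch with plain dask (`expensive` is an illustrative stand-in for the costly step):

```python
import dask.array as da
import numpy as np

def expensive(block):
    # Stand-in for a costly per-chunk step such as cellpose.
    return block + 1

x = da.ones((4, 4), chunks=(2, 2)).map_blocks(
    expensive, meta=np.empty((0, 0), dtype=float)
)
y = x.rechunk((4, 4))

# The rechunked array's task graph still contains the expensive tasks,
# so computing y re-runs them; rechunking did not materialize anything.
still_there = any(name.startswith("expensive") for name in y.dask.layers)
print(still_there)  # True
```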