Implement labels loading for OME-Zarr HCS plates

tischi commented 7 months ago

I think labels are not yet supported.

These datasest have labels and may be used for development:

https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plates/2551.zarr (broken label subsampling)
https://zenodo.org/records/8091910

tischi commented 7 months ago

works now.

jeskowagner commented 1 month ago

Does this still work?

For the Zenodo-hosted MIP I get:

[ERROR] Command errored: Open HCS Dataset...
java.lang.RuntimeException: Could not determine HCSPattern for /<...>/20200812-CardiomyocyteDifferentiation14-Cycle1_mip.zarr.zip
    at org.embl.mobie.lib.hcs.Plate.determineHCSPattern(Plate.java:419)
    at org.embl.mobie.lib.hcs.Plate.<init>(Plate.java:139)
    at org.embl.mobie.MoBIE.openHCSDataset(MoBIE.java:263)
    at org.embl.mobie.MoBIE.<init>(MoBIE.java:153)
    at org.embl.mobie.command.open.OpenHCSDatasetCommand.run(OpenHCSDatasetCommand.java:87)
    at org.scijava.command.CommandModule.run(CommandModule.java:196)
    at org.scijava.module.ModuleRunner.run(ModuleRunner.java:165)
    at org.scijava.module.ModuleRunner.call(ModuleRunner.java:125)
    at org.scijava.module.ModuleRunner.call(ModuleRunner.java:64)
    at org.scijava.thread.DefaultThreadService.lambda$wrap$2(DefaultThreadService.java:247)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

And for the S3 hosted data the zarr loads fine, but I cannot see the labels:

Fiji version: 2.14.0/1.54f Build: c89e8500e4 OS: macOS Sonoma 14

tischi commented 1 month ago

Dear @jeskowagner,

Could you please share the S3 address with me?

I don't think streaming directly from Zenodo is something that we are supporting. Is that what you tried? I think you may first have to download the data locally. If you feel directly streaming from Zenodo should work, please let me know and I can look into it....

Just saw that the Zenodo file is a ZIP (20200812-CardiomyocyteDifferentiation14-Cycle1_mip.zarr.zip), I think this for sure will currently now work. If you send me the full address we can however investigate whether this may be possible in the future.

jeskowagner commented 1 month ago

Hi @tischi, Thanks for your fast response! You were spot on, I was being slow this morning and didn't spot the zarr comes zipped (should've figured). It loads perfectly fine after unzipping; reading zipped zarr or streaming from Zenodo is not an important use case for me. Regarding the S3 link: sorry for being unclear, I meant the links posted in the first comment on this issue, i.e. https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plates/2551.zarr.

tischi commented 1 month ago

@jeskowagner I guess the labels are not shown in either case: S3 and local, right? I will try to have a look, but I will be on holidays next week. Unless I find a quick fix today, it will have to wait one or two weeks 😞

tischi commented 1 month ago

Hi @tibuch,

Does working labels from OME-Zarr plates work for you?
Would you be able to provide a minimal (again not chunked) example data set including labels for testing?
Thanks!

jeskowagner commented 1 month ago

For the zenodo-hosted data (after downloading) displaying the labels works:

Although I think it technically doesn't show all labels, because under B/03/0/labels/.zattrs we have

{
    "labels": [
        "nuclei",
        "wf_2_labels",
        "wf_3_labels",
        "wf_4_labels"
    ]
}

And all of these labels do indeed have corresponding folders in the labels folder. In the picture above it is not clear which of those labels we are seeing (presumably nuclei?).

For the S3 data I have only tried streaming, because I get permission errors when trying to download through aws (though I'm sure there is something I am doing wrong: aws s3 cp s3://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plates/5966.zarr . --recursive --no-sign-request).

And just to add: this is not a high priority issue of course, please enjoy your Friday and vacation!

tischi commented 1 month ago

@jeskowagner

Ok...I don't think we ever had a use-case with multiple labels fields...might be that this is not supported yet; how big is this Zendo plate? Is that something you could share with me such that I could test locally?
The data on S3, is that the same data or some other data?

@tibuch even though it seems to work in principle a minimal dataset for testing including labels would be awesome.

jeskowagner commented 1 month ago

Note that I am not the author of either data. That said, you can download the Zenodo data I visualised with the labels here. The download is quite modest - around 100 Mb and contains only a single well and a single site with 3 imaging channels and seemingly 4 label masks. As to motivation - having the option to toggle nuclei/cell/cytoplasm masks can be handy when troubleshooting segmentation issues. Pinging @jluethi who is the author of the Zenodo-hosted data.

tischi commented 1 month ago

...and another post (sorry, please read everything above).

I opened https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plates/2551.zarr in Fiji using: Plugins › BigDataViewer › HDF5/N5/Zarr/OME-NGFF Viewer

And this dataset still has the issue (see the very first post in this issue) that the labels are wrongly down-sampled: the datatype is changing to float during the downsampling. This is not a valid OME-Zarr. I think I already mentioned this to @joshmoore but apparently this had not yet been fixed (maybe there also never was a plan of fixing this, I cannot remember now 😄 ).

tischi commented 1 month ago

Thanks for the Zenodo link; I will download this and see whether we can manage to open all labels.

jeskowagner commented 1 month ago

In the Zenodo description I just saw that my interpretation of having multiple labels in this file may be incorrect. The description states (emphasis mine):

20200812-CardiomyocyteDifferentiation14-Cycle1_mip.zarr contains the same 3 channels, but as maximum intensity projections. It contains nuclear segmentation through cellpose. It also contains 6 tables: The region of interests like in the 3D data, as well as measurements performed with napari-skimage-regionprops.

I would take that to mean it only contains a single label mask. In that case I am unsure what wf_2_labels to wf_4_labels, which are in B/03/0/labels/, represent.

tischi commented 1 month ago

Maybe @jluethi knows?

tischi commented 1 month ago

It turns out that loading the labels does not work anymore in mobie-5.0.0, working on this here: https://github.com/mobie/mobie-viewer-fiji/tree/issue-1066

jluethi commented 1 month ago

Note that I am not the author of either data. That said, you can download the Zenodo data I visualised with the labels here. The download is quite modest - around 100 Mb and contains only a single well and a single site with 3 imaging channels and seemingly 4 label masks. As to motivation - having the option to toggle nuclei/cell/cytoplasm masks can be handy when troubleshooting segmentation issues. Pinging @jluethi who is the author of the Zenodo-hosted data.

Re the example OME-Zarr we have on Zenodo: Feel free to use that for tests if it's useful. We placed them there to use them that way as well :)

There is a slightly newer version available here: https://zenodo.org/records/10257532 (mostly with fixes to table metadata I think)

In the Zenodo description I just saw that my interpretation of having multiple labels in this file may be incorrect. The description states (emphasis mine):

That's just a non-updated description then. The OME-Zarr really does contain 4 labels:

(screenshot in napari viewer, though typical issues there are deprecation of label color coming with 0.5.0 of napari => work on the plugin needed. And that it doesn't load any labels when loading a whole plate by default, only when loading individual images). I now updated the description to be accurate again :)

tischi commented 1 month ago

OK, I have to fix my code then.

Notes to self:

Probably a fix is needed in mobie-io where for each OME-Zarr dataset I look whether there are sub-folder datasets where the path contains labels.
These should be then added to the List< SourceAndConverter > of the corresponding N5ImageData
I will probably also need to introduce a getDatasetName( index ) into the ImageData interface to be able to check whether the data set name contains labels and then treat it as a segmentation in MoBIE; having dataset names could be useful anyway, because for some image file formats I may be able to fetch channel names from the metadata.
https://imagesc.zulipchat.com/#narrow/stream/328251-NGFF/topic/Making.20it.20easy.20to.20read.20.2B.20write.20ome-zarr.20in.20Fiji

Fetching all the multiscale datasets:

final N5Reader n5 = ...
final N5TreeNode root = N5DatasetDiscoverer.discover(n5); // parse all metadata
N5TreeNode.flattenN5Tree(root)
        .filter(x -> {   // get all the "multiscales" metadata
            final N5Metadata meta = x.getMetadata();
            return meta != null && meta instanceof OmeNgffMetadata;
        }).flatMap( x -> {
            return x.getDescendants(n -> true); // return every child of a node containing multiscales
        }).forEach(x -> {
            System.out.println(x.getPath());
        });

jeskowagner commented 1 month ago

@jluethi which napari plugin are you using for that picture? With the current ome-zarr-py(https://github.com/ome/napari-ome-zarr) v0.5.3 and napari 0.4.19 I cannot see the labels: I also tried napari v0.5.0a1, unfortunately also without success.

jluethi commented 1 month ago

Hey @jeskowagner yeah, the napari ome zarr plugin unfortunately doesn't load labels by default, see discussion here: Library: https://github.com/ome/ome-zarr-py/pull/207 Plugin: https://github.com/ome/napari-ome-zarr/pull/54

e.g. if you have a /path/to/my/plate.zarr and drag & drop this into napari, it will only load the images. With the 2 branches above, it can also load the labels.

Default napari-ome-zarr opens labels only if you load a single image, e.g. /path/to/my/plate.zarr/A/01/0.

Also, please be aware that napari 0.5.0 is deprecating some label color functions that the plugin currently uses. Until that is updated, the napari-ome-zarr plugin is not fully compatible with the current main build of napari (like 0.5.0a1)

jeskowagner commented 1 month ago

Fantastic, thanks @jluethi! I can use napari to test my implementation of writing labels into an existing zarr using faim-ipa with those PRs of napari/napari-ome-zarr/ome-zarr-py then!

@tischi that sounds like a great plan.

Probably a fix is needed in mobie-io where for each OME-Zarr dataset I look whether there are sub-folder datasets where the path contains labels.

I saw that the spec mentions that all label images should be listed in labels/.zattrs (though I am surprised by this being optional, because the preceding sentence seems to suggest this is mandatory. In any case, it might be faster/less IO heavy to just read labels/.zattrs rather than checking subfolders of labels.

Unfortunately my Java is incredibly rusty, so I won't be able to contribute a PR. Sorry about that.

tischi commented 1 month ago

In any case, it might be faster/less IO heavy to just read labels/.zattrs rather than checking subfolders of labels.

Yes, this is a consideration. In general I wonder what one would expect if one "opens" something like /path/to/my/plate.zarr/A/01/0...my current thinking is that it could make sense to open all datasets that are at that level and deeper than that, which in our case would include all the labels subfolders. Since, based on recent discussions around the ngff spec, I feel that labels/.zattrs might become deprecated I don't want to parse it right now, but simply open everything that's "below" /path/to/my/plate.zarr/A/01/0. However, indeed, if it turns out that discovering all datasets in all subfolders is slow than that is a concern...I will have to benchmark this.

In any case. I am on holidays this week. I will get back to it next week.

tibuch commented 1 month ago

What would happen if one of the sub-directories in /path/to/my/plate.zarr/A/01/0 is not a zarr? But some custom object or even directory structure that should not be interpreted by the viewer?

tischi commented 1 month ago

@tibuch that is in fact the case with the data that @jluethi creates, as there are also tables stored. Such data can be filtered by the fact that they are not OmeNgffMetadata. If you scroll a bit up, you will find a code snippet kindly provided by John Bogovic that shows how to do this. However, I decided for now to explicitly look for the labels sub-folder, because this should save some time in comparison to parsing the whole zarr directory structure (once this issue is fixed). I'd be grateful if you could have a look at my current implementation whether that looks sensible to you! If you even want to play with this and check whether it works for your data you could modify this function. Note that this currently only works for locally stored OME-Zarr since there seems to be some issue parsing S3 hosted OME-Zarr using the latest n5 libraries (at least I am getting null there).

mobie / mobie-viewer-fiji

Implement labels loading for OME-Zarr HCS plates #1066