mehta-lab / VisCy

computer vision models for single-cell phenotyping
https://pypi.org/project/viscy/
BSD 3-Clause "New" or "Revised" License

Inference output image format #18

Closed. Soorya19Pradeep closed this issue 1 year ago.

Soorya19Pradeep commented 1 year ago

@Christianfoley and I are trying to decide on the best image format for saving the predicted images from inference: Zarr or TIFF. Zarr is better for storing the data, but some of the software used for processing the predicted images only works with single-page TIFFs. @mattersoflight has suggested that we aim to store the predictions as Zarr. We can read the Zarr into a NumPy array and then perform downstream analysis (i.e., metrics evaluation; this links to issue #202). Anything to add, @Christianfoley, @mattersoflight, @ziw-liu?

Christianfoley commented 1 year ago

Hi Soorya, thanks for bringing this up. I think saving images in a Zarr store should be fine as long as it doesn't create too many problems for using Cellpose downstream. Is it possible to use the Cellpose GUI without converting the Zarr store data into single-page TIFFs?

Note that each new set of predictions would have to be stored in a brand-new Zarr store local to its respective model, in accordance with mehta-lab/microDL#203.

mattersoflight commented 1 year ago

OME-Zarr has a published toolchain around it, which makes it a good default.

Inference with the Cellpose notebook does not require saving TIFFs:

masks, flows, styles, diams = model.eval(img1, diameter=diameter, flow_threshold=flow_threshold, cellprob_threshold=cellprob_threshold, channels=channels)
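For example, the prediction array can be read straight from the store and passed to Cellpose in memory. A minimal sketch, assuming iohub's open_ome_zarr reader; the store path, position layout, and parameter values are illustrative:

```python
from cellpose import models
from iohub import open_ome_zarr

# Hypothetical store path and position; adjust to the actual layout.
with open_ome_zarr("predictions.zarr/A/1/0", mode="r") as position:
    # OME-Zarr images are (T, C, Z, Y, X); take one 2D slice as a NumPy array.
    img1 = position["0"][0, 0, 0]

model = models.Cellpose(model_type="cyto")
masks, flows, styles, diams = model.eval(
    img1, diameter=None, flow_threshold=0.4, cellprob_threshold=0.0, channels=[0, 0]
)
```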

If we need to train a Cellpose model on predictions, it is possible to write a thin wrapper using iohub and tifffile to set up a training directory following this example (https://github.com/MouseLand/cellpose/blob/main/notebooks/run_cellpose_2.ipynb).
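Such a wrapper could look roughly like this sketch, assuming iohub's plate-iteration API; the channel indices, file naming, and `_masks` suffix (Cellpose's default training convention) are assumptions to verify:

```python
from pathlib import Path

from iohub import open_ome_zarr
from tifffile import imwrite


def export_training_dir(store_path: str, out_dir: str) -> None:
    """Dump each position's image and mask as single-page TIFF pairs
    following the Cellpose training-directory convention (img + _masks)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open_ome_zarr(store_path, mode="r") as plate:
        for name, position in plate.positions():
            image = position["0"]  # (T, C, Z, Y, X)
            fov = name.replace("/", "_")
            # Channel indices are illustrative; look them up from channel_names.
            imwrite(out / f"{fov}.tif", image[0, 0, 0])
            imwrite(out / f"{fov}_masks.tif", image[0, 1, 0])
```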

mattersoflight commented 1 year ago

> Note that each new set of predictions would have to be stored in a brand-new Zarr store local to its respective model.

I agree. The predictions can be stored as OME-Zarr, with the path to the model and the per-image metrics stored in the metadata. Per-image metadata can be written as JSON with OME-Zarr. We should be able to read the metadata with pandas if it is properly formatted (https://pandas.pydata.org/docs/reference/api/pandas.read_json.html). We could also store the metadata as a non-intrusive CSV file alongside the OME-Zarr.

Please pick whichever format (JSON or CSV) is easiest to work with.
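For instance, per-image metrics could be dumped as JSON records next to the store and loaded back as a table. A sketch with hypothetical file names and illustrative metric values:

```python
import json
from pathlib import Path

import pandas as pd

store = Path("predictions.zarr")

# Write per-image metrics as a list of records (one dict per image).
metrics = [
    {"position": "A/1/0", "ssim": 0.91, "pearson": 0.88},  # illustrative values
    {"position": "A/1/1", "ssim": 0.87, "pearson": 0.85},
]
(store / "metrics.json").write_text(json.dumps(metrics))

# Read it back as a DataFrame for downstream analysis.
df = pd.read_json(store / "metrics.json", orient="records")
```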

The prediction Zarr store can hold the predicted grayscale image, the ground-truth grayscale image, and various masks all in the same store. All grayscale images and masks can be saved as named channels so that they are practical to load and, importantly, overlay in napari.
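A sketch of writing prediction, target, and mask as named channels, assuming iohub's HCS writer API; the arrays, shapes, and position names are illustrative:

```python
import numpy as np
from iohub import open_ome_zarr

# Illustrative arrays: one timepoint, one channel, one z-slice, 256 x 256.
prediction = np.random.rand(1, 1, 1, 256, 256).astype(np.float32)
target = np.random.rand(1, 1, 1, 256, 256).astype(np.float32)
mask = (prediction > 0.5).astype(np.float32)

with open_ome_zarr(
    "predictions.zarr",
    layout="hcs",
    mode="w",
    channel_names=["prediction", "target", "mask"],
) as plate:
    position = plate.create_position("A", "1", "0")
    # Stack along the channel axis so each name maps to one channel.
    position.create_image("0", np.concatenate([prediction, target, mask], axis=1))
```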

The TensorBoard log's size can be kept reasonable by reporting dataset-level metrics along with a few samples from the test set (say, 100 randomly chosen images from the whole test set).
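With torch's SummaryWriter, that could look like the following sketch; the tensors, tags, and metric value are stand-ins:

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("logs/inference")

# Dataset-level metric for the whole test set (illustrative value).
writer.add_scalar("test/ssim_mean", 0.90, global_step=0)

# Log only a small random subset of test predictions (NCHW, values in [0, 1]).
predictions = torch.rand(1000, 1, 256, 256)  # stand-in for the full test set
sample = torch.randperm(len(predictions))[:100]
writer.add_images("test/samples", predictions[sample], global_step=0)
writer.close()
```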

mattersoflight commented 1 year ago

@ziw-liu does iohub contain convenience functions to write and load tabular metadata as JSON? How about saving masks as a labels layer?

Christianfoley commented 1 year ago

> The TensorBoard log's size can be kept reasonable by reporting dataset-level metrics along with a few samples from the test set

Soorya and I have discovered that TensorBoard images are highly compressed, which made me question how useful they are for monitoring training and inference quality:

[Figure: a 2000 x 2000 retardance image of confluent A549 cells viewed through TensorBoard vs. the original image. Note the considerable compression artifacts.]

I have not tried saving the images as matplotlib figure outputs; perhaps that would avoid the compression.
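One option to test is SummaryWriter.add_figure, which renders a matplotlib figure instead of relying on TensorBoard's default image encoding. A quick sketch; the array and DPI choice are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("logs/figures")
image = np.random.rand(2000, 2000)  # stand-in for a retardance prediction

fig, ax = plt.subplots(figsize=(8, 8), dpi=250)  # higher DPI to limit downsampling
ax.imshow(image, cmap="gray", interpolation="none")
ax.set_axis_off()
writer.add_figure("test/prediction", fig, global_step=0)
writer.close()
```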

mattersoflight commented 1 year ago

@Christianfoley, @ziw-liu I suggest writing the output of inference as an ome-ngff store.

napari-ome-zarr is not quite practical for browsing Zarr stores in the HCS format, but it is easy enough to write a simple CLI that parses the data for viewing in napari (https://github.com/mehta-lab/recOrder/blob/825f06ce689c2ef050aaef9795a9ffa07b513e9e/recOrder/scripts/cli.py#L65).
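A minimal CLI sketch along those lines, assuming iohub's reader; this is not recOrder's actual CLI, and the position-path argument and "mask"-in-name heuristic are assumptions:

```python
import argparse

import napari
from iohub import open_ome_zarr


def main() -> None:
    parser = argparse.ArgumentParser(description="View one HCS position in napari.")
    parser.add_argument("position_path", help="e.g. predictions.zarr/A/1/0")
    args = parser.parse_args()

    viewer = napari.Viewer()
    with open_ome_zarr(args.position_path, mode="r") as position:
        for i, channel in enumerate(position.channel_names):
            layer = position["0"][:, i]  # (T, Z, Y, X) for this channel
            if "mask" in channel:  # heuristic: masks become labels layers
                viewer.add_labels(layer.astype(int), name=channel)
            else:
                viewer.add_image(layer, name=channel)
    napari.run()


if __name__ == "__main__":
    main()
```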

I also think a Neuroglancer script could work well for loading the dynamically populated data. One example is here, but we should read more.

ziw-liu commented 1 year ago

OME-Zarr prediction is implemented in #14.