hammerlab / cytokit

Microscopy Image Cytometry Toolkit
Apache License 2.0

Is there a way to change threshold for nucleus segmentation? #36

Closed VasylVaskivskyi closed 2 years ago

VasylVaskivskyi commented 2 years ago

I've noticed that Cytokit segmentation performs poorly on some CODEX datasets. Is there a way to change the threshold that determines which nuclei are allowed to pass? Or maybe there are some other parameters that can influence the quality of nucleus segmentation? I could only find some options that influence the size of the nuclei masks.

eric-czech commented 2 years ago

The two most important factors I saw for that segmentation model are the zoom of the image and whether or not it was deconvolved. Are your images at 10x or 20x and not something closer to 40x? If so, are you using the deconvolution preprocessing?

VasylVaskivskyi commented 2 years ago

Thank you for the quick reply. The images are 20x, and I run deconvolution and drift compensation. Here is the config file that I use.

date: '2021-07-26 16:50:13'
environment: {path_formats: keyence_multi_cycle_v01}
name: e3aa11ba0218456e2cc9302f6b1d9d1c
acquisition:
  axial_resolution: 1500.0
  channel_names: [DAPI-01, Blank, Blank, Blank, DAPI-02, CD31, CD8, CD45, DAPI-03,
    CD20, Ki67, CD3e, DAPI-04, Actin, Podoplan, CD68, DAPI-05, PanCK, CD21, CD4, DAPI-06,
    Empty, CD45RO, CD11c, DAPI-07, Empty, E-CAD, CD107a, DAPI-08, Empty, CD44, H3,
    DAPI-09, Blank, Blank, Blank]
  emission_wavelengths: [358, 750, 550, 650]
  lateral_resolution: 377.40384615384613
  magnification: 20
  num_cycles: 9
  num_z_planes: 1
  numerical_aperture: 0.75
  objective_type: air
  per_cycle_channel_names: [CH1, CH2, CH3, CH4]
  region_height: 10
  region_names: [1]
  region_width: 10
  tile_height: 1000
  tile_overlap_x: 200
  tile_overlap_y: 200
  tile_width: 1000
  tiling_mode: grid
analysis:
- aggregate_cytometry_statistics: {mode: best_z_plane}
operator:
- extract:
    channels: [proc_DAPI-02, proc_CD8, proc_CD20,
      proc_Ki67, proc_CD3e, proc_CD21, proc_CD4, proc_CD45RO, proc_CD11c, proc_E-CAD,
      proc_CD107a]
    name: expressions
    z: all
processor:
  args:
    gpus: [0, 1]
    memory_limit: 64G
    run_best_focus: true
    run_crop: false
    run_cytometry: true
    run_deconvolution: true
    run_drift_comp: true
    run_tile_generator: true
  best_focus: {channel: DAPI-02}
  cytometry:
    membrane_channel_name: CD45
    nuclei_channel_name: DAPI-02
    quantification_params: {cell_graph: true, nucleus_intensity: true}
    segmentation_params: {marker_dilation: 3, marker_min_size: 2, memb_gamma: 0.25,
      memb_min_dist: 8, memb_sigma: 5}
    target_shape: [1000, 1000]
  deconvolution: {n_iter: 25, scale_factor: 0.5}
  drift_compensation: {channel: DAPI-02}
  tile_generator: {raw_file_type: keyence_mixed}

eric-czech commented 2 years ago

I'd suggest removing cytometry.segmentation_params, and maybe cytometry.target_shape as well, and then trying again with run_deconvolution: false. (target_shape should be OK as-is, but there's no need to keep it around: if you ever change run_crop, it would cause the image to be upsampled.)

Honestly, I wasn't ever able to convince myself that deconvolution helped our CODEX quantifications (despite being visually much clearer) so I often disabled it. By this I mean I had tried segmenting the original images (which is generally better) and quantifying the deconvolved images separately, but that didn't help either so I never added that feature to the library.
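The config changes suggested above can be sketched in Python as a plain dictionary edit. This is only an illustration: the helper name `relax_segmentation_config` is hypothetical (it is not part of Cytokit), and the dictionary is a trimmed-down stand-in for the config posted earlier in the thread.

```python
import copy


def relax_segmentation_config(config):
    """Return a copy of a Cytokit-style config dict with the suggested changes.

    Hypothetical helper for illustration; it only edits the dictionary.
    """
    cfg = copy.deepcopy(config)
    processor = cfg.setdefault("processor", {})
    cytometry = processor.setdefault("cytometry", {})
    # Drop the custom overrides so that the library defaults apply.
    cytometry.pop("segmentation_params", None)
    cytometry.pop("target_shape", None)
    # Skip deconvolution before segmentation.
    processor.setdefault("args", {})["run_deconvolution"] = False
    return cfg


# Trimmed-down stand-in for the config posted above.
config = {
    "processor": {
        "args": {"run_cytometry": True, "run_deconvolution": True},
        "cytometry": {
            "nuclei_channel_name": "DAPI-02",
            "segmentation_params": {"marker_dilation": 3, "marker_min_size": 2},
            "target_shape": [1000, 1000],
        },
        "deconvolution": {"n_iter": 25, "scale_factor": 0.5},
    }
}

updated = relax_segmentation_config(config)
print(updated["processor"]["args"])
print(sorted(updated["processor"]["cytometry"]))
```

If PyYAML is installed, the real file could presumably be read with `yaml.safe_load` and written back with `yaml.safe_dump` around this helper; the original dictionary is left untouched so both variants can be compared.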

VasylVaskivskyi commented 2 years ago

I tried to run with cytometry.segmentation_params and cytometry.target_shape disabled, but it didn't improve the result. However, turning off deconvolution improved results quite significantly - Cytokit segmented three times more nuclei.

eric-czech commented 2 years ago

Nice! Anything else I can help with then before closing this out?

VasylVaskivskyi commented 2 years ago

Yes, are there any other ways to improve the quality of the segmentation results? Cytokit still fails to segment around half of the nuclei in some datasets, although their borders are clearly visible. Here is a bad region (image quality is low because it is a screenshot from a remote VM): [image]

eric-czech commented 2 years ago

hmm can you share an image at its original resolution?

VasylVaskivskyi commented 2 years ago

I can give you a link to GDrive with the original image. I extracted the DAPI channel and nucleus labels, but together the images still take up 500 MB. [link removed]

eric-czech commented 2 years ago

I'm not immediately sure what's going on, but I'd certainly expect a lot of those missed nuclei to be getting picked up on good images like that. A few suggestions:

That last one would let you try different segmentation parameters interactively. That might help but you'd have to be willing to read the code for it and I don't have an intuition for what's worth attempting since it's been a while.

VasylVaskivskyi commented 2 years ago

Thank you for your help. I will have a look at your suggestions regarding different UNet implementations. As for the last option, I think my colleagues and I have already tested all the Cytokit parameters that are available in the yaml config, and the ones that I enclosed in the previous comment were working well for old datasets.

VolkerH commented 2 years ago

Just came here because of a notification as unet-nuclei was mentioned. These days, I'd probably just use Cellpose or StarDist for nuclei segmentation; they often give very good results out of the box, or otherwise after a tiny bit of retraining. I don't know anything about Cytokit though, so I'm not sure how difficult it would be to integrate StarDist or Cellpose.

VasylVaskivskyi commented 2 years ago

We will probably switch to DeepCell. However, it has the same problem as Cellpose: the nucleus and cytoplasm segmentation results do not match. There are lots of instances where a nucleus label is present but a cytoplasm label is not, or vice versa, or where the nucleus goes beyond the borders of the cytoplasm. Apart from that, some development has to be done to replace Cytokit inside the already established pipeline. That's why we wanted to see if there is anything that can be done to keep things as they are.
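The three kinds of mismatch described above can be counted with a small check over the two label images. This is a minimal sketch, assuming both images have the same shape and use 0 for background; the function name `compare_label_images` is hypothetical and does not come from DeepCell, Cellpose, or Cytokit.

```python
from collections import defaultdict


def compare_label_images(nuclei, cells):
    """Compare two same-shaped 2D label images (0 = background).

    Accepts nested lists or NumPy arrays. Returns three sorted lists:
    nuclei with no pixel inside any cell, cells containing no nucleus pixel,
    and nuclei that extend beyond the cell they overlap the most.
    """
    # nucleus id -> cell id (0 = outside any cell) -> number of shared pixels
    overlap = defaultdict(lambda: defaultdict(int))
    cell_ids = set()
    cells_with_nucleus = set()
    for nuc_row, cell_row in zip(nuclei, cells):
        for n, c in zip(nuc_row, cell_row):
            n, c = int(n), int(c)
            if c:
                cell_ids.add(c)
            if n:
                overlap[n][c] += 1
                if c:
                    cells_with_nucleus.add(c)

    orphan_nuclei, spilling_nuclei = [], []
    for n, counts in overlap.items():
        inside = {c: k for c, k in counts.items() if c != 0}
        if not inside:
            orphan_nuclei.append(n)
            continue
        best = max(inside, key=inside.get)
        # Any nucleus pixel not in the best-matching cell counts as spill.
        if sum(counts.values()) > inside[best]:
            spilling_nuclei.append(n)

    empty_cells = sorted(cell_ids - cells_with_nucleus)
    return sorted(orphan_nuclei), empty_cells, sorted(spilling_nuclei)


# Toy example: nucleus 2 spills out of cell 2, nucleus 3 has no cell,
# and cell 4 has no nucleus.
nuclei = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 3],
]
cells = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 2, 0, 0],
    [0, 0, 4, 4, 0, 0],
]
orphans, empty, spilling = compare_label_images(nuclei, cells)
print(orphans, empty, spilling)
```

The pure-Python loop is slow on full-size tiles; it is meant to make the definitions explicit rather than to be run on whole datasets.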

eric-czech commented 2 years ago

GTK @VolkerH -- you still using either one of those heavily yourself? I'd be curious to hear about your experience in retraining them.

the nucleus and cytoplasm segmentation results do not match

I don't know what the state of the art is for this, but FWIW I thought the CP algorithm for this (from this old Anne Carpenter paper) was actually pretty solid. It has one regularization parameter to tune, but it's at least intuitive. The centrosome lib it's in is also nice and lightweight, and invocation only requires a labeled nucleus image, an optional (I think) mask, and the cytoplasm image (e.g. cytometer.py#L1008). I imagine it being easy for you to integrate directly downstream from deepcell, stardist, etc.

VolkerH commented 2 years ago

I did use both StarDist and Cellpose quite a bit in my previous role, though mostly by referring facility users to them and running a few of their sample images through them. It often solved their particular issue without tuning, so that was very convenient.

In the lab where I currently work, we use Cellpose as part of our workflow, but I don't use it much myself. My experimental colleagues who had some cells that were not initially segmented well obtained good results after annotating a few images and retraining. However, these colleagues are mainly interested in segmenting the whole cell using a cytoplasm model. I cannot really comment on how well it works for segmenting nuclei and corresponding cells together consistently, which seems to be the problem @VasylVaskivskyi encountered.

In the classical workflows you don't usually have that problem as you use the detected nuclei as the seed points for finding the cytoplasmic area, so the association between nucleus and cytoplasm is fixed by design (doesn't work very well for poly-nucleated phenotypes though).
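To make the "nuclei as seeds" idea concrete, here is a minimal sketch that grows each nucleus label outward through a foreground mask by breadth-first search. It is a deliberately simplified stand-in, not the centrosome propagation algorithm (which also weights distances by image intensity), and the name `grow_cells_from_nuclei` is hypothetical.

```python
from collections import deque


def grow_cells_from_nuclei(nuclei, mask):
    """Grow each nucleus label outward through `mask` by breadth-first search.

    nuclei: 2D nested list of ints, 0 = background.
    mask:   2D nested list of truthy values marking pixels that may
            belong to a cell.
    Returns a new label image in which every cell carries the id of the
    nucleus it was grown from.
    """
    height, width = len(nuclei), len(nuclei[0])
    cells = [list(row) for row in nuclei]
    # Start from all nucleus pixels at once so labels expand at equal speed.
    queue = deque(
        (r, c) for r in range(height) for c in range(width) if nuclei[r][c]
    )
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < height and 0 <= cc < width
                    and mask[rr][cc] and cells[rr][cc] == 0):
                cells[rr][cc] = cells[r][c]
                queue.append((rr, cc))
    return cells


# Toy example: two single-pixel nuclei inside one connected foreground region.
nuclei = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
mask = [
    [0, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 1, 1, 0],
]
cells = grow_cells_from_nuclei(nuclei, mask)
for row in cells:
    print(row)
```

Because every cell pixel inherits its label from a nucleus, a cell without a nucleus or a nucleus without a cell cannot occur, which is the "fixed by design" property mentioned above. It also shows the limitation: a cell with two nuclei would be split in two.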

VasylVaskivskyi commented 2 years ago

Thank you very much for the suggestions and discussion. @eric-czech I will try centrosome.propagate on the nucleus segmentation results from DeepCell and Cellpose. @VolkerH Quite likely we won't do retraining on our data. The datasets come from different tissues, organs, and providers, so it is too much work to annotate all the combinations, and there is nobody to do it anyway.