kharchenkolab / Baysor

Bayesian Segmentation of Spatial Transcriptomics Data
https://kharchenkolab.github.io/Baysor/
MIT License

A question about prior segmentation #39

Closed rocketeer1998 closed 1 year ago

rocketeer1998 commented 2 years ago

Hi @VPetukhov, I got my cell segmentation plot after running baysor run with the [PRIOR_SEGMENTATION] parameter and prior-segmentation-confidence = 1. I prepared the prior segmentation file as you recommended, using ImageJ to perform watershed segmentation. Part of my plot looks like this:

[Image: baysor]

Do the yellow blocks correspond to DAPI nuclei or to cell boundaries? And why are the cell boundaries produced by baysor run not fully covered by the prior segmentation, even though I set prior-segmentation-confidence = 1?

tsuijenk commented 2 years ago

Hi @rocketeer1998, sorry for interrupting, but maybe @VPetukhov can chip in as well. I am somewhat confused by the output of Baysor. Baysor is supposed to do cell segmentation, if I am correct, but some of the cell areas here are so tiny that they include only one or two molecules. Biologically speaking, human cells have between 20,000 and 25,000 genes.

I am also curious about the nucleus segmentation part.

Please advise! :)

LucaGiudice commented 2 years ago

I think that Baysor is not using the prior segmentation properly (or at least not the way I assumed it would). I expected it to find cells using only the molecules that fall into regions of the prior segmentation, but that is not what it does. It seems to use all molecules, regardless of whether they fall into the regions of the prior segmentation, and it does not provide any further information to distinguish cells based on the prior segmentation. I tested this by giving it a binary mask with only six small regions. In the output HTML report I can clearly see that only 6-10 detected cells (clusters of transcripts) fall into the regions (yellow areas), which was the expected result, but the count matrix then reports 10k cells with confidence greater than 0.99. In addition, the method returns no other metadata that would let me tell the 6-10 correct cells apart from the other 10k wrong ones.

[Image: Segment]

VPetukhov commented 1 year ago

Hi everyone, the details of the method are described in the paper, under "Methods" -> "Cell segmentation" -> "Using a prior segmentation".

Regarding the points of @rocketeer1998 and @LucaGiudice:

The prior segmentation penalty is evaluated only for the molecules that are assigned to some cell (and not to background) in both the Baysor segmentation and the prior segmentation. This accounts for the fact that imaging-based segmentations may miss some cells or portions of cells that can still be deduced from the spatial transcriptomics data. The most obvious example of such a situation is DAPI-based segmentation, which covers only the molecules within the cell nuclei and leaves most of the cytoplasm molecules unannotated.
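To make the rule above concrete, here is a minimal Python sketch of which molecules would enter the prior comparison. It is not Baysor's code: the function name `penalized_mask`, the convention that label 0 means background, and the toy labels are all assumptions made for illustration, and the penalty itself is not computed.

```python
def penalized_mask(baysor_cell, prior_cell, background=0):
    """Flag molecules assigned to some cell in BOTH segmentations.

    Hypothetical helper: labels are per-molecule cell ids, and
    `background` (assumed to be 0) marks unassigned molecules.
    """
    return [
        b != background and p != background
        for b, p in zip(baysor_cell, prior_cell)
    ]


# Toy data: six molecules with their Baysor and prior (e.g. DAPI mask) labels.
baysor_cell = [1, 1, 2, 0, 3, 3]
prior_cell = [1, 0, 2, 2, 0, 0]

mask = penalized_mask(baysor_cell, prior_cell)
print(mask)  # [True, False, True, False, False, False]
```

In this toy case, Baysor cell 3 lies entirely outside the prior mask, so none of its molecules is compared against the prior. That is consistent with cells appearing outside the prior regions even when prior-segmentation-confidence = 1.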

@tsuijenk, Baysor can produce cells of arbitrarily small size. The estimate of 20k genes per cell does not help here, as it all comes down to the capture rate and the number of measured genes. Protocols like ISS, for example, can have only 1-2 molecules per cell. So these small cells should be removed during post-processing, the same way you apply cell size thresholding to scRNA-seq data.
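The post-processing step could be sketched roughly as follows. This is only an illustration in plain Python: the function name `filter_small_cells`, the `min_molecules` threshold, and the choice to relabel molecules of dropped cells as background (label 0) are assumptions, not part of Baysor's output format or API.

```python
from collections import Counter


def filter_small_cells(cell_ids, min_molecules, background=0):
    """Relabel molecules of cells with fewer than `min_molecules` as background.

    Hypothetical helper: `cell_ids` holds one cell label per molecule,
    with `background` (assumed to be 0) for unassigned molecules.
    """
    # Count molecules per cell, ignoring background.
    counts = Counter(c for c in cell_ids if c != background)
    # Cells that pass the size threshold.
    keep = {c for c, n in counts.items() if n >= min_molecules}
    return [c if c in keep else background for c in cell_ids]


# Toy data: cell 1 has 3 molecules, cell 2 has 1, cell 3 has 2.
cell_ids = [1, 1, 1, 2, 0, 3, 3]
filtered = filter_small_cells(cell_ids, min_molecules=3)
print(filtered)  # [1, 1, 1, 0, 0, 0, 0]
```

A suitable threshold depends on the protocol and gene panel, so it is best chosen by looking at the distribution of molecules per cell in your own data.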