theislab / multires-consensus-clustering

Code for the multi-resolution consensus clustering project
MIT License

Single node multi-res-graph #35

Open JanRhoKa opened 2 years ago

JanRhoKa commented 2 years ago

Currently, if the multi-res-graph has only one node (and therefore no edges), all cells in the data set are assigned to a single cluster, even if that node does not contain all of these cells. For example, if the node has 1600 cells with probability > 0, the remaining cells still belong to it with probability = 0, so they are assigned to the same cluster. The question is: do we want this to be the case, or should we give them a label of -1 to mark cells that were not assigned by the algorithm?
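To make the two options concrete, here is a minimal sketch, not the project's actual code. It assumes the membership probabilities are available as a cells-by-nodes NumPy array; the function name `assign_labels` and the `mark_unassigned` flag are hypothetical and only serve the illustration.

```python
import numpy as np


def assign_labels(probs, mark_unassigned=False):
    """Assign each cell to the node with the highest probability.

    probs: array of shape (n_cells, n_nodes); this layout is an assumption.
    mark_unassigned: if True, cells whose highest probability is 0 get -1.
    """
    probs = np.asarray(probs, dtype=float)
    # With a single node, argmax returns 0 for every cell, even for
    # cells whose probability for that node is 0.
    labels = probs.argmax(axis=1)
    if mark_unassigned:
        labels = np.where(probs.max(axis=1) > 0, labels, -1)
    return labels


# Toy single-node graph: two cells belong to the node, two never do.
probs = np.array([[0.9], [0.4], [0.0], [0.0]])
print(assign_labels(probs).tolist())
print(assign_labels(probs, mark_unassigned=True).tolist())
```

The first call reproduces the current behaviour (every cell ends up in cluster 0), while the second marks the cells with probability 0 as -1.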

JanRhoKa commented 2 years ago

[Attachments: two images and a Bokeh plot, bokeh_plot (9)]

JanRhoKa commented 2 years ago

Important context for the pictures: the displayed graph was created only from bin=[18, 20, 30, 40]. If I use a bin that contains resolutions of 1 and above, the single node always contains all cells.

lazappi commented 2 years ago

Is this for the sim-blob.h5ad dataset I sent you? In that case there is only one simulated group, so this would actually be the correct result. I think it is better to assign them, because we can also return the probabilities for people to check. Having unassigned cells tends to break things.
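One way to read this suggestion, sketched under assumptions: every cell keeps a label, and the probability of the chosen node is returned next to it so that users can filter for themselves. The cells-by-nodes array layout and the name `labels_with_probabilities` are hypothetical, not part of the package.

```python
import numpy as np


def labels_with_probabilities(probs):
    """Return a label for every cell plus the probability of that label.

    probs: array of shape (n_cells, n_nodes); this layout is an assumption.
    """
    probs = np.asarray(probs, dtype=float)
    labels = probs.argmax(axis=1)
    # Probability of the node each cell was assigned to.
    confidence = probs[np.arange(probs.shape[0]), labels]
    return labels, confidence


probs = np.array([[0.9], [0.4], [0.0]])
labels, confidence = labels_with_probabilities(probs)
# Downstream code can still spot cells that were never in the node.
never_clustered = confidence == 0
```

No cell is left without a label, so tools that expect a complete labelling keep working, and the zero-probability cells remain identifiable through `never_clustered`.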

I am a bit confused why the probabilities are zero though. Does that mean these cells are never included in any of the clusters that have been merged into this node?

JanRhoKa commented 2 years ago

Yes, this is the sim-blob.h5ad dataset. The group does not contain all cells because the probability outlier detection deleted the other nodes, as their probability was too low. This happens here because I only selected high-resolution bins: they don't overlap as much as lower resolutions would, so only one node is left after the outlier detection.