Ensuring cluster proximity in co-embedding for a selected subset

snap-stanford / SATURN

MIT License

Ensuring cluster proximity in co-embedding for a selected subset #11

Closed lijinbio closed 1 year ago

lijinbio commented 1 year ago

Hello there,

During the co-embedding training using the full-cell clusters, SATURN suggested several novel mappings of clusters. To verify the accuracy of the co-embedding, I ran SATURN on a subset of potential novel clusters. However, it seems that the clusters are no longer overlaid. Can you please advise me on how to configure SATURN to maintain the proximity of the clusters between the full cells and the subset of clusters? Thank you.

Yanay1 commented 1 year ago

I would first suggest running SATURN multiple times and comparing how often cell types co-cluster across the different runs. You can do so using the saturn_multiple_seeds script, as shown in this vignette.

You could do this for different subset levels as well. It could be that all these clusters are very similar.
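As a rough illustration of comparing co-clustering occurrences across runs, here is a minimal sketch. It assumes you have already exported, for each seed, one cluster label per cell, in the same cell order as a list of cell type annotations. The function names and input layout are hypothetical and are not part of SATURN or the saturn_multiple_seeds script.

```python
from collections import Counter
from itertools import combinations


def majority_cluster(cell_types, clusters):
    """Map each cell type to the cluster that holds most of its cells."""
    counts = {}
    for cell_type, cluster in zip(cell_types, clusters):
        counts.setdefault(cell_type, Counter())[cluster] += 1
    return {ct: c.most_common(1)[0][0] for ct, c in counts.items()}


def coclustering_frequency(cell_types, clusters_per_seed):
    """Fraction of seeds in which two cell types share a majority cluster.

    cell_types: one annotation per cell (hypothetical input).
    clusters_per_seed: one list of cluster labels per seed, aligned
    with cell_types (hypothetical input).
    """
    hits = Counter()
    for clusters in clusters_per_seed:
        assigned = majority_cluster(cell_types, clusters)
        for a, b in combinations(sorted(assigned), 2):
            if assigned[a] == assigned[b]:
                hits[(a, b)] += 1
    n_seeds = len(clusters_per_seed)
    types = sorted(set(cell_types))
    return {(a, b): hits[(a, b)] / n_seeds for a, b in combinations(types, 2)}
```

Pairs with a frequency near 1 co-cluster in almost every seed, while pairs near 0.5 are the unstable ones worth a closer look. Assigning each cell type to its majority cluster is a simplification; a cell type split evenly between two clusters would need a softer overlap measure.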

lijinbio commented 1 year ago

Hi, @Yanay1! I have a question about the inconsistent mapping results I obtained from different seeds. I ran my experiment with multiple seeds (e.g., 8 seeds), and while some clusters consistently overlapped across seeds, others were not stable and showed varying mappings to different clusters. For clusters with novel mappings, some seeds yielded the same results, while others showed differences. Do you have any suggestions for interpreting these inconsistent mappings obtained from different seeds? Thank you!

Yanay1 commented 1 year ago

What kinds of tissues/species/cell types are you integrating? It might be difficult to draw conclusions if the clustering results are not consistent, but it could be useful to do differential macrogene expression between the clusters. Could the different clusters all be biologically related?
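To make the idea of comparing macrogene expression between clusters concrete, here is a very crude sketch. It assumes you have a per-cell table of macrogene values and a cluster label per cell; both inputs and the function name are hypothetical. It ranks macrogenes by the difference in mean value between two clusters, which is only a stand-in for a proper differential expression test (for example a rank-sum test with multiple-testing correction).

```python
from statistics import mean


def rank_macrogenes(values, labels, cluster_a, cluster_b):
    """Rank macrogenes by absolute difference in mean between two clusters.

    values: list of rows, one row of macrogene values per cell
            (hypothetical input layout).
    labels: one cluster label per cell, aligned with values.
    Returns (order, diffs): macrogene indices sorted from most to least
    different, and the signed mean difference (cluster_a minus cluster_b).
    """
    rows_a = [row for row, lab in zip(values, labels) if lab == cluster_a]
    rows_b = [row for row, lab in zip(values, labels) if lab == cluster_b]
    if not rows_a or not rows_b:
        raise ValueError("both clusters must contain at least one cell")
    n_macrogenes = len(values[0])
    diffs = [
        mean(r[g] for r in rows_a) - mean(r[g] for r in rows_b)
        for g in range(n_macrogenes)
    ]
    order = sorted(range(n_macrogenes), key=lambda g: abs(diffs[g]), reverse=True)
    return order, diffs
```

If the top-ranked macrogenes between two unstable clusters show only small differences, that would be consistent with the clusters being biologically related rather than distinct.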

lijinbio commented 1 year ago

Hey @Yanay1. I am currently integrating cell types from the same tissues across 3 different species, and I have observed that unstable clusters are biologically relevant based on the literature. Can you provide further insights on why different seed values may result in unstable mapping for specific clusters? Thank you.

lijinbio commented 1 year ago

Hey @Yanay1. Is it advisable to optimize certain hyperparameters in metric learning in order to achieve a consistent mapping, such as using k-MNN (k>1) for anchor cells or exploring different distance metrics? Your insights and recommendations would be greatly appreciated. Thank you.

Yanay1 commented 1 year ago

Hi (sorry for the late reply!), we did not explore using k>1 or different distance metrics during metric learning, so it might be interesting to see those results! The embedding dimension is fairly high (256 by default), so different distance metrics might give very similar results. For k>1, there might be some extra considerations because we use mutual nearest neighbors.
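For anyone who wants to experiment with this, here is a minimal sketch of mutual k-nearest-neighbor pairing between two sets of embeddings, with a switchable distance metric. It is an illustration under stated assumptions, not SATURN's implementation: the function names, the brute-force search, and the plain-list inputs are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


METRICS = {"euclidean": math.dist, "cosine": cosine_distance}


def k_nearest(query, candidates, k, dist):
    """Indices of the k candidates closest to query (brute force)."""
    order = sorted(range(len(candidates)), key=lambda j: dist(query, candidates[j]))
    return set(order[:k])


def mutual_knn_pairs(a, b, k=1, metric="euclidean"):
    """Pairs (i, j) where b[j] is among the k nearest to a[i] AND
    a[i] is among the k nearest to b[j]."""
    dist = METRICS[metric]
    a_to_b = [k_nearest(x, b, k, dist) for x in a]
    b_to_a = [k_nearest(y, a, k, dist) for y in b]
    return sorted(
        (i, j) for i, nbrs in enumerate(a_to_b) for j in nbrs if i in b_to_a[j]
    )
```

One of the extra considerations with k>1 shows up directly here: the mutual condition becomes easier to satisfy, so each cell can end up with several anchors, and some of them may belong to a different cell type than the k=1 anchor. Comparing the pairs returned for "euclidean" and "cosine" on the same embeddings is one way to check whether the choice of metric matters in practice.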