wheaton5 / souporcell

Clustering scRNAseq by genotypes
MIT License

How to calculate likelihood (Unknown number of mixed samples split) #217

Open licongge opened 10 months ago

licongge commented 10 months ago

Hello, I'm a graduate student. Recently, I ran into some problems when using your new algorithm to split mixed-sample data. My single-cell data comes from embryos, and the number of embryos is unknown. What I would like to ask is: how is the total log-likelihood value in your article calculated? Is it based on the cluster_temp file or the cluster file? Are all the values added up directly, or does each cell contribute only the log-likelihood of its own cluster?

wheaton5 commented 10 months ago

It's in one of the .out files (clusters.out maybe). It's on the final line of that file.

wheaton5 commented 10 months ago

It will say something like "best total log likelihood is blah".

wheaton5 commented 10 months ago

And it is the total marginal log likelihood across all cells, marginalized across clusters, as shown in the supplement of the paper.
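To make "marginalized across clusters" concrete, here is a minimal Python sketch. It is not souporcell's actual code: the function names, the input layout (one row per cell, one log likelihood per cluster), and the uniform prior over clusters are all assumptions made for illustration. The point it shows is that each cell contributes a log-sum-exp over all clusters, not only the log likelihood of its own assigned cluster, and the per-cell values are then summed.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def total_marginal_log_likelihood(cell_cluster_loglik):
    """Sum over cells of the log likelihood marginalized over clusters.

    cell_cluster_loglik: hypothetical input, one row per cell, each row
    holding that cell's log likelihood under every cluster.
    ASSUMPTION: a uniform prior of 1/K over the K clusters.
    """
    total = 0.0
    for row in cell_cluster_loglik:
        log_prior = -math.log(len(row))
        # Marginalize over clusters for this cell, then add to the total.
        total += log_sum_exp([log_prior + v for v in row])
    return total
```

Under these assumptions, a cell that fits one cluster far better than the rest contributes roughly that cluster's log likelihood plus the log prior, which is why the marginal total tends to be close to, but not the same as, a sum over assigned clusters only.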

licongge commented 10 months ago

Thank you for your prompt reply. Could you tell me which of the three log-likelihood values in the diagram is the one to use?

wheaton5 commented 10 months ago

As before, it is the number on the final line. The highest (least negative) value is the best and final result.
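If it helps to automate this, the following Python sketch pulls the number from the final line of a log. The exact wording of that line is not given in this thread, so the sketch assumes only that the last non-empty line contains the value as its last number; `final_log_likelihood` is a hypothetical helper name, not part of souporcell.

```python
import re

# Matches plain or scientific-notation numbers, e.g. -1234.5 or -1.2e6.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def final_log_likelihood(log_text):
    """Return the last number on the last non-empty line of log_text.

    ASSUMPTION: the final line reports the best total log likelihood
    and the value is the last number appearing on that line.
    """
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("log is empty")
    matches = _NUMBER.findall(lines[-1])
    if not matches:
        raise ValueError("no number found on the final line")
    return float(matches[-1])
```

To use it, read the relevant .out file into a string (for example with `open(path).read()`) and pass that string to the function. Earlier lines that report other log-likelihood values are ignored, since only the final line is taken as the result.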

licongge commented 10 months ago

Thanks for your guidance. I see how to proceed now.