rnabioco / clustifyr

Infer cell types in scRNA-seq data using bulk RNA-seq or gene sets
https://rnabioco.github.io/clustifyr/
MIT License

clustify_lists - 'pct' #386

Closed Dazcam closed 2 years ago

Dazcam commented 2 years ago

I have a set of gene lists that I'm using for cell type classification and have run clustify_lists() as described in your tutorial in which the metric parameter is set to 'pct'.

I'm struggling to find a thorough explanation of what is being run under the hood when the 'pct' option is used. All I could find in the paper was:

clustify_lists will calculate enrichment with a hypergeometric test, marker overlap with the jaccard index, or use the percent of cells expressing marker genes to annotate cell types.

Could you elaborate on the last (pct) part a bit? It's not 100% clear (at least to me) what is going on, particularly with regard to whether you are comparing marker genes across cells or clusters (or both).

Many thanks.

raysinensis commented 2 years ago

Hi, for each gene, "pct" calculates the percentage of cells in each cluster with detectable expression. This is then summarized across all marker genes to reach a "score".
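
To make the per-gene step concrete, here is a minimal sketch in plain Python. It is not clustifyr's code (the package is written in R); the function name `detection_fraction`, the toy data, and the assumption that "detectable expression" means a count greater than zero are all illustrative.

```python
def detection_fraction(expr, clusters):
    """Fraction of cells per cluster with detectable expression of each gene.

    expr:     dict mapping gene name -> list of counts, one entry per cell
    clusters: list of cluster labels, one entry per cell (same cell order)
    Assumption: a gene is "detected" in a cell when its count is > 0.
    """
    # Group cell indices by cluster label.
    cells_by_cluster = {}
    for idx, label in enumerate(clusters):
        cells_by_cluster.setdefault(label, []).append(idx)

    # For every cluster and gene, count the cells with a non-zero value.
    result = {}
    for label, idxs in cells_by_cluster.items():
        result[label] = {
            gene: sum(1 for i in idxs if counts[i] > 0) / len(idxs)
            for gene, counts in expr.items()
        }
    return result


# Hypothetical toy data: 4 cells, 2 genes, 2 clusters.
expr = {"GeneA": [0, 2, 3, 5], "GeneB": [1, 0, 0, 0]}
clusters = ["c1", "c1", "c2", "c2"]

pct = detection_fraction(expr, clusters)
print(pct["c1"])  # {'GeneA': 0.5, 'GeneB': 0.5}
print(pct["c2"])  # {'GeneA': 1.0, 'GeneB': 0.0}
```

The result is one detection fraction per (cluster, gene) pair, computed at the cluster level rather than per individual cell.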

Dazcam commented 2 years ago

Hi, and thanks for responding.

That makes sense, but how is that score summarised exactly?

Is it right to assume no formal statistical test is run here?

And is it right to assume that it is the summarised gene 'score' that is compared between the test and query datasets? Or, to put it another way, are all cells in a cluster in the query dataset assigned the identity of the cluster in the reference with the most similar 'score'?

Many Thanks.

raysinensis commented 2 years ago

For instance, if reference type A has 3 marker genes, and in cluster 1 their detection percentages are 0, 0.1, and 0.5, the default is just to take the mean, so 0.2 is the "score" of type A for cluster 1.
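
The arithmetic in this example can be reproduced with a short Python sketch. This is only an illustration of the default mean summary described here, not clustifyr's implementation; `pct_score` and the gene names are hypothetical.

```python
from statistics import mean


def pct_score(detection, markers):
    """Mean detection fraction over a reference type's marker genes.

    detection: dict mapping gene name -> fraction of cells in one cluster
               with detectable expression
    markers:   marker genes for one reference cell type
    Assumption: markers absent from `detection` are skipped.
    """
    values = [detection[g] for g in markers if g in detection]
    return mean(values) if values else 0.0


# Numbers from the example: three markers of type A, measured in cluster 1.
cluster1 = {"gene1": 0.0, "gene2": 0.1, "gene3": 0.5}
type_a_markers = ["gene1", "gene2", "gene3"]

score = pct_score(cluster1, type_a_markers)
print(round(score, 3))  # 0.2
```

Repeating this for every cluster and every reference gene list gives a cluster-by-type score matrix, with no p-value attached to any entry.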

Yes, there is no statistical testing; it is meant more for exploration.