rnabioco / clustifyr

Infer cell types in scRNA-seq data using bulk RNA-seq or gene sets
https://rnabioco.github.io/clustifyr/
MIT License

clustify_lists marker gene number and cutoff #395

Closed liaoshengguang closed 1 year ago

liaoshengguang commented 1 year ago

Hello, thanks for this wonderful package. I am trying to annotate my clusters with known marker genes using the clustify_lists function.

I have two questions:

  1. How many marker genes are needed for each cluster? I tried 3 marker genes per cluster, as in the sample data, but the result was not satisfactory.
  2. How should the cutoff be set for the different metrics? For example, with the pct metric, one cluster may have pct=0.9 and another pct=0.3; can we then say that pct=0.3 is not reliable? How should the cutoff be set for the metrics "hyper", "jaccard", "spearman", "gsea", and "pct"?

Many thanks.

kriemo commented 1 year ago

1) The number of markers needed will depend on the specificity of the markers selected. In general I would recommend using 10-100 markers.

2) Setting these parameters is dataset-specific, and it's hard to provide specific guidelines. In general I find it useful to examine the strength of the metric for a classification that is likely correct and compare it to a classification that is likely incorrect. For most datasets you'll likely have one or more clusters that are clearly identifiable as a known cell type that should occur in the dataset. Compare the classification values for these clusters to the classification values returned for correlations against a cell type that should not be present in the data. The spread between these values can help with defining cutoffs.
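
To make the comparison above concrete, here is a minimal sketch in plain Python. It is not clustifyr code and does not call any clustifyr function. The helper names `suggest_cutoff` and `apply_cutoff`, the cluster names, and all score values are hypothetical, and it assumes a metric where a higher value means a stronger match (as in the pct example from the question). It places the cutoff halfway between the weakest score among classifications you trust and the strongest score among classifications you expect to be wrong.

```python
def suggest_cutoff(likely_correct, likely_incorrect):
    """Pick a cutoff midway between the two groups of scores.

    likely_correct:   scores for clusters whose identity is clearly known
    likely_incorrect: scores against cell types that should be absent
    Assumes higher scores indicate a stronger match.
    """
    weakest_trusted = min(likely_correct)
    strongest_untrusted = max(likely_incorrect)
    if strongest_untrusted >= weakest_trusted:
        # No gap between the groups, so this metric cannot separate them.
        raise ValueError("score ranges overlap; no clean cutoff exists")
    return (weakest_trusted + strongest_untrusted) / 2


def apply_cutoff(best_scores, cutoff):
    """Keep the best-scoring label per cluster only if it clears the cutoff."""
    return {
        cluster: (label if score >= cutoff else "unassigned")
        for cluster, (label, score) in best_scores.items()
    }


# Hypothetical values, for illustration only.
likely_correct = [0.90, 0.85, 0.78]    # e.g. obvious B cell and T cell clusters
likely_incorrect = [0.30, 0.22, 0.35]  # e.g. scores against an absent cell type

cutoff = suggest_cutoff(likely_correct, likely_incorrect)

best_scores = {
    "cluster_0": ("B cell", 0.90),
    "cluster_1": ("T cell", 0.60),
    "cluster_2": ("NK cell", 0.30),
}
calls = apply_cutoff(best_scores, cutoff)
print(cutoff, calls)
```

If the two groups of scores overlap, the sketch raises an error instead of returning a number. In that case the metric or the marker list probably needs revisiting before any cutoff is meaningful.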

@raysinensis may have some additional guidance to share.