rnabioco / clustifyr

Infer cell types in scRNA-seq data using bulk RNA-seq or gene sets
https://rnabioco.github.io/clustifyr/
MIT License

Using marker genes with different lengths #391

Closed: sofiapuvogelvittini closed this issue 1 year ago

sofiapuvogelvittini commented 2 years ago

Hello, thanks for developing this package. I am trying to compare my clusters with the clusters of a previously published scRNA-seq dataset.

The reference clusters don't have the same number of marker genes, so I am padding the data frame that holds all the marker genes per reference cluster with NA values. How does clustify_lists() treat the NA values? Could this affect my results?

All the best, and thanks for your time,
Sofía
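The thread does not say whether clustify_lists() removes NA values internally. As a plain base R illustration (not clustifyr's own code), note that base R's intersect() and union() treat NA as an ordinary value. If both the query and the reference marker vectors carry NA padding, a set-overlap score such as the Jaccard index, |A intersect B| / |A union B|, picks up a spurious shared "gene". The helper names `jaccard` and `drop_na` and the gene vectors below are hypothetical.

```r
# Jaccard index of two gene sets: |A intersect B| / |A union B|.
# Hypothetical helper for illustration only, not clustifyr's implementation.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical helper: remove NA padding from a character vector.
drop_na <- function(x) x[!is.na(x)]

# Illustrative NA-padded marker vectors of different real lengths.
query_markers <- c("CD79A", "MS4A1", "CD19", NA)
ref_markers   <- c("GZMB", "GNLY", NA, NA)

# With the padding left in, NA is counted as a shared element:
# intersect = {NA}, union = 6 distinct values, so the score is 1/6.
with_na <- jaccard(query_markers, ref_markers)

# After dropping NA, the two sets are disjoint and the score is 0.
without_na <- jaccard(drop_na(query_markers), drop_na(ref_markers))

print(c(with_na = with_na, without_na = without_na))
```

Because the Jaccard index is defined on sets, marker lists of unequal length need no padding at all, which is why storing them as a list is the safer input shape.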

raysinensis commented 2 years ago

Hi, you can pass the reference markers as a list instead of a same-length data frame. Example below:

library(clustifyr)

# pbmc_markers: FindAllMarkers output (data frame with gene and cluster columns)
pbmc_input <- split(pbmc_markers$gene, pbmc_markers$cluster)

# reference gene lists of uneven length
pbmc_ref <- pos_neg_marker(list(B = c("CD79A", "CD79B", "MS4A1"), NK = c("GZMB", "GNLY")))

# reverse input and reference: reference markers first, query cluster markers second
res <- clustify_lists(pbmc_ref, pbmc_input, metric = "jaccard", input_markers = TRUE)
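If the reference markers are already stored in an NA-padded data frame, as described in the question, a minimal base R sketch for turning it into the same named, uneven-length list shape used for pbmc_ref above might look like this. The data frame `ref_df` and its gene names are hypothetical placeholders for the published reference.

```r
# Hypothetical NA-padded reference: one column per reference cluster,
# shorter columns filled with NA.
ref_df <- data.frame(
  B  = c("CD79A", "CD79B", "MS4A1"),
  NK = c("GZMB", "GNLY", NA),
  stringsAsFactors = FALSE
)

# lapply() over a data frame iterates over its columns and keeps the
# column names, so this yields a named list with the NA padding removed.
ref_list <- lapply(ref_df, function(genes) genes[!is.na(genes)])

str(ref_list)
```

The resulting `ref_list` could then be passed to pos_neg_marker() in place of the hand-written list in the example.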