liulab-dfci / MAESTRO

Single-cell Transcriptome and Regulome Analysis Pipeline
GNU General Public License v3.0

Run with predetermined clusters #132

Closed by bc2zb 3 years ago

bc2zb commented 3 years ago

Related to #77.

I'm trying to use MAESTRO on clusters of interest that have been determined using other tools. What are the appropriate steps? Essentially, I've tried multiple ways to overwrite the Seurat clusters and Idents to use the "monocle_clusters".

FindAllMarkersMAESTRO() picks up the updated clusters when they are set with Idents() <- .... RNAAnnotateCelltype() only uses the updated clusters if ...$seurat_clusters is explicitly overwritten; setting Idents() <- ... alone is not sufficient. Neither approach, alone or combined, works for RNAAnnotateTranscriptionFactor(), and I can't figure out where in the source code the function might be grabbing the original idents and overwriting the clusters.

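# Overwrite both the active identities and the seurat_clusters metadata column
Idents(pbmc.RNA.res$RNA) <- sce_glm_pca$monocle_clusters
pbmc.RNA.res$RNA$seurat_clusters <- sce_glm_pca$monocle_clusters
# Recompute marker genes for the new clusters
sce_genes <- FindAllMarkersMAESTRO(pbmc.RNA.res$RNA)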

pbmc.RNA.res$RNA <- RNAAnnotateCelltype(RNA = pbmc.RNA.res$RNA, 
                                          gene = sce_genes,
                                          signatures = as.data.frame(signatures), 
                                          min.score = 0.05)

pbmc_RNA_tfs <- RNAAnnotateTranscriptionFactor(RNA = pbmc.RNA.res$RNA,
                                                 genes = sce_genes,
                                                 project = pbmc.RNA.res$RNA@project.name,
                                                 organism = "GRCh38",
                                                 outdir = "sce-lisa",
                                                 top.tf = 10)

When I run the LISA step above, no matter how I manipulate the Seurat object, LISA prints:

Start to run Lisa.
Loading gene info ...
Modeling 0.txt:
        Matching genes and selecting background ...
        53 query genes and 501 background genes selected
        Loading data into memory (only on first prediction):
        Modeling DNase purturbations:
        Calculating ChIP-seq peak-RP p-values ...
        Modeling H3K27ac purturbations:
        Mixing effects using Cauchy combination ...
        Formatting output ...
        Done!

There is no cluster 0 in my genes table.

r$> table(sce_genes$cluster)

   1   10   11   12    2    3    4    5    6    7    8    9 
 167  384  979 1068  226  304  664  701  436  542  251  102 
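
A quick sanity check for this kind of mismatch can be sketched in base R. It rests on an assumption suggested only by the `Modeling 0.txt` log line, not confirmed from the MAESTRO source: that per-cluster gene lists sit in the output directory as `<cluster>.txt`. `stale_cluster_files()` is a hypothetical helper, not part of MAESTRO.

```r
# Hypothetical helper (not part of MAESTRO): list the .txt files under `outdir`
# whose base name is not one of the expected cluster labels.
# Assumption: per-cluster inputs are named "<cluster>.txt", as the log suggests.
stale_cluster_files <- function(outdir, clusters) {
  files <- list.files(outdir, pattern = "\\.txt$", recursive = TRUE)
  labels <- tools::file_path_sans_ext(basename(files))
  files[!labels %in% as.character(clusters)]
}

# Usage (names taken from the calls above):
# stale_cluster_files("sce-lisa", unique(sce_genes$cluster))
```

Any file it returns, such as one named after a cluster that is absent from the marker table, would point to leftovers from an earlier run rather than to the current clustering.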

bc2zb commented 3 years ago

Update: debugging the source code revealed that LISA runs on whatever cluster results already exist in the output directory. Clearing out that directory or pointing outdir at a different one solves the issue. It may be worth adding a note to the documentation about this.
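
A minimal sketch of that cleanup in base R, for anyone hitting the same thing. `reset_outdir()` is a hypothetical helper, not part of MAESTRO, and whether MAESTRO needs the directory to exist beforehand is an assumption; recreating it empty should be harmless either way.

```r
# Hypothetical helper (not part of MAESTRO): empty `outdir` so that no results
# from an earlier clustering can be picked up by the next run.
reset_outdir <- function(outdir) {
  if (dir.exists(outdir)) {
    unlink(outdir, recursive = TRUE)  # delete the directory and its contents
  }
  dir.create(outdir, recursive = TRUE)  # recreate it empty
  invisible(outdir)
}

# Usage, before calling RNAAnnotateTranscriptionFactor(..., outdir = "sce-lisa"):
# reset_outdir("sce-lisa")
```

Using a fresh `outdir` name for each clustering avoids the deletion altogether and keeps the earlier results around for comparison.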